Microsoft has delivered a beta release of the new version of its Cognitive Toolkit (CNTK), CNTK 2.0, which offers improved performance and flexibility.
Previously known as the Computational Network Toolkit (CNTK), the Microsoft Cognitive Toolkit is a system for building deep learning into applications. Deep learning is a form of machine learning, run on CPUs and GPUs, that is used to improve speech and image recognition as well as search relevance.
Microsoft said the new version of the toolkit arrives at a time when companies of all sizes are looking to add deep learning to tasks such as speech understanding and image recognition.
“Broadly speaking, deep learning is an artificial intelligence (AI) technique in which developers and researchers use large amounts of data—called training sets—to teach computer systems to recognize patterns from inputs such as images or sounds,” said a Microsoft blog post on CNTK 2.0.
“For example, a deep learning system can be given a training set showing all sorts of pictures of fruits and vegetables, after which it learns to recognize images of fruits and vegetables on its own,” the post said. “It gets better as it gets more data, so each time it encounters a new, weird-looking eggplant or odd-shaped apple, it can refine the algorithm to become even more accurate.”
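The learn-from-examples loop the post describes can be sketched in a few lines of Python. This is only an illustration: it uses a single-layer perceptron, which is far simpler than the deep networks the toolkit trains, and the two features (sweetness and firmness), the data values and the function names are all made up for the example.

```python
# Toy training set: (sweetness, firmness) -> +1 for fruit, -1 for vegetable.
# Both the features and the numbers are hypothetical.
TRAINING_SET = [
    ((0.90, 0.30), 1), ((0.20, 0.80), -1),
    ((0.80, 0.40), 1), ((0.30, 0.70), -1),
    ((0.70, 0.20), 1), ((0.10, 0.90), -1),
    ((0.85, 0.50), 1), ((0.25, 0.60), -1),
]


def predict(weights, bias, features):
    """Return +1 (fruit) or -1 (vegetable) for one example."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else -1


def train_perceptron(examples, epochs=100, learning_rate=0.1):
    """Adjust weights whenever the current guess for an example is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            if predict(weights, bias, features) != label:
                # Nudge the decision boundary toward the correct answer.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(TRAINING_SET)
# An example the model has not seen before (hypothetical "odd-shaped apple").
print(predict(weights, bias, (0.75, 0.45)))
```

Refining the model with new data, as the post describes, would amount to adding the new labeled example to the training set and running the training loop again.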
Xuedong Huang, distinguished engineer and chief scientist of speech R&D at Microsoft, told eWEEK that deep learning is changing almost every day. He said Microsoft started its efforts to digitally recognize conversational speech in 1993 and has gradually reduced its error rate over the years.
“When Microsoft introduced the speech API in 1995 for Windows 95, the error rate to recognize a switchboard conversation would be around 60 percent,” Huang said. “Today, it’s 5.8 percent—the lowest in the industry,” he said of Microsoft’s speech recognition capability.
Huang said Microsoft set a goal a year ago to achieve the lowest speech recognition error rate in the industry. “Before us, IBM Watson had the lowest error rate,” he said. “So IBM Watson had been leading the pack. A month ago we delivered a lower result. And a week ago we reported we not only had the lowest result, we also approached parity with humans. So essentially, computers without having a real-time constraint can recognize the words or transcribe the words of an open conversation as good as humans. That’s a truly historical milestone.”
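Huang did not spell out how the error rate is computed. Assuming he means word error rate, the usual metric for conversational speech benchmarks, the calculation is the minimum number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. A minimal Python sketch under that assumption follows; the function name and sample sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")

    # previous_row[j] = edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(min(
                previous_row[j] + 1,                       # deletion
                current_row[j - 1] + 1,                    # insertion
                previous_row[j - 1] + substitution_cost,   # match or substitution
            ))
        previous_row = current_row
    return previous_row[-1] / len(ref_words)


# One wrong word out of five gives an error rate of 20 percent.
print(word_error_rate("the quick brown fox jumps", "the quick brown box jumps"))
```

On this measure, an error rate of 5.8 percent corresponds to roughly six word errors for every 100 words in the reference transcript.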
He noted that Microsoft is having similar success with image recognition and with its conversational chatbot.
“Because of deep learning, our ability to engage with machines in a more natural way is having absolutely fantastic breakthroughs,” he said. “Microsoft combined a number of APIs to enable developers to really access AI. That’s part of the effort to democratize AI. The Cognitive Toolkit is a toolkit that has created all these intelligence models.”
Chris Basoglu, a partner engineering manager at Microsoft who played a key role in developing the toolkit, said one key reason to use the Microsoft Cognitive Toolkit is its ability to scale efficiently across multiple GPUs and multiple machines on massive data sets.
When Microsoft researchers initially developed the toolkit, they figured many developers couldn’t, or wouldn’t want to, write a lot of code, the Microsoft CNTK 2.0 post said. “So, they created a custom system that made it easy for developers to configure their systems for deep learning without any extra coding.”
However, as the system grew more popular, Microsoft heard from developers who wanted to combine their own Python or C++ code with the toolkit’s deep learning capabilities, the post said.
“They also heard from researchers who wanted to use the toolkit to enable reinforcement learning research,” the Microsoft post said. “That’s a research area in which an agent learns the right way to do something—like find their way around a room or form a sentence—through lots of trial and error. That’s the kind of research that could eventually lead to true artificial intelligence, in which systems can make complex decisions on their own. The new version gives developers that ability as well.”
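The trial-and-error idea in that description can be illustrated with tabular Q-learning, one classic reinforcement learning algorithm. The post does not say which algorithms the toolkit supports, so the Python sketch below is a generic example that does not use the toolkit; the one-dimensional corridor standing in for a room, the reward of 1 at the goal, and every name and parameter value are assumptions made for illustration.

```python
import random

LEFT, RIGHT = 0, 1


def train_q_table(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.2, seed=0):
    """Learn action values for a corridor whose goal is the last cell."""
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore at random sometimes, or when both actions look equal.
            if rng.random() < epsilon or q[state][LEFT] == q[state][RIGHT]:
                action = rng.randrange(2)
            else:
                action = LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT

            next_state = max(0, state - 1) if action == LEFT else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Q-learning update: move the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
policy = ["right" if q[RIGHT] > q[LEFT] else "left" for q in q_table[:-1]]
print(policy)  # after training, every cell should prefer moving right
```

The agent is never told that the goal lies to the right. It discovers that only through the rewards it collects over many attempts.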
CNTK 2.0 is available as open source on GitHub. It has some unique advantages and “is super-fast and it scales,” Huang said. Microsoft benchmarked four open-source machine learning frameworks (Google’s TensorFlow, Caffe, CNTK and Torch), and CNTK performed at or near the top in a series of tests, he said.
“CNTK can handle big, commercial-grade work,” Huang said. “We have been using CNTK internally for massive AI workloads. CNTK has been used broadly internally on speech recognition, image recognition, Cortana, web search and the conversational chatbot. It can handle massive AI workloads.”
The Microsoft blog post said the Microsoft Cognitive Toolkit can handle anything from relatively small datasets to very large ones, using just one laptop or a series of computers in a data center. It can run on computers that use traditional CPUs or on GPUs, which were once mainly associated with graphics-heavy gaming but have proven very effective for running the algorithms needed for deep learning, the post said.
“Microsoft Cognitive Toolkit represents tight collaboration between Microsoft and NVIDIA to bring advances to the deep learning community,” said Ian Buck, general manager of the Accelerated Computing Group at NVIDIA, in a statement in Microsoft’s blog. “Compared to the previous version, it delivers almost two times performance boost in scaling to eight Pascal GPUs in an NVIDIA DGX-1.”