Microsoft is applying its machine learning research to images and the unspoken information they may contain.
Microsoft Research Distinguished Scientist and Deputy Director John Platt and his team have been working on software that automatically captions images. The project began this summer, when a multi-disciplinary team of experts decided to tackle the challenge of distilling photos into a sentence that makes sense instead of a jumble of words.
The system devised by Platt and his group has beaten human-written captions in some tests, though not in all of them, he explained in a blog post. “I’m happy to report that, in terms of BLEU score, we actually beat humans,” he wrote. “Our system achieved 21.05 percent BLEU score while the human ‘system’ scored 19.32 percent.”
BLEU, or Bilingual Evaluation Understudy, is an algorithm used to determine the quality of a machine-translated text. “BLEU breaks the captions into chunks of length (one to four words), and then measures the amount of overlap between the system and human translations. It also penalizes short system captions,” he explained.
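To make Platt's description concrete, here is a minimal Python sketch of a BLEU-style score. It is a simplified, sentence-level illustration, not the evaluation code Microsoft used: the function names `ngrams` and `simple_bleu`, the whitespace tokenization and the lack of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, references, max_n=4):
    """Simplified BLEU for one system caption against human captions.

    Hypothetical helper for illustration only: lowercases and splits on
    whitespace, applies no smoothing, and scores a single sentence.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):  # chunks of one to four words
        cand_counts = ngrams(cand, n)
        # Clip each candidate n-gram count by its highest count in any reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        overlap = sum(min(count, max_ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty level zeroes the score
        log_precision_sum += math.log(overlap / total)

    # Brevity penalty: use the reference length closest to the candidate's
    # (ties go to the shorter reference) and penalize shorter candidates.
    ref_len = min((len(r) for r in refs),
                  key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(log_precision_sum / max_n)
```

In this sketch, a system caption identical to a human caption scores 1.0, while a caption that is a correct but shorter prefix of the human caption scores lower because of the brevity penalty. Published BLEU figures such as those Platt quotes are typically computed over a whole test set rather than one sentence at a time.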
Despite the achievement, the technology is far from perfect.
“BLEU has many limitations that are well-known in the machine translation community,” he said. “We also tried testing with the METEOR [Metric for Evaluation of Translation with Explicit Ordering] metric, and got somewhat below human performance (20.71 percent vs. 24.07 percent).”
A fair number of people rated Microsoft’s machine-generated captions favorably when asked to evaluate them. In a test run on Amazon’s Mechanical Turk service, which pays “workers” to complete Human Intelligence Tasks online, “people thought that the system caption was the same or better than a human caption,” reported Platt.
Microsoft has been aggressively leveraging its machine learning research to build a smart software and services portfolio.
In March, Platt told attendees of the GigaOm Structure Data conference that “machine learning is pretty much pervasive throughout all Microsoft products. So, whenever you use a Microsoft product, you’re using a system that’s been generated from machine learning.”
Examples include the company’s Bing search engine and Kinect motion sensor. “The only way you can answer the billions of questions Bing answers is to have something that operates autonomously. In Xbox, the Kinect was also trained with machine learning,” said Platt. “The fact that it can see you in the room even though it’s poor lighting and you can wave your arms and it can track you: that’s all done with a piece of software that was trained with machine learning.”
Microsoft has since introduced a new cloud-based machine learning service (Azure ML) for businesses venturing into predictive analytics. Office Graph, the underlying technology that powers the Office Delve app, uses machine learning to determine the connections between workers and to surface relevant conversations and content in an effort to foster collaboration and improve productivity.