Google released a new API that it says will allow developers to build applications that can see and understand the content of images.
Google’s Cloud Vision API is available in limited preview mode and leverages the company’s expertise in machine learning, especially with technologies such as the recently open-sourced TensorFlow.
In a blog post this week, Google Product Manager Ram Ramanathan described Cloud Vision API as an easy-to-use representational state transfer (REST) programming interface that integrates elements of Google’s machine learning technology.
Developers can use the API to write apps that will be capable of classifying images into different categories, detecting faces and emotions, and recognizing printed words inside images. Cloud Vision API, for instance, will enable image classification into broad categories like “lion,” “boat” or “Eiffel Tower,” Ramanathan said.
“The uses of Cloud Vision API are game-changing to developers of all types of applications, and we are very excited to see what happens next,” Ramanathan wrote.
Google Cloud Vision API will let developers extract different kinds of insight from image content.
For example, a label/entity detection feature picks up the most prominent entity in an image, such as a car or an animal in a photo. Using the feature, a developer could write an application that supports image-based searches or recommendations.
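To make the label-detection idea concrete, here is a minimal Python sketch of what a request body for such a REST call might look like. The article does not describe the wire format; the endpoint URL, the `LABEL_DETECTION` feature type and the `maxResults` field are taken from the publicly documented v1 REST interface and may differ from the limited preview. The helper name `build_label_request` is hypothetical.

```python
import base64
import json

# Assumption: endpoint of the publicly documented v1 REST interface;
# the limited-preview endpoint may differ.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build a JSON-serializable body asking for label detection on one image.

    Hypothetical helper for illustration; it only builds the payload and
    does not send anything over the network.
    """
    return {
        "requests": [
            {
                # Image bytes travel base64-encoded inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One feature entry per kind of analysis requested.
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file.
    body = build_label_request(b"placeholder image bytes", max_results=3)
    print(json.dumps(body, indent=2))
```

An application would POST this body, with its own credentials, to the endpoint and read the returned labels to drive image-based search or recommendations.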
Similarly, an optical character-recognition feature enables text retrieval from within an image. The API provides a language-recognition feature that supports a long list of languages. Developers can use the API to enable other kinds of image-related features in their applications, as well.
For instance, a safe-search detection capability flags images with inappropriate content, while a facial-detection feature can spot faces in an image. The facial-detection feature uses measurements of facial features such as the eyes, mouth and nose to discern emotions like joy or sorrow in the faces within an image.
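As a rough illustration of how an application might act on detected emotions, the Python sketch below filters a face annotation by likelihood. The article names only joy and sorrow; the field names (`joyLikelihood` and so on) and the likelihood buckets are assumptions drawn from the publicly documented v1 response format, the helper `likely_emotions` is hypothetical, and the sample response is made up for illustration.

```python
# Assumption: likelihood buckets as named in the public v1 response format,
# ordered from weakest to strongest.
LIKELIHOOD_ORDER = [
    "UNKNOWN", "VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY",
]

# Assumption: per-face emotion fields in the public v1 response format.
EMOTION_FIELDS = {
    "joy": "joyLikelihood",
    "sorrow": "sorrowLikelihood",
    "anger": "angerLikelihood",
    "surprise": "surpriseLikelihood",
}


def likely_emotions(face: dict, threshold: str = "LIKELY") -> list:
    """Return the emotions whose likelihood bucket is at or above the threshold."""
    cutoff = LIKELIHOOD_ORDER.index(threshold)
    found = []
    for emotion, field in EMOTION_FIELDS.items():
        # Missing fields are treated as UNKNOWN, the weakest bucket.
        bucket = face.get(field, "UNKNOWN")
        if bucket in LIKELIHOOD_ORDER and LIKELIHOOD_ORDER.index(bucket) >= cutoff:
            found.append(emotion)
    return found


# Made-up response fragment for illustration only; not real API output.
sample_response = {
    "responses": [
        {
            "faceAnnotations": [
                {
                    "joyLikelihood": "VERY_LIKELY",
                    "sorrowLikelihood": "VERY_UNLIKELY",
                    "angerLikelihood": "UNLIKELY",
                    "surpriseLikelihood": "POSSIBLE",
                }
            ]
        }
    ]
}

if __name__ == "__main__":
    for face in sample_response["responses"][0]["faceAnnotations"]:
        print(likely_emotions(face))
```

Treating the result as coarse buckets rather than a numeric score keeps the threshold easy to reason about; a stricter application could pass `"VERY_LIKELY"` as the threshold.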
Also available within the API are capabilities for landmark detection and for product logo detection within an image. “With Cloud Vision API, you can build metadata on your image catalog, moderate offensive content or enable new marketing scenarios through image sentiment analysis,” Ramanathan said.
As an example of what developers can do with the API, Google posted a video of a demo robot capable of roaming about and identifying objects, including faces and associated emotions. Ramanathan also pointed to work being done by Japanese drone maker Aerosense, which has apparently been testing the API for some time. According to Aerosense, its drones take thousands of photos with each flight and the API gives the company the best shot at extracting meaningful insight from the images.
As cutting-edge as some of the features might seem, they are not entirely new. Google’s own Photos app, for instance, is capable of sorting photos into categories like “people,” “places” and “things.” It allows users to search for photos using specific terms like “cats” or “dogs” or even adjectives like “beautiful,” even when the photos have not been tagged as such.
Google’s goal in releasing the API is to get developers to integrate similar capabilities into a broader set of applications.