The image captioning AI knows more than just what’s in a picture. It’s learning to understand what those people and objects are doing.
Google has released the latest iteration of its machine-learning system that figures out what’s in an image and captions it, and it’s better than ever. The company has also made it open-source. Google has been working on the program since 2014 and now says the algorithm can describe a picture with 93.9 percent accuracy.
The big question for the Google team, as they were working on this newest iteration that uses an Inception architecture, was whether the algorithm could do more than simply identify objects within images set before it. To really interpret and caption a photo, AI needs to understand not only what’s in the picture but also how certain objects in the image interact with one another. This couldn’t just be a “regurgitation” of data, Google’s developers say. The algorithm had to be able to naturally develop an understanding of the objects in the image and their uses.
“Excitingly,” the blog post says, “our model does indeed develop the ability to generate accurate new captions when presented with completely new scenes, indicating a deeper understanding of the objects and context in the images.” Just as important, “it learns how to express that knowledge in natural-sounding English phrases despite receiving no additional language training other than reading the human captions.”
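For readers curious about the mechanics: systems like Google’s pair an image encoder (here, the Inception architecture) with a learned language decoder that emits a caption one word at a time. The pure-Python sketch below illustrates only that greedy word-by-word decoding loop; the `toy_decoder` and its hard-coded scene are hypothetical stand-ins for a trained model, not Google’s actual code or data.

```python
def greedy_caption(features, next_word, max_len=10):
    """Greedy decoding: repeatedly ask the decoder for the most likely
    next word until it emits <end> or the caption hits max_len.
    `next_word` stands in for the trained decoder (an LSTM-style language
    model in real captioning systems)."""
    caption = ["<start>"]
    while len(caption) < max_len:
        word = next_word(features, caption)
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption[1:])  # drop the <start> token

def toy_decoder(features, caption):
    """Toy decoder for a single hard-coded scene (hypothetical data).
    A real decoder would score the whole vocabulary given the image
    features and the words generated so far."""
    script = {"dog+frisbee": ["a", "dog", "catches", "a", "frisbee", "<end>"]}
    words = script["+".join(sorted(features))]
    return words[len(caption) - 1]

print(greedy_caption(["dog", "frisbee"], toy_decoder))
# -> a dog catches a frisbee
```

The key point the sketch captures is that the caption is conditioned on both the image features and the words already produced, which is what lets the trained model compose natural-sounding phrases rather than regurgitate memorized captions.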
For now, at least, machine learning algorithms are proving far better at understanding still images than video.
Source: Google Research Blog
This article was originally written for and published by Popular Mechanics USA.