MIT teaching machines to reason about what they see
Deep learning systems interpret the world by picking out statistical patterns in data. This form of machine learning is now everywhere, automatically tagging friends on Facebook, narrating Alexa’s latest weather forecast and delivering fun facts via Google search. But statistical learning has its limits. It requires tons of data, has trouble explaining its decisions and is terrible at applying past knowledge to new situations. For example, a child who has never seen a pink elephant can still describe one; a computer cannot, because it has no way to make sense of an elephant that’s pink instead of grey.
“The computer learns from data,” said Jiajun Wu, a PhD student at MIT. “The ability to generalise and recognise something you’ve never seen before — a pink elephant — is very hard for machines.”
To give computers the ability to reason more like us, artificial intelligence (AI) researchers are returning to abstract, or symbolic, programming. Popular in the 1950s and 1960s, symbolic AI wires in the rules and logic that allow machines to make comparisons and interpret how objects and entities relate. Symbolic AI uses less data, records the chain of steps it takes to reach a decision and, when combined with the brute processing power of statistical neural networks, can even beat humans in a complicated image comprehension test.
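As a rough illustration of that idea, the sketch below (hypothetical code, not from the study) shows how a symbolic layer can compose independently learned concepts, which is how such a system could describe a pink elephant without ever having seen one:

```python
# Hypothetical sketch: a symbolic layer composing learned concepts.
# Each concept (a colour, a kind of animal) is represented explicitly,
# so novel combinations never seen in training remain expressible.

KNOWN_COLOURS = {"grey", "pink", "blue"}
KNOWN_ANIMALS = {"elephant", "giraffe", "zebra"}

def describe(colour: str, animal: str) -> str:
    """Compose two known concepts into a description, even if this
    particular combination never appeared in any training data."""
    if colour not in KNOWN_COLOURS:
        raise ValueError(f"unknown colour: {colour}")
    if animal not in KNOWN_ANIMALS:
        raise ValueError(f"unknown animal: {animal}")
    return f"a {colour} {animal}"

# A purely statistical classifier trained only on grey elephants has
# no class for this, but the symbolic composition handles it directly.
print(describe("pink", "elephant"))  # -> "a pink elephant"
```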
A new study by a team of researchers at MIT, MIT-IBM Watson AI Lab and DeepMind shows the promise of merging statistical and symbolic AI. Led by Wu and Joshua Tenenbaum, a professor in MIT’s Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory, the team shows that its hybrid model can learn object-related concepts like colour and shape, and leverage that knowledge to interpret complex object relationships in a scene. With minimal training data and no explicit programming, their model could transfer concepts to larger scenes and answer increasingly tricky questions as well as or better than its state-of-the-art peers.
“One way children learn concepts is by connecting words with images,” said the study’s lead author Jiayuan Mao, an undergraduate at Tsinghua University who worked on the project as a visiting fellow at MIT. “A machine that can learn the same way needs much less data, and is better able to transfer its knowledge to new scenarios.”
“The trick, it turns out, is to add more symbolic structure, and to feed the neural networks a representation of the world that’s divided into objects and properties rather than feeding them raw images,” said Jacob Andreas, another researcher. “This work gives us insight into what machines need to understand before language learning is possible.”
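To make that concrete, here is a minimal sketch (an illustrative assumption, not the team’s published model) of the kind of pipeline the quote describes: a perception network, faked here with hand-written output, emits an object-and-property representation of a scene, and a symbolic program answers a relational question over it:

```python
# Hypothetical sketch of a neuro-symbolic pipeline: a perception
# module (faked here) outputs objects with properties; a symbolic
# executor answers questions by filtering and relating those objects.

# The kind of output a perception network might produce from a raw image.
scene = [
    {"id": 0, "shape": "cube",   "colour": "red",  "x": 1.0},
    {"id": 1, "shape": "sphere", "colour": "blue", "x": 3.0},
    {"id": 2, "shape": "cube",   "colour": "blue", "x": 5.0},
]

def filter_objects(objs, **properties):
    """Symbolic 'filter' operator: keep objects matching all properties."""
    return [o for o in objs if all(o.get(k) == v for k, v in properties.items())]

def left_of(objs, anchor):
    """Symbolic relation: objects positioned left of the anchor object."""
    return [o for o in objs if o["x"] < anchor["x"]]

# "What colour is the cube to the left of the sphere?"
sphere = filter_objects(scene, shape="sphere")[0]
cubes_left = filter_objects(left_of(scene, sphere), shape="cube")
print(cubes_left[0]["colour"])  # -> "red"
```

Because the answer is produced by an explicit sequence of filter and relate operations, the chain of reasoning can be inspected step by step, which is the transparency advantage symbolic AI offers over a purely statistical model.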
The MIT-IBM team is now working to improve the model’s performance on real-world photos and to extend it to video understanding and robotic manipulation.