An artificial intelligence breakthrough from researchers at New York University, the University of Toronto and MIT showcases the ability of software to learn visual concepts in a single shot and manipulate them in human-like ways. The advance could lead to smarter phones, much-improved speech recognition, and computers that better understand the world around them.
Human beings show a remarkable ability to learn things on the fly: children, for example, need only be shown one example of a new object like a dog or schoolbus before they can identify other instances on their own. One of the reasons for our quickness, researchers believe, is that we often understand new concepts in terms of how their familiar parts work together as a whole. When we first saw a Segway, we quickly recognized wheels and a handle, concluding to a reasonable degree of certainty that it must be some form of personal transportation.
The same functional view of reality is true when it comes to language. When we see characters written on a piece of paper, even unfamiliar ones, we don't just see the ink on the page but also the series of pen strokes that drew them, so we can easily reproduce the character ourselves. And when we first hear an unfamiliar term – say, the name Chewbacca – we can repeat it even if we don't understand its meaning, because we parse sounds in terms of the muscle movements that would produce them.
Unfortunately, translating this remarkable one-shot learning ability to the domain of artificial intelligence is proving a tremendous ask. State-of-the-art "deep learning" algorithms are mainly concerned with pattern recognition, which they can only perform after being carefully trained on hundreds or thousands of examples. Even then, this software can only understand the object in a passive way, as a pattern of pixels on a screen, rather than using the concept to create something new.
The entire field of artificial intelligence is only a few decades old, but the issue at the root of human learning is something that has puzzled philosophers for millennia. It's the problem of induction, or how the human mind is able to effectively generalize abstract, inclusive concepts from a limited number of samples.
Researchers Joshua Tenenbaum, Brenden Lake and Ruslan Salakhutdinov have now taken an important step toward replicating this kind of single-shot learning inside a computer. Their probabilistic system, which they call Bayesian program learning (BPL), promises progress in fields like voice recognition and synthesis, image recognition and natural language processing. But more generally, their advance could lead to computers that better understand the world around them and build on what they learn to execute progressively more complex tasks.
The software is built around the three principles of compositionality (the idea that abstract representations are built from more primitive parts), causality (representing how those primitive parts are combined, step by step, to produce a complex structure) and learning to learn (the principle that knowledge of previous concepts can make learning new concepts easier). On the practical level, the probabilistic technique of Bayesian inference is at the heart of the algorithm: working from limited data, it infers which simple parts most likely make up a more complex visual object.
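To make the idea of inferring parts from a single example concrete, here is a minimal Python sketch of Bayesian inference over compositions of primitive strokes. It is a toy illustration, not the researchers' BPL code: the three stroke primitives, the 3x3 grid, the noise level and the prior that favors fewer parts are all assumptions made up for this example.

```python
from itertools import combinations

# Hypothetical primitive parts (strokes) and the grid cells each one inks.
PRIMITIVES = {
    "vertical": frozenset({(0, 1), (1, 1), (2, 1)}),
    "horizontal": frozenset({(1, 0), (1, 1), (1, 2)}),
    "diagonal": frozenset({(0, 0), (1, 1), (2, 2)}),
}
CELLS = [(r, c) for r in range(3) for c in range(3)]
NOISE = 0.05  # assumed chance that any one cell differs from the ideal drawing


def render(parts):
    """Compositionality: a character is the union of the cells its parts ink."""
    inked = set()
    for name in parts:
        inked |= PRIMITIVES[name]
    return inked


def prior(parts):
    """Assumed prior: each extra part halves the (unnormalized) probability."""
    return 0.5 ** len(parts)


def likelihood(observed, parts):
    """Probability of the observed cells if `parts` produced the drawing."""
    ideal = render(parts)
    p = 1.0
    for cell in CELLS:
        agrees = (cell in observed) == (cell in ideal)
        p *= (1 - NOISE) if agrees else NOISE
    return p


def posterior(observed):
    """Bayes' rule over every non-empty combination of primitives."""
    names = sorted(PRIMITIVES)
    hypotheses = [
        combo
        for k in range(1, len(names) + 1)
        for combo in combinations(names, k)
    ]
    scores = {h: prior(h) * likelihood(observed, h) for h in hypotheses}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


# A single example: a plus sign drawn on the grid.
observed = render(("horizontal", "vertical"))
post = posterior(observed)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

Even with one example, the posterior concentrates on the horizontal-plus-vertical explanation, because every other combination of parts either leaves observed ink unexplained or predicts ink that is not there.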
"Our work is based on capturing the mental models of humans with simple kinds of computer programs which we think our minds construct and manipulate," says Lake. "For the first time we think we have a machine system that can learn a large class of visual concepts in ways that are hard to distinguish from human learners."
The team's software was tested on a set of some 1,600 unfamiliar characters taken from writing systems around the world, both real and imaginary. After being fed a single hand-drawn version of a character as its starting point, the algorithm was able to successfully recognize it among all other characters, break it down into the series of pen strokes that drew it, and even redraw it with small variations while keeping the character still recognizable to human eyes.
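The recognize-and-redraw tasks described above can be pictured with a small Python sketch. This is a deliberately simplified stand-in, assuming characters are already stored as lists of pen strokes and using a plain nearest-match comparison rather than the probabilistic model the researchers describe; the two characters, the jitter size and all function names are hypothetical.

```python
import random

# A character is a list of strokes; a stroke is a list of (x, y) pen positions.
# Both templates are invented for illustration: one stored example per character.
EXAMPLES = {
    "plus": [[(0.0, 0.5), (1.0, 0.5)], [(0.5, 0.0), (0.5, 1.0)]],
    "ell": [[(0.0, 1.0), (0.0, 0.0)], [(0.0, 0.0), (1.0, 0.0)]],
}


def redraw(strokes, jitter=0.05, rng=None):
    """Redraw a character stroke by stroke, nudging each pen position slightly."""
    rng = rng or random.Random()
    return [
        [
            (x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))
            for x, y in stroke
        ]
        for stroke in strokes
    ]


def distance(a, b):
    """Sum of squared distances between corresponding pen positions.

    Assumes both drawings share the same stroke layout; otherwise returns inf.
    """
    if len(a) != len(b):
        return float("inf")
    total = 0.0
    for stroke_a, stroke_b in zip(a, b):
        if len(stroke_a) != len(stroke_b):
            return float("inf")
        for (xa, ya), (xb, yb) in zip(stroke_a, stroke_b):
            total += (xa - xb) ** 2 + (ya - yb) ** 2
    return total


def classify(drawing):
    """Pick the stored single example whose strokes lie closest to the drawing."""
    return min(EXAMPLES, key=lambda name: distance(drawing, EXAMPLES[name]))


rng = random.Random(0)  # fixed seed so the variation is reproducible
variant = redraw(EXAMPLES["plus"], rng=rng)
label = classify(variant)
print(label)
```

Because the variation is applied to the strokes rather than to raw pixels, the redrawn character keeps its overall structure, which is the intuition behind generating new examples that people can still recognize.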
This unique approach of breaking down a complex image and attempting to understand how its parts work together allows the software to perform creative tasks that are out of the question for algorithms based on pattern recognition alone. When presented with an unfamiliar alphabet, for instance, the researchers' software extracted general properties from the strokes that made up each character and was able to produce a new character featuring those properties.
The software did so well with this demanding creative task that its performance was deemed virtually indistinguishable from a human's, as confirmed by a visual Turing test. In the test, 147 judges were each presented with 49 trials in which a series of alphabet symbols was followed by two extra characters inspired by those alphabets – one invented by a human, one by the software. Collectively, the judges were only able to identify the computer-generated character 52 percent of the time, which is not significantly better than a random fifty-fifty guess.
"The algorithm only works for handwritten characters currently, but we believe the broader approach based on probabilistic program induction can lead to progress in speech recognition and object recognition," says Lake.
One place the approach could improve speech recognition is in your smartphone assistant of choice. Much as it does with an unfamiliar character, the software could be made to "read the user's mind" and transcribe an unfamiliar word based on the mouth movements it assumes produced the sound. It could then go further and parrot the new word back to the user, asking for a definition. Following that, the software would be able to add the word to its vocabulary and use it correctly and in context in the future.
Other possible tasks could include recognizing the style of a painting from the ensemble of its parts, guessing the function of an unfamiliar object from its components, and gaining a much better understanding of natural human language (something much more demanding than mere speech recognition and which could let us converse with our computers and smartphones on any topic, rather than be confined to things like weather, traffic and sport results).
So, while the sheer speed and complexity of an artificial brain is bound to be a factor in reaching high complexity of thought, this new research suggests that an appropriate learning algorithm can be equally decisive in obtaining human-level intelligence that can extract and manipulate useful information from very limited amounts of data.
The researchers warn, however, that the approach's performance will largely depend on carefully choosing the elementary parts (pen strokes, phonemes, and so on) from which the more complex ideas are built within a given domain.
The advance is further detailed in a paper published in the journal Science.
Source: New York University