University of Cambridge debuts virtual talking head capable of expressing human emotions

The University of Cambridge in the United Kingdom has unveiled a virtual “talking head” that is capable of expressing a range of human emotions. The lifelike face called Zoe, which the team believes is the most expressive controllable avatar ever created, could in the future be used as a digital personal assistant.

The voice-controlled virtual assistant has been a staple of science fiction for decades. From Star Trek and Red Dwarf to Iron Man, it's up there with faster-than-light travel as one of the most mythologized technologies going. Though recent technologies like Apple's Siri and Samsung's S Voice have brought us a little closer to that dream by allowing us to have (fairly) lifelike conversations with our smartphones, there's still a palpable sense that we're just talking to lifeless machines.

The University of Cambridge has been working to tackle this exact problem, and while the team's “Zoe” digital talking head may not be completely convincing, it's certainly a step in the right direction.

The team has developed a virtual, controllable avatar that is capable of expressing a range of emotions with what the team believes to be “unprecedented realism.” The user enters a line of text and adjusts a set of sliders that determine the emotion. Hit the enter key and Zoe will read the message in as happy, angry or sad a tone as you desire.

To create the system, the team spent days recording the face and voice of actress Zoe Lister while she recited more than 7,000 lines of text in varying emotions. This data was then used to create six basic emotions for Zoe (happiness, sadness, fear, anger, tenderness and neutrality), as well as adjustable pitch, speed and depth settings. Combining these levels allows for a huge range of emotions, something that has not been possible in other avatars of this type.
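To give a rough feel for how a handful of sliders could cover a wide range of moods, here is a minimal Python sketch. It is not the Cambridge team's method, which the article does not describe in detail. The function name `blend_emotions`, the 0-to-1 slider range and the idea of normalizing the sliders into mixing weights are all assumptions made purely for illustration; only the six emotion names come from the article.

```python
from typing import Dict

# The six basic emotions named in the article.
EMOTIONS = ("happiness", "sadness", "fear", "anger", "tenderness", "neutrality")


def blend_emotions(sliders: Dict[str, float]) -> Dict[str, float]:
    """Turn slider positions into mixing weights that sum to 1.

    Hypothetical helper: each slider is assumed to range from 0.0 to 1.0,
    and any emotion without a slider value is treated as 0.0.
    """
    for name, value in sliders.items():
        if name not in EMOTIONS:
            raise ValueError(f"unknown emotion: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"slider for {name} must be between 0 and 1")

    total = sum(sliders.get(name, 0.0) for name in EMOTIONS)
    if total == 0:
        # Assumption: with every slider at zero, fall back to a neutral face.
        return {name: (1.0 if name == "neutrality" else 0.0) for name in EMOTIONS}

    return {name: sliders.get(name, 0.0) / total for name in EMOTIONS}


if __name__ == "__main__":
    # Mostly happy with a touch of anger.
    print(blend_emotions({"happiness": 0.75, "anger": 0.25}))
```

In this sketch, any mix of slider positions yields a set of weights that a rendering system could, in principle, use to combine the recorded basic emotions. How Zoe actually combines them is not specified in the source.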

The team recorded thousands of sentences from actress Zoe Lister

The resulting system was tested on volunteers who were able to recognize the emotion being conveyed with an impressive 77 percent success rate. Bizarrely, this was actually higher than the 73 percent recognition rate when the volunteers were shown the real-life Zoe Lister.

The program itself is impressively data-light, coming in at just ten megabytes in size. This means that we could well see the technology in future mobile devices such as smartphones and tablets. The template underlying the technology also has the potential to let people upload their own faces and voices in just a few seconds, rather than the days it took the team to create Zoe.

“It took us days to create Zoe, because we had to start from scratch and teach the system to understand language and expression,” Professor Roberto Cipolla from the University's Department of Engineering said. “Now that it already understands those things, it shouldn't be too hard to transfer the same blueprint to a different voice and face.”

The team is working to improve the realism of the technology while exploring real-world applications, such as sending friends a digital “face message” that conveys the emotion you're feeling, or helping autistic and deaf children to understand emotions and learn to lip-read.

The impressive (and occasionally terrifying) Zoe can be seen in action below.

Source: University of Cambridge

Face of the future rears its head

10 comments
Marcus Carr
Interesting, but getting information out of a computer is the easy side of the equation. The real gains are in getting it to understand our face and words, not the other way around.
Ong Chia Hooi
remind me of Zordon..
Rafael Kireyev
Wow! Every day more and more of a sci-fi becomes reality.
ringo the Baptist
Very good!
For me the head movement is the next big bump to smooth over.
I find it too heavily mechanically derived from every individual syllable.
Individual syllables should contribute only a secondary element to the modulation of the head movement – the primary head movement modulation needs to come more from the points of emphasis in PHRASES, perhaps peaking on the adjectives, or the verbs when there are no adjectives
The average position that the head returns to must never be locked on dead centre. Mood should also modulate the average position - and the intensity of the modulation.
I appreciate the achievement and realize that these are early days yet.
Anybody else agree with me?
Easy to criticise when there is something there - hats off to you guys - keep going!
Patrick Young
LOL! It's "Hilly" the talking computer from Red Dwarf Series II, "Parallel Universe"
Sergio Pissanetzky
Neuroscientist Paul Frenger worked for many years in the simulation of emotions. He's developed an analog system and published many papers about it. Maybe his system can be combined with this one to simulate the emotions rather than just setting them.
Bob Tackett
It does not "express" emotion... it might fake, imitate, or emulate emotion, but since it does not actually have emotion, it can't express it... Not to get all OCD, but let's not give the computer too much credit until it actually gets to that point... and it might.
Mirmillion
Interesting but still bit too shape-shifting around the upper head for my tastes. The blinking eyes should also be just a touch slower and less alien-like. Otherwise very good. A bit scary but good.
machinephilosophy
Need some math and some empirical documentation of real humans speaking, to get rid of both the bobbing up and down of the back of the head and the general jerkiness. A few general approximation-transition algorithms that are readily available should resolve these problems. Hopefully these people aren't doing all this from scratch, since there's a lot of public domain knowledge of how to resolve both graphical and pronunciation jerkiness issues. In text to speech, check out the latest (and amazing) Ivona voices.
Tom Swift
Finally we can get rid of those overpaid talking heads on the TV news.