New software translates users' speech, using their own voice


March 13, 2012

New software developed by Microsoft is able to reproduce the user's speech in another language, using their own voice (Image via Shutterstock)

For some time now, speech-recognition programs have existed that attempt to reproduce the user's spoken words in another language. Such "speech-to-speech" apps, however, provide their translations using a very flat, synthetic voice. Now, experimental new software developed by Microsoft not only translates between 26 different languages, but also plays the translated speech back in the user's own voice - complete with the inflections they used when speaking in their own language. It looks like a real-life version of Star Trek's universal translator could soon be here.

The system was demonstrated this Tuesday at Microsoft's Redmond, Washington, campus, by its inventor, Microsoft research scientist Frank Soong. He started by using the software to read out Spanish text using the voice of his boss, Rick Rashid, and then proceeded to use it to allow the company's chief research and strategy officer, Craig Mundie, to converse in Mandarin.

For now, the program isn't ready to use as soon as it's installed. Users must first spend about an hour with it, training it to recognize and reproduce their voice. Once that's been accomplished, the software applies that user-specific speech model to a generic text-to-speech model for the desired output language. Individual sounds of the user's voice are selected from the training session, then strung together and appropriately altered, in order to create a natural-sounding translation.
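
For readers curious what "selected, strung together and altered" might look like in practice, here is a toy Python sketch. It is not Microsoft's implementation, which the article does not describe in detail. The names (voice_bank, select_units, alter, synthesize) are hypothetical, the "sounds" are just short lists of numbers standing in for audio, and the "alteration" is a simple volume change standing in for whatever adjustments the real system makes.

```python
from typing import Dict, List, Optional

# A "unit" is a short run of audio samples. In this toy example it is just
# a list of floats; a real system would store far richer data.
Unit = List[float]


def select_units(sounds: List[str], voice_bank: Dict[str, Unit]) -> List[Unit]:
    """Pick the user's recorded unit for each sound the output needs."""
    missing = [s for s in sounds if s not in voice_bank]
    if missing:
        raise KeyError(f"no recorded unit for: {missing}")
    return [voice_bank[s] for s in sounds]


def alter(unit: Unit, gain: float) -> Unit:
    """Stand-in for the 'appropriately altered' step: scale the volume."""
    return [sample * gain for sample in unit]


def synthesize(
    sounds: List[str],
    voice_bank: Dict[str, Unit],
    gains: Optional[List[float]] = None,
) -> Unit:
    """Select units, alter each one, and string them together in order."""
    units = select_units(sounds, voice_bank)
    if gains is None:
        gains = [1.0] * len(units)
    if len(gains) != len(units):
        raise ValueError("need exactly one gain per sound")
    output: Unit = []
    for unit, gain in zip(units, gains):
        output.extend(alter(unit, gain))
    return output


# Hypothetical result of the hour-long training session: one unit per sound.
voice_bank = {"n": [0.5, 0.25], "i": [0.75], "h": [0.125], "ao": [0.5, 0.5]}

# String together the sounds for a two-syllable phrase, boosting the first.
samples = synthesize(["n", "i", "h", "ao"], voice_bank, gains=[2.0, 1.0, 1.0, 1.0])
```

The point of the sketch is only the shape of the pipeline: the output language decides which sounds are needed and in what order, while the sounds themselves come from the user's own recordings.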

It's been suggested that such a system would make users more confident that their speech was being translated accurately, and that fewer misunderstandings would occur due to a lack of context - in other words, it would be more obvious if the speaker was being sarcastic, or exaggerating. It could also help facilitate the learning of foreign languages, as students may find it easier to imitate phrases spoken in their own voice.

Examples of a phrase spoken in different languages via the system can be heard in the link below.

Via: Technology Review

About the Author
Ben Coxworth: An experienced freelance writer, videographer and television producer, Ben's interest in all forms of innovation is particularly fanatical when it comes to human-powered transportation, film-making gear, environmentally-friendly technologies and anything that's designed to go underwater. He lives in Edmonton, Alberta, where he spends a lot of time going over the handlebars of his mountain bike, hanging out in off-leash parks, and wishing the Pacific Ocean wasn't so far away.

Damnit, it still is primitive.. :/

Renārs Grebežs

FINALLY! I have been saying this for many years-WOW someone finally is getting it!



I don't really sound like that do I!? I hope my voice comes out better than it usually sounds on tape.


color me utterly disappointed by the dismal demo

Chi Sup

Good timing! Now that many countries are offering several levels of multimedia education online, they can do so in many languages at once. No reason why anyone in the world with a little motivation can't learn the skills that are needed for tomorrow's challenges.


Haven't programs like this, such as Dragon Reader/NaturallySpeaking, already been around for a decade or so?

Steven Hawes