Asia Online – the world’s most significant literacy project (and internet investment opportunity)
By Mike Hanlon
01:17 September 22, 2008 PDT

Asia Online’s translation technology is based on more than a decade of research into statistical machine translation. It learns from samples of bilingual text: it analyzes the text and builds statistical patterns, which it then uses to translate new documents. The more text samples it is fed, the better the translation output. These slides (here, here and here) from Asia Online’s investor presentation tell the story graphically.
Using a pool of proofreaders sourced online, the technology employs a “human-machine feedback loop” that lets the system learn and become increasingly refined as more proofreader corrections are fed through it.
Achieving translation quality similar to that of human translators requires approximately 10 million sentence pairs (about 3,000 books). The more input data the system is given, the more accurate its translations. The company is confident that it already has the world’s most sophisticated and powerful translation technology, which is available now through a “software as a service” model to language service providers and global organizations with mass translation needs.
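For readers who want a feel for what "learning from bilingual text" can mean, here is a deliberately crude sketch in Python. It is not Asia Online's method: production statistical systems use alignment models and phrase tables, whereas this toy simply counts which words appear together across sentence pairs and picks the strongest association (a Dice score). The function name `learn_word_table` and the English-Spanish sample sentences are assumptions made up for the illustration.

```python
from collections import Counter


def learn_word_table(sentence_pairs):
    """Map each source word to the target word it co-occurs with most strongly.

    Toy stand-in for statistical learning: no alignment model, no phrases.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Count every source/target word pair seen in the same sentence pair.
        pair_count.update((s, t) for s in src_words for t in tgt_words)

    def dice(s, t):
        # Dice coefficient: high when the two words nearly always appear together.
        return 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])

    return {s: max(tgt_count, key=lambda t: dice(s, t)) for s in src_count}


# Hypothetical sample data, invented for this sketch.
pairs = [
    ("the house", "la casa"),
    ("the book", "el libro"),
    ("a house", "una casa"),
    ("a book", "un libro"),
    ("the green house", "la casa verde"),
]
table = learn_word_table(pairs)
print(table["house"], table["book"], table["green"])
```

Even with five sentence pairs the associations for "house", "book" and "green" come out sensibly, and adding more pairs sharpens the counts, which is the intuition behind the claim that more text yields better output.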
“We have overcome the technical challenges and we are now populating the language pairings with data to build the quality,” explains Wiggins.
“The way our system learns is like a child. When you were three years old, your parents didn’t say, ‘this is a noun, this is a verb, this is a subject’. You listened and patched pieces of phrases together and you learned patterns and so on. Our software learns the same way.
“The system takes a book or another document type in English and it takes the same material in Thai, and it matches sentences and it breaks those sentences apart and it takes patterns from them. The maximum pattern length is eight words, though it may be as small as one word, and it takes all those patterns and it learns, quite literally, in the same way that a child learns – ‘When I see this, I see that.’ The more information it processes, the smarter it gets.
“There’s a scoring system called BLEU – it compares against a human translator and rates against the accuracy of the translation. The typical machine translation stuff you see on the net today scores a 10 to a 20. Our baseline systems when we start out with a new language pairing begin at a 20. We have 203 language pairings scoring at least that as a baseline. It’s usable now, but in certain areas it is very good.”
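Two ideas from the quote above can be sketched in a few lines of Python: collecting word patterns of one to eight words, and scoring a machine translation against a human one. The scoring function is a simplified, single-reference, unsmoothed version of BLEU (clipped n-gram precision for 1 to 4 words plus a brevity penalty) reported on a 0 to 100 scale. It is an illustration only, not Asia Online's code or an official BLEU implementation, and the names `ngrams`, `patterns` and `simple_bleu` are assumptions of this sketch.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count contiguous word sequences of exactly n words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def patterns(tokens, max_len=8):
    """Count every contiguous word pattern of 1..max_len words."""
    counts = Counter()
    for n in range(1, max_len + 1):
        counts.update(ngrams(tokens, n))
    return counts


def simple_bleu(candidate, reference, max_n=4):
    """Simplified BLEU: one reference, no smoothing, scaled to 0-100."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)


machine = "the cat sat on the mat"
human = "the cat sat on a mat"
print(len(patterns(machine.split())))
print(round(simple_bleu(machine, human), 1))
```

Real evaluations score a whole test set at once and often use several reference translations, so single-sentence numbers from this sketch are not comparable with the figures Wiggins quotes.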
“Then you add domain data. If you say to the machine ‘this is IT specific’, and throw a lot of IT translations at it, with that area’s unique acronyms and languaging, it can then translate IT very well. It will translate sport or football terribly, but if it knows a text is about information technology, it will translate it very well indeed. If you put that same topic-specific system onto an article about general news, it dies tragically. Put it onto a text about anything outside that domain and it doesn’t work as well. Our goal is to get every possible domain we can covered.”