Asia Online – the world’s most significant literacy project (and internet investment opportunity)
By Mike Hanlon
01:17 September 22, 2008 PDT

The opportunity in one image
“The more domains we cover, the better it gets. We have some domains now which are being translated extraordinarily well. French to English for legal documents is currently running at a BLEU score of 62, while a human scores around 65 on the BLEU scale, so we’re getting pretty good in that area.
“We now need to work up to nailing as many specific domains as we possibly can.”
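BLEU, the metric behind the 62 and 65 figures above, rates a machine translation by counting how many of its word n-grams also appear in a human reference translation, and penalises output that is too short. The following is a minimal, unsmoothed corpus-level BLEU in Python. It assumes whitespace-tokenised text, one reference per sentence, n-grams up to length 4 with uniform weights, and a 0 to 100 scale. It is an illustrative sketch, not the scoring setup Asia Online used, which the article does not describe. Production toolkits also apply standard tokenisation, support multiple references and offer smoothing.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis.

    hypotheses, references: lists of token lists, aligned by index.
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # n-grams proposed by the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference,
            # so repeating a correct word does not inflate the score.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision makes BLEU zero
    # Geometric mean of the modified precisions with uniform weights 1/max_n.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the references is scaled down.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(corpus_bleu([ref], [ref]))                            # 100.0
    print(corpus_bleu(["the cat sat on the".split()], [ref]))  # ~81.87, brevity penalty only
```

Because this version has no smoothing, a sentence that shares no 4-gram with its reference scores 0 however close it is otherwise, which is one reason BLEU is normally reported over a whole test set rather than sentence by sentence.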
“What you would have dealt with previously is rules-based machine translation. Rules-based translation involves taking a bunch of very smart linguists and very smart programmers and locking them in a room for five years.
“The problem with rules-based translation is that the output it produces is flat. If I feed it Harvard Business Review, I want the translation to read like Harvard Business Review. If I feed it an Enid Blyton children’s book, I want it to read accordingly, and I definitely don’t want it to read like Harvard Business Review, or vice versa.
“With rules-based translation you can’t handle that, but with the statistical translation we use, we can, and we can stylise the output to read like a particular genre of literature. It requires data, and we’re in the early stages of gathering a lot of this data right now. Every week we get much, much better.”
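Wiggins does not say how the statistical engine adapts its output to a genre. One common approach in statistical machine translation, assumed here purely for illustration, is to score each candidate translation with a language model trained on text from the target genre and add that score to the translation model’s own score. The Python sketch below uses a hypothetical add-one-smoothed bigram model, `BigramLM`, and a hypothetical `pick_translation` helper; the training sentences, candidate translations and their translation scores are invented inputs, not Asia Online data.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed bigram language model trained on one genre's text."""

    def __init__(self, sentences):
        self.contexts = Counter()  # how often each token precedes another
        self.bigrams = Counter()
        for s in sentences:
            tokens = ["<s>"] + s.split() + ["</s>"]
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def logprob(self, sentence):
        """Log-probability of a whitespace-tokenised sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + v))
            for a, b in zip(tokens, tokens[1:])
        )


def pick_translation(candidates, lm, lm_weight=1.0):
    """Return the candidate text maximising translation score + weighted LM score.

    candidates: list of (text, translation_logprob) pairs (hypothetical input).
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * lm.logprob(c[0]))[0]


if __name__ == "__main__":
    business = BigramLM(["the quarterly results exceeded expectations",
                         "profits exceeded expectations"])
    children = BigramLM(["the cakes were great", "the games were great"])
    # Two hypothetical candidates with equal translation-model scores,
    # so the genre language model alone decides between them.
    candidates = [("the results exceeded expectations", -1.0),
                  ("the results were great", -1.0)]
    print(pick_translation(candidates, business))  # the results exceeded expectations
    print(pick_translation(candidates, children))  # the results were great
```

Raising `lm_weight` pushes the output further towards the genre’s phrasing at the expense of the translation model’s preferences; statistical systems typically tune such weights on held-out data.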
“We already have all 23 European languages operational to a certain degree, so we can translate directly from Danish to Dutch or Greek. We’re doing the same for all the Asian languages. Currently, if you want to translate a Thai document into Vietnamese, you have to translate it into English first and then have the English translated into Vietnamese. Very shortly we’ll be able to do it directly.
“Many of the documents we are currently using to make our translation engines stronger are legacy documents from translation service providers, so we might have a book in English and the same book in Japanese. We then get the same document in Thai and feed that into the system. We then marry those translations, using the English version as the key, and we then have a Japanese-to-Thai language pairing and can work towards building its accuracy.
“Where this gets interesting is with minority languages such as Khmer, which have very little data available”, Wiggins says.
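Marrying an English-Japanese and an English-Thai translation of the same book, with the English as the key, amounts to a join on the English side. A minimal Python sketch follows. It assumes both corpora are already aligned sentence by sentence and that the English sentences match exactly after light whitespace and case normalisation; real legacy documents would first need sentence alignment, and the two English versions may not be identical. `pivot_pairs` is a hypothetical helper, not Asia Online’s code.

```python
from collections import defaultdict


def pivot_pairs(en_ja, en_th):
    """Build Japanese-Thai sentence pairs by joining two corpora on English.

    en_ja, en_th: iterables of (english, other_language) sentence pairs.
    Returns a list of (japanese, thai) pairs whose English sides match.
    """
    def key(sentence):
        # Light normalisation so trivial case/whitespace differences still match.
        return " ".join(sentence.lower().split())

    # Index the Thai side by its normalised English sentence.
    thai_by_en = defaultdict(list)
    for en, th in en_th:
        thai_by_en[key(en)].append(th)

    # Emit one pair for every Japanese/Thai combination sharing an English key.
    pairs = []
    for en, ja in en_ja:
        for th in thai_by_en.get(key(en), []):
            pairs.append((ja, th))
    return pairs
```

Given `en_ja = [("The cat sleeps.", "猫が寝ている。")]` and `en_th = [("the cat  sleeps.", "แมวกำลังนอน")]`, the function returns the single pair `("猫が寝ている。", "แมวกำลังนอน")`. Each derived pair inherits any errors from both of its source translations, so the resulting corpus is a starting point for training rather than a finished resource.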