Asia Online – the world’s most significant literacy project (and internet investment opportunity)
By Mike Hanlon
01:17 September 22, 2008

The opportunity in one image
We’ve all dreamt of going back in history, knowing what we know now. Imagine it was the start of the internet all over again – to be able to make all the right moves because you knew how the web would be monetized, the importance of search and how to leverage it, which business models would work, and which ones wouldn’t. Asia Online appears to have manufactured itself that exact scenario in Asia with its new self-learning statistical machine translation language technologies, which it is focusing on Asian languages. As the knowledge-deprived populations of Southern Asian countries adopt the internet, Asia Online looks set to play a huge role by providing information in the language of choice for the dozens of previously information-disenfranchised population groups – groups which will make up roughly half of the internet user base within four years. Viewed from another angle, Asia Online’s work is about information empowerment. “Our goal here is to eliminate information poverty”, says CEO Dion Wiggins.
The technology and how it can be used
From a big picture point of view, Asia Online’s self-learning Statistical Machine Translation (SMT) could have some remarkable outcomes for the internet and mobile telephone worlds: instantaneous translation of web pages, blogs, instant messaging and virtually any form of written communication, including signposts in a foreign land, will become possible in the short to medium term future.
When we proposed that scenario (automated remote sign translation), Asia Online’s CEO was quick to point out they already had that running in the development area. As for automated, all-language blogs, he suggested the end result might look like Google’s existing Chatbot on Google Talk, “except with high quality translation.”
Asia Online’s tag line is “The World Speaks One language – YOURS” and that’s clearly where things are headed for the company. Everyone will be able to read all information in their own language, eliminating one of the greatest barriers to education and the common man becoming ‘well informed’. In the foreseeable future, a blog post will be equally accessible to a Tagalog speaker in the Philippines or a Khmer speaker in Cambodia as it is presently to the world’s English speakers.
Beyond that, says Asia Online’s CEO Dion Wiggins, “I don’t want people to see AO as a technology company – we’re a content company that has an amazing technology.”
The background to Asia Online
Asia Online was formed in late 2006 by Dion Wiggins, Greg Binger and Bob Hayward and has achieved a number of world-first technology breakthroughs since then, some of which are in the process of being patented.
Asia Online’s Chief Scientist is Philipp Koehn (PhD), a global thought leader on the subject of statistical machine translation. A former MIT researcher, Koehn is currently teaching at the University of Edinburgh, and has a book titled “Statistical Machine Translation” scheduled for publication later this year.
The killer app – providing content to 60% of the world’s population
By far the most significant of Asia Online’s technologies is its statistical machine translation platform which learns as it goes, continually providing better translations into the bargain. It can currently translate between 203 language pairings and has 500+ language pairings under development, with a view to having them available by the end of 2009.
Why is this significant?
This illustration graphically represents where the world’s internet population currently resides – it is US and Euro-centric and predominantly English-based.
More than 85% of the world’s written knowledge is in non-Asian languages, with the vast majority in English. Of the 13.8% of the world’s knowledge written in Asian languages, almost all of it is in Chinese, Japanese or Korean – less than 0.03% of this knowledge is in other Asian languages.
Quite understandably, this represents a serious barrier to knowledge acquisition in Asian countries such as Thailand, Indonesia and India, which are some of the world’s most populous nations. There are just 10 million local-language web pages in Southeast Asia in total, with Indonesia having just one million local-language pages, serving a country of 235 million people. It’s worth noting here that Indonesia’s internet user base of 23 million is larger than the entire population of Australia.
Due to its large number of developing nations, multiplicity of languages and the fact it has been largely ignored by seasoned content providers until now, Asia has a dearth of content in Asian languages. This content vacuum represents a significant opportunity. Beginning with Thai, Asia Online’s mission is to become the dominant Internet Portal in Asia.
Currently, Asia has half the internet penetration of the rest of the world – 15.3% of the population of Asia is connected, compared with 30.5% for the rest of the world. A technologically mature nation such as the United States is at almost 75% penetration, and approaching saturation point, with little room for growth.
Asia also has 60% of the world’s population, and Emarketer forecasts internet penetration to this sizeable chunk of humanity will have a compound annual growth rate of 11.4% through 2012. By 2012, Emarketer predicts Asia will have 818 million internet users – almost double the number of European users and more than three times the USA and Canada combined.
The reward for Asia Online’s efforts will undoubtedly come in the form of a significant share of the advertising revenues expected in the region when the population goes online, much as it has done in the mature European and American marketplaces over the last decade.
“Advertising in South East Asia is young and the market is growing rapidly”, says Wiggins. “There has not been a lot of advertising in this market because there has not been a lot of quality content.”
“The market is currently forecast to grow to US$11.5 billion by 2012 with most of that money going to the Chinese, Japanese and Korean markets. We intend to change that. When we provide high quality content, we believe a good chunk of that money plus significant new advertising funds can go to South East Asia.”
First Thailand, then Asia
Using its unique machine translation technologies, Asia Online intends to provide the content so sorely needed. Launched earlier this month (September 3, 2008), the Asia Online Thai Portal is the first in a series of portals planned for the entire Asian region. On September 2, there were just 3 million pages in the Thai language on the internet - less than 0.01% of total internet pages. Asia Online has begun to progressively roll out its public portal and in the next few months will double the amount of Thai language content available online.
Most of the pages in local Thai language on the internet are of meagre quality, dominated by low value content, and there is poor support for Thai in existing search engines. Perhaps even more importantly, the Thai population is far more comfortable using their local language on the Internet than English.
The country’s fast growing economy has a rising middle class, prepared to increasingly use sophisticated services online and the lack of a quality translation tool between English and Thai exacerbates matters even more.
Due to the complexities of the Thai language, solving Thai translation was an ideal first task, as it makes translation to other Asian languages very simple. Thai has no punctuation, no spaces between words, no end-of-sentence markers, no upper and lower case, no tense, gender or subject, and that’s just for starters, as it is alphabet-based, not character-based as Chinese is.
Throw in the relatively inexpensive costs of operating in Thailand and the availability (and loyalty) of a pool of talented computational linguists from several Bangkok-based Universities, and Bangkok was an ideal base for the company.
“Thailand has a unique aspect to it that no other country in the world has for languages”, says Wiggins. “To start with, you have Thai, but there are also huge Chinese, Japanese, Korean and Indian industries – the core languages of Asia are all covered here.
“Then you look at Thailand’s neighbours, and you then see that Malaysia is covered here, Arabic in the south, and there’s a sizeable population here from both Indonesia and the Philippines and hence you can draw on people who are fluent in both languages.
“Then you have all the tourism that Thailand attracts, so you have a lot of people in Thailand who are fluent in French and German and Spanish and the Scandinavian languages.”
By sourcing specific skills online, Asia Online has found it has “a huge pool of people who have PhDs and Masters degrees and are ideally qualified for this translation work in specific areas”, says Wiggins.
“There is no other country in the world where such a linguistic talent base is not only readily available, but very affordable.”
The importance of knowledge
Many parts of Asia have been excluded from the digital revolution, since the people are not familiar with English, Japanese, Chinese or other major ‘tier 1’ languages that dominate web content. This failure to understand has severely limited the impact of the ICT revolution in most emerging economies.
As we repeatedly write on Gizmag, education is the only sure fire answer to the world’s problems. Education is the engine of productivity and progress. It underpins economic growth, improved social conditions, and development prospects. Providing richer and more compelling content and free access to information is essential to improving education in Asia. Hence Asia Online is the world’s largest literacy project and is currently in the process of translating tens of millions of educational, political, scientific, engineering and entertainment articles into Asian languages and making the content available through the Asia Online portal.
The goal is to put Asian countries on an equal information access footing to the rest of the world through education. Starting with the Thai-Language Portal, Asia Online is creating millions of articles covering education, politics, science, engineering, entertainment and more.
The company is translating the most important of the 1.6 million sites that have English language Open Document License (ODL) content – this represents more than 100 million pages available for translation, of which tens of millions of pages are initially slated for availability in Thai, and to a further ten markets within two years.
The best known ODL content is Wikipedia, the website with the seventh highest visitor count globally – other forms of free and licensed commercial content will also be included. The aim is to use the content to attract visitors who will generate further content through Web 2.0 features and drive online advertising revenues.
Understandably, access to the likes of Wikipedia, the CIA World Factbook and the curriculums from several leading Western universities will have an enormously positive impact on the economic and social development of the developing economies of Asia. Understanding and knowledge will reduce poverty, unleash the creative forces within the population, increase workforce productivity and create the foundations for a knowledge-based society. The company expects “crowd sourced” local-language populations to help find and correct translation errors, driven by a desire to improve the local-language knowledge pool, perhaps even as a national initiative in some countries.
Wiggins was at pains to point out that although Asia Online has financial goals and is a commercial operation, the management team is driven by the desire to make a positive contribution to the development of emerging nations across Asia.
How Asia Online’s translation technology platform works
Asia Online’s Translation Technology is based upon more than a decade of research into statistical machine translation. It learns by being given samples of bi-lingual text. It then analyzes this text and creates statistical patterns which it uses to translate new documents. The more text samples it gets fed, the better the translation output. These slides (here, here and here) from Asia Online's investor presentation tell the story graphically.
Using an on-line sourced pool of proofreaders, the technology employs a “human-machine feedback loop” which enables the system to learn and become increasingly refined as more proofreader corrections are fed through the system.
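The pattern-learning idea can be illustrated with a toy sketch. The Python below is not Asia Online’s code: it simply counts how often each run of words in one language appears alongside each run of words in the other, across a handful of matched sentences. The function names and the English-French sample sentences are invented for illustration, the eight-word maximum is the figure Wiggins quotes later in this article, and a real SMT system would go much further by aligning words and estimating probabilities rather than keeping raw counts.

```python
from collections import Counter
from itertools import product

MAX_PATTERN_LEN = 8  # maximum pattern length quoted by Wiggins in this article


def ngrams(tokens, max_len=MAX_PATTERN_LEN):
    """Yield every contiguous run of 1..max_len tokens as a tuple."""
    for n in range(1, min(max_len, len(tokens)) + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def count_cooccurrences(sentence_pairs):
    """Count how often each source pattern appears alongside each target pattern."""
    counts = Counter()
    for src, tgt in sentence_pairs:
        # Use sets so a pattern is counted once per sentence pair.
        src_patterns = set(ngrams(src.split()))
        tgt_patterns = set(ngrams(tgt.split()))
        counts.update(product(src_patterns, tgt_patterns))
    return counts


# Hypothetical, pre-tokenized sample data (assumption, not from Asia Online).
pairs = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]
counts = count_cooccurrences(pairs)
```

With this sample, “house” is seen alongside “maison” in two sentence pairs and never alongside “voiture”, which is the “when I see this, I see that” signal in miniature. More matched text sharpens those counts, which is why the amount of training data matters so much.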
Achieving translation quality similar to that of human translators requires approximately 10 million sentence pairs (3000 books). The more input data given, the more accurate the translations. As the system stands, the company is confident it has the world's most sophisticated and powerful translation technology, which is already available through a “software as a service” model to language service providers and global organizations with mass translation needs.
“We have overcome the technical challenges and we are now populating the language pairings with data to build the quality” explains Wiggins.
“The way our system learns is like a child. When you were three years old, your parents didn’t say, ‘this is a noun, this is a verb, this is a subject’. You listened and patched pieces of phrases together and you learned patterns and so on. Our software learns the same way.
“The system takes a book or another document type in English and it takes the same material in Thai, and it matches sentences and it breaks those sentences apart and it takes patterns from them. The maximum pattern length is eight words, though it may be as small as one word, and it takes all those patterns and it learns, quite literally, in the same way that a child learns – ‘When I see this, I see that.’ The more information it processes, the smarter it gets.
“There’s a scoring system called BLEU – it compares against a human translator and rates against the accuracy of the translation. The typical machine translation stuff you see on the net today scores a 10 to a 20. Our baseline systems when we start out with a new language pairing begin at a 20. We have 203 language pairings scoring at least that as a baseline. It’s usable now, but in certain areas it is very good.”
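For readers curious about how a BLEU-style score is put together, here is a minimal Python sketch. It is a simplified, single-sentence, single-reference version written for this article, not the evaluation setup Asia Online uses: it measures how many one- to four-word runs in the machine output also appear in a human reference translation, takes the geometric mean of those four precisions, and penalises output that is shorter than the reference. The function names are our own, and published BLEU figures are normally computed over whole test sets, often with several reference translations.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (one reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each n-gram's credit to the number of times the reference contains it.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(overlap / total)
    # Brevity penalty: output shorter than the reference is marked down.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(log_precision_sum / max_n)


perfect = bleu("the cat sat on the mat", "the cat sat on the mat")
partial = bleu("the cat sat on the mat today", "the cat sat on the mat")
unrelated = bleu("a b c d", "w x y z")
```

An output identical to the reference scores 100, an unrelated one scores 0, and a near miss lands in between. Because two competent human translators rarely choose identical wording, even human translations score well short of 100 against one another.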
“Then you add domain data. If you say to the machine ‘this is IT specific’, and throw a lot of IT translations at it, with that area’s unique acronyms and languaging, it can then translate IT very well. It will translate sport or football terribly, but if it knows a text is about information technology, it will translate it very well indeed. If you put that same topic-specific system onto an article about general news, it dies tragically. Put it onto a text about anything outside that domain and it doesn’t work as well. Our goal is to get every possible domain we can covered.”
“The more domains we cover, the better it gets. We have some domains now which are being translated extraordinarily well. French to English for legal documents is currently running at a BLEU score of 62 – a human scores around 65 on the BLEU scale, so we’re getting pretty good in that area.
“We now need to work up to nailing as many specific domains as we possibly can.”
“What you would have dealt with previously is rules based machine translation. Rules based translation involves putting a bunch of very smart linguists and very smart programmers and locking them in a room for five years.
“The problem with rules based translation is that it’s flat when the outcome is produced. If I feed it Harvard Business Review, I want the translation to read like Harvard Business Review. If I feed it an Enid Blyton children’s book, I want it to read accordingly, and I definitely don’t want it to read like Harvard Business Review or vice versa.
“With rules translation, you can’t handle that – with the statistical translation we use, we can handle that, and can stylise the output to read like a particular genre of literature. It requires data and we’re in the early stages of gathering a lot of this data right now. Every week we get much much better.”
“We already have all 23 European languages operational to a certain degree. So we can translate directly from Danish to Dutch or Greek. We’re doing the same for all the Asian languages. Currently, if you want to translate a Thai document to Vietnamese, you have to translate it to English first and then get the English translated to Vietnamese. Very shortly we’ll be able to do it directly.
“Many of the documents we are currently using to make our translation engines stronger are legacy documents from translation service providers, so we might have a book in English and the same book in Japanese. We then get that same document in Thai and feed that into the system. We then marry those translations, using the English version as the key and we then have Japanese to Thai language pairing and can work towards building its accuracy.
“Where this gets interesting is with minority languages such as Khmer that has very little data available”, Wiggins says.
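The “marrying” step Wiggins describes, using the English version as the key, amounts to joining two sets of translations on their shared English sentences. The Python sketch below shows the idea under simplifying assumptions of our own: each corpus is a dictionary keyed by the exact English sentence, and the function name and the sample phrases are illustrative rather than taken from Asia Online’s system.

```python
def pivot_pairs(english_to_japanese, english_to_thai):
    """Join two parallel corpora on their shared English sentences."""
    return [
        (english_to_japanese[en], english_to_thai[en])
        for en in english_to_japanese
        if en in english_to_thai
    ]


# Hypothetical sample data: the same English sentences translated twice.
english_to_japanese = {
    "Hello": "こんにちは",
    "Thank you": "ありがとう",
    "Good night": "おやすみなさい",  # no Thai counterpart, so it is dropped
}
english_to_thai = {
    "Hello": "สวัสดี",
    "Thank you": "ขอบคุณ",
}

japanese_thai = pivot_pairs(english_to_japanese, english_to_thai)
```

The result is a small Japanese-to-Thai training set built without anyone translating directly between the two languages. Only sentences present in both collections survive the join, so the size of the new pairing is limited by how much material the two collections share.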
The forecast for the machine translation services market
Whichever way the machine translation market is measured, its prospects are enormous. Asia Online provided us with this slide at the beginning of our discussions, which continued through the period where SMT competitor Language Weaver released its estimates of the size of the untapped translation marketplace.
The size of the human translation market today is estimated at US$14 billion according to research and consulting firm Common Sense Advisory. Because of the dramatically lower costs that automated translation technology enables, Language Weaver estimates that untapped markets for digital translation total more than US$67.5 billion.
Asia Online’s slide shows the current Global Translation Market based on Common Sense Advisory figures with adjustments based on quality improvement. The solid line data in the slide is based on Common Sense Advisory, while the dashed line data is Asia Online projections. The human translation market is currently US$14 billion with US$65 million digital, so getting up to US$67 billion of digital any time soon seems rather optimistic.
Language Weaver and Asia Online have different projections, but both agree that existing forecasts do not take into account new content that simply cannot be translated because of current technology limitations, and manpower, time and cost limitations.
Wiggins estimates that only 5% of what needs to be translated is currently being translated due to these factors. But while there is much that could be translated, there is still a cost associated with translation, and it must be paid for by someone. As costs lower and technology improves new content will quickly start to be translated and expand the market overall.
“Probably the most important point to make about the language translation market is the massive latent demand. This is clearly a market that is begging for a technology to help, not a technology begging for a market, which is so often the case in the internet industry.”
So should human translators feel threatened by the latest development in machine translation? Not at all, says Wiggins: “Our model is to support the language services industry, not to compete with it.”
“A typical human translates 2000-3000 words per day. Using our system as an adjunct, a human translator can increase their productivity to between 8000 and 10,000 words a day.”
“We offer an enabling technology for this industry and one which will enable knowledge to reach people who it otherwise would simply never get to.”
Asia Online and the future
To say we were impressed when we visited Asia Online’s headquarters in Bangkok is an understatement. The technology is there to enable all Asians to join the ICT revolution, and the ability of the translation technology to independently develop further language pairings based on already input data augurs well for providing compelling educational content to the majority of the world’s technologically disenfranchised peoples.
The company has already developed significant intellectual property in translations systems and has established a sales pipeline and partner relationships. It has already completed founder, Angel and Series A funding (from Japan Asia Investment Corp which has previously backed Mixi.com in Japan and Alibaba.com in China), and is currently seeking strategic Series B investors.
Interested parties should contact Dion Wiggins.
The history of machine translation of languages
Like voice recognition, machine translation of languages has been a science which has promised much and delivered comparatively little to this point in time. The first patents for such a “machine” were issued in 1933 to researchers operating independently in France and Russia, but it was not until the immediate post-WWII years that the concept of using the newly invented digital computing machines to translate between languages sowed the first seeds of what we know today.
The history of viable computer-based machine translation can be traced to a memo sent to around 30 colleagues in July 1949 by Warren Weaver. Weaver, a director at the Rockefeller Foundation, put forward specific proposals for tackling the obvious problems of ambiguity (or ‘multiple meanings’), based on his knowledge of cryptography, statistics, information theory, logic and language universals.
The Weaver memorandum is probably the single most influential publication in the early days of machine translation, since it formulated goals and methods before most people had any idea of what computers might be capable of, and since it was the direct stimulus for the beginnings of research first in the United States and then later, indirectly, throughout the world.
For those interested in the history of machine translation, might we suggest that John Hutchins’ personal web site is the most comprehensive and authoritative we have found.
Weaver’s memorandum resulted in the research and development of the work which resulted in the first public demonstration of machine translation.
The demonstration in New York of the Georgetown-IBM system on January 7, 1954, involved a Russian-English machine translation system. It created a great deal of interest and raised public expectations of an automatic system capable of high quality translation in short order, although it was a small-scale experiment of just 250 words and six ‘grammar’ rules.
The final word on the subject of the history of machine translation came from Wiggins in a throwaway line during our discussions. “To put it simply, MT has been about 50 years of MT promises – a fortnight ago we began to change all that.”
Further reading on machine translation
The International Association for Machine Translation (IAMT) is comprised of three bodies, the European Association of Machine Translation (EAMT), the Association for Machine Translation in the Americas (AMTA) and the Asian-Pacific Association for Machine Translation (AAMT).
A comprehensive list of other MT links can be found here and also the resources area of the Asia Online site.