There appears to be a breakthrough in machine translation. Anyone who has used Babelfish knows its problems. A company with a massive dictionary and a smarter approach is building programs that do better. Earlier improvements came from statistically analyzing the output of good translators and learning their patterns, as opposed to encoding the rules of grammar for each language.
In the most promising method to emerge from the work, called statistical-based MT, algorithms analyze large collections of previous translations, or what are technically called parallel corpora (sessions of the European Union, say, or newswire copy), to divine the statistical probabilities of words and phrases in one language ending up as particular words or phrases in another. A model is then built on those probabilities and used to evaluate new text. A slew of researchers took up IBM's insights, and by the turn of the 21st century the quality of statistical MT research systems had drawn even with five decades of rule-based work. But the new company has come up with an even better approach to machine translation. They combined the largest bilingual dictionaries in the world with a new, simple algorithm and huge databases of language as it is actually used. It is a fairly simple way of getting a translation, but it requires large databases and fast computers.
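To make the statistical idea concrete, here is a minimal sketch of how word-level translation probabilities could be estimated from a parallel corpus. It uses the textbook expectation-maximization procedure known as IBM Model 1, which is an assumption on my part: the article does not say which model any particular system uses. The three-sentence German-English corpus and the function names `train_word_probs` and `best_translation` are made up for illustration.

```python
def train_word_probs(pairs, iterations=10):
    """Estimate P(target word | source word) from sentence pairs via EM.

    `pairs` is a list of (source_sentence, target_sentence) strings.
    This is a toy IBM Model 1 without a NULL word, for illustration only.
    """
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {t for _, tgt in corpus for t in tgt}

    # Start with a uniform guess for every word pair that co-occurs.
    prob = {(t, s): 1.0 / len(tgt_vocab)
            for src, tgt in corpus for t in tgt for s in src}

    for _ in range(iterations):
        count = {key: 0.0 for key in prob}
        total = {}
        # E-step: share each target word among the source words
        # of its sentence, in proportion to the current probabilities.
        for src, tgt in corpus:
            for t in tgt:
                norm = sum(prob[(t, s)] for s in src)
                for s in src:
                    frac = prob[(t, s)] / norm
                    count[(t, s)] += frac
                    total[s] = total.get(s, 0.0) + frac
        # M-step: renormalize the expected counts into probabilities.
        prob = {(t, s): c / total[s] for (t, s), c in count.items()}
    return prob


def best_translation(prob, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {t: p for (t, s), p in prob.items() if s == source_word}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus; real systems train on millions of sentence pairs.
toy_corpus = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]
model = train_word_probs(toy_corpus)
for word in ("das", "haus", "buch", "ein"):
    print(word, "->", best_translation(model, word))
```

Nothing in the code knows any German or English grammar. The pairing of "haus" with "house" falls out purely from which words keep showing up together, which is why the approach depends so heavily on having large amounts of parallel text.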
Since then, researchers have tweaked their algorithms and the Web has spawned an explosion of available parallel text, turning the competition into a rout. The lopsidedness is best seen in the results from the annual MT evaluation put on by the National Institute of Standards and Technology (NIST), which uses a measurement called the BiLingual Evaluation Understudy (BLEU) scale to assess a system's performance in Chinese and Arabic against human translation. A high-quality human translator will likely score between 0.7 and 0.85 out of a possible 1 on the BLEU scale. In 2005, Google's stat-based system topped the NIST evaluation in both Arabic (at 0.51) and Chinese (at 0.35). Systran, the most prominent rule-based system still in operation, languished at 0.11 for Arabic and 0.15 for Chinese.
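For a sense of what a BLEU number measures, here is a simplified sketch of the score for a single sentence against a single reference translation. It follows the standard definition (clipped n-gram precision for n = 1 to 4, combined by geometric mean and multiplied by a brevity penalty), but it is not the NIST evaluation tooling: real evaluations aggregate over a whole test set and typically use several reference translations, so the figures above are not directly comparable to what this toy produces. The function names and example sentences are my own.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU with one reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clip each n-gram's count so repeating a word cannot inflate the score.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if matched == 0:
            return 0.0  # without smoothing, any zero precision zeroes the score
        log_precision_sum += math.log(matched / sum(cand_counts.values()))

    # Brevity penalty: punish candidates shorter than the reference.
    if len(cand) > len(ref):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref) / len(cand))
    return penalty * math.exp(log_precision_sum / max_n)


reference = "the cat sat on a mat"
print(bleu("the cat sat on a mat", reference))    # identical to the reference
print(bleu("the cat sat on the mat", reference))  # one word differs
print(bleu("dogs bark loudly at night", reference))  # nothing in common
```

Because the score only rewards word sequences that also appear in the reference, a perfectly good translation that words things differently loses points. That is why even skilled human translators land well below 1.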