Can we have confidence in neural machine translation?


What is neural machine translation? Well, according to Wikipedia, “Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model”. The stand-out phrase here is “predict the likelihood”. As a young translator, I was always under the impression that when I translated sentence A into sentence B, I had to be certain that sentence B conveyed the meaning expressed in sentence A. If I had ever told my project manager that my translation was likely to convey the meaning of the original, I would probably soon have found myself looking for a new job!
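To make “predict the likelihood” concrete: an NMT model scores a candidate translation one token at a time, and the probability of the whole sentence is the product of the per-token probabilities, P(y | x) = P(y1 | x) × P(y2 | y1, x) × … The Python sketch below uses invented probabilities for two hypothetical candidate translations and sums logarithms to avoid numerical underflow. A real system computes these numbers with a neural network and searches over many more candidates (for example with beam search). The point is that the output is the most probable candidate, not one the system is certain about.

```python
import math

def sequence_log_prob(token_probs):
    """log P(y | x) = sum over t of log P(y_t | y_<t, x)."""
    return sum(math.log(p) for p in token_probs)

# Invented per-token probabilities for two hypothetical candidate translations
# of the same source sentence (a real model would compute these).
candidates = {
    "my child is sick": [0.6, 0.7, 0.8, 0.5],
    "my kid is ill": [0.6, 0.2, 0.8, 0.3],
}

scores = {text: sequence_log_prob(probs) for text, probs in candidates.items()}
best = max(scores, key=scores.get)  # highest likelihood wins, however low it is

for text, score in scores.items():
    print(f"{text!r}: P = {math.exp(score):.4f}")
print("chosen:", best)
```

This prints P = 0.1680 for the first candidate and P = 0.0288 for the second, so the first is chosen, even though the model assigns it well under a 20% chance of being the “right” sentence.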

In the early days of machine translation, translations were produced by applying a very large number of rules. The more complex the languages involved, the more granular the rules governing the translation process had to be. In theory, if all the necessary rules were applied and all the words in the source text were contained in a bilingual custom dictionary and a bilingual general dictionary, you could be reasonably confident that the rule-based MT system would produce an appropriate, if somewhat wooden, translation.

Neural machine translation does not deploy a huge number of hand-crafted rules. Instead, the rules enabling the model to translate from A to B, or to predict an output sequence of tokens from an input sequence, are learned by the model from the data itself. Having sufficient “clean data” in the domains where the neural MT system will be used is half the battle when it comes to having confidence in an accurate translation. Of course, the model will be unable to generalise to sequences of tokens unlike anything it saw during training. The inability of a model trained solely on the Bible and other religious texts to translate simple sentences like “My child is sick, I need to see a doctor” underscores that we cannot do without data from the domains, or fields of human experience, in which we wish to apply the model. For developers working with “low-resource” languages, a lack of real-world data poses a challenge that is being met with a variety of innovative approaches.
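One rough way to see the Bible-only problem in numbers is to check how much of a new sentence's vocabulary never occurs in the training data. The sketch below is a hypothetical diagnostic, not a method described in this post: it uses a simple word tokenizer (real NMT systems work with subword units) and a three-line toy corpus standing in for religious texts.

```python
import re

def tokenize(text):
    # Lower-cased word tokens; production NMT systems use subword units instead.
    return re.findall(r"[a-z']+", text.lower())

def unseen_rate(sentence, vocab):
    """Return the share of tokens in `sentence` absent from `vocab`, and those tokens."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0, []
    unseen = [t for t in tokens if t not in vocab]
    return len(unseen) / len(tokens), unseen

# Toy training corpus standing in for religious texts (illustrative only).
training = [
    "In the beginning God created the heaven and the earth.",
    "And the child grew, and waxed strong in spirit.",
    "I need thee every hour.",
]
vocab = {tok for line in training for tok in tokenize(line)}

rate, unseen = unseen_rate("My child is sick, I need to see a doctor", vocab)
print(f"{rate:.0%} unseen: {unseen}")
```

Here 7 of the 10 words in the medical sentence (my, is, sick, to, see, a, doctor) never appear in the toy corpus, so the model would have nothing to generalise from.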

Over the years since rule-based MT appeared in the early 1950s, various metrics have been developed to measure the accuracy of machine translation systems. The best known is the Bilingual Evaluation Understudy (BLEU) algorithm, which is probably the main starting point for developers seeking to establish just how good their systems are. When I was a schoolboy, our knowledge of Latin and Greek was put to the test by having us translate “unseen” passages from the works of classical authors. Nobody could memorize the translations of every classical author, so the test challenged us to generalise from our experience of the authors on our syllabus and make a fair fist of rendering the passage into English. We were expected to exercise creativity to deal with the odd unknown word. In the same way, a well-trained neural machine translation model that has not “over-fitted”, or simply memorized the training data, will translate an unseen test set with varying success, and how well it does can be gauged against whatever is commonly accepted as a good BLEU score. Byte-pair encoding and other sub-word techniques reduce the number of unknown words.

Automatic evaluation metrics such as BLEU, NIST, METEOR, WER, PER, GTM, TER and CDER help researchers and developers determine how successfully their model has been trained. Taking an NMT model into production in the domains for which it has been trained is therefore definitely not a leap in the dark. Then again, professional translators sometimes make mistakes, and translation software makes different kinds of mistakes. A critical eye is always needed, however the translation is produced. To go back to our original question, “Can we have confidence in neural machine translation?”, the answer is that, with reservations, we probably can.
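BLEU compares the n-grams (runs of one to four words) in a system's output with those in a human reference translation, clipping repeated matches, taking the geometric mean of the four precisions and multiplying by a brevity penalty for output shorter than the reference. The following is a minimal, unsmoothed single-sentence sketch of that calculation, assuming whitespace tokenization and a single reference; the two example sentences are hypothetical. Published scores are normally computed over a whole test set with a standard tool such as sacreBLEU and are usually reported on a 0 to 100 scale.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed BLEU for one candidate against one reference (whitespace tokens)."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU zero
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: 1 if the candidate is longer, exp(1 - r/c) otherwise.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

reference = "the government is introducing measures to stop the spread of the virus"
candidate = "the government is taking measures to stop the spread of the virus"
score = sentence_bleu(candidate, reference)
print(round(score, 3))  # 0.735
```

Swapping a single word costs a lot here (precisions of 11/12, 9/11, 7/10 and 5/9), which is one reason single-sentence BLEU is noisy and corpus-level scores are preferred.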

Why MyDutchPal?

You may be wondering, “Why MyDutchPal?” What has this business got to do with the Netherlands? Is anyone on our team Dutch?

Well, to find the answer we have to go back to 1992. Hook and Hatton, the company that owns this website, had obtained a contract to translate a huge volume of chemical specifications for the Dutch science and technology company DSM Research. We soon realised that the documents consisted of grammatically simple sentences featuring recurring technical terms. Our founder, Terence Lewis, devised a series of rules to translate these sentences from Dutch into English. This set of rules eventually became “Trasy”, the first Dutch-English machine translation program. Siemens Nederland later acquired the rights to use this program for their Dutch-English translations, and the software was deployed to translate much of the documentation for the HSL-Zuid project, the largest railway infrastructure project in the history of the Netherlands. Today, of course, MyDutchPal uses advanced neural machine translation for its Dutch-English translations, which it offers as a turnkey solution.

Why do you need a language technology audit?

Communication is the key to successful business relationships, especially now that the globalized professional market has transformed the working world and language skills can no longer be taken lightly. Since the Covid pandemic, physical location (city, country, or even continent) is no longer a constraint. Your employees are in contact with foreign colleagues and customers daily, both virtually and in person. The company’s energy should therefore go into delivering its projects successfully rather than into the time and effort needed to ensure mutual understanding and overcome language barriers.

Advanced AI-based language technology is a key tool for achieving these aims. It can take the form of remote interpreting, machine translation (in the cloud or “on premises”), enterprise chat translation or translation on a handheld device. For it to be effective, it is essential to diagnose the language strengths and weaknesses within the organization. A language technology audit allows an organization to identify those areas where language technology can assist employees in their communication with foreign colleagues and customers. Such an audit looks at all of a company’s activities that involve oral or written interactions with foreign customers, partners and colleagues, and highlights those areas where employees could be helped by AI-based language technology. Requirements for language technology vary from one business to the next. A customer support organisation may benefit from a system that directly translates communications in a chat environment. A company that needs to scan huge volumes of data in many languages every day will be looking for a machine translation system that can process millions of words an hour. A scientific research organisation would be well served by a neural machine translation system designed to translate documents in a specific branch of science to a “near human quality” standard. Requirements vary, and there is an exciting range of language technology tools to meet them. Our language technology audit is designed to identify such requirements within your organisation. You can purchase a voucher for a language technology audit in our “Knowledge Shop”.

Building machine translation models for low-resource East African languages

Downloading pretrained Hugging Face translation models, fine-tuning them with new datasets and converting them to OpenNMT’s CTranslate2 inference engine: that seems to be the most cost- and energy-effective way to build new models for low-resource language pairs, where gathering data is a true treasure hunt. I’ve just fine-tuned the Opus-MT Oromo-English pair. Oromo is a Cushitic language spoken by about 30 million people in Ethiopia, Kenya, Somalia and Egypt, and is the third largest language in Africa. Despite the large number of speakers, there are very few bilingual written materials in Oromo and English, and the original Opus-MT Oromo<>English models were trained on the (limited) Opus data. I managed to pull together some three thousand new sentences from human-translated documents and fine-tuned the Opus-MT pair in both directions. The fine-tuned model has been converted into the CTranslate2 format and is now available on my free translation site. The results still leave much to be desired, but the fine-tuned model could be useful at a very basic level.

For Amharic, the official language of Ethiopia and the other language widely spoken there, with some 25 million speakers, I managed to gather around one million sentence pairs from a variety of sources and trained models with the OpenNMT-tf framework. Again, at the level of simple sentences like “The army delivers clean water to all the villages in the region”, the English-Amharic model generates useful, if not perfect, translations, and it does a good job with a health-related sentence like “The government is introducing measures to stop the spread of the virus”. As I found with my Tagalog<>English experiments last year, we seem to need around one million sentence pairs to get usable translations of simple sentences. The “zero-shot” road is one I have yet to travel!
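The post does not say how the roughly three thousand Oromo-English sentences were prepared before fine-tuning. As a hypothetical sketch of a typical preparation step, the Python below normalises whitespace, drops pairs with an empty side or an implausible length ratio (the 3.0 threshold is an assumption), removes duplicates, holds out a small development set, and swaps the columns so the same data can fine-tune both directions. The function names are illustrative and are not part of Opus-MT, Hugging Face or CTranslate2.

```python
import random

def prepare_pairs(pairs, max_len_ratio=3.0, dev_fraction=0.1, seed=1):
    """Clean, deduplicate, shuffle and split (source, target) sentence pairs.

    max_len_ratio, dev_fraction and seed are illustrative assumptions,
    not settings taken from the post.
    """
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        # Collapse runs of whitespace so near-identical lines deduplicate.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # one side is empty
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_len_ratio:
            continue  # very different lengths often signal misalignment
        if (src, tgt) in seen:
            continue  # exact duplicate
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    random.Random(seed).shuffle(cleaned)
    n_dev = max(1, int(len(cleaned) * dev_fraction)) if cleaned else 0
    return cleaned[n_dev:], cleaned[:n_dev]  # (train, dev)

def reverse_direction(pairs):
    """Swap columns so the same data can fine-tune the opposite direction."""
    return [(tgt, src) for src, tgt in pairs]

# Placeholder strings stand in for real Oromo/English sentences.
raw = [
    ("source one", "target one"),
    ("source   one", "target one"),  # duplicate after whitespace cleanup
    ("source two", ""),  # empty target
    ("short", "a very much longer target sentence"),  # implausible length ratio
    ("source three", "target three"),
]
train, dev = prepare_pairs(raw)
train_rev, dev_rev = reverse_direction(train), reverse_direction(dev)
print(len(train), len(dev))  # 1 1
```

With about three thousand real pairs, a 10% hold-out would leave roughly 300 sentences for checking progress (for example with BLEU) during fine-tuning in each direction.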