What is neural machine translation? Well, according to Wikipedia, “Neural machine translation (NMT) is an approach to machine translation that uses an artificial neural network to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model”. The stand-out phrase here is “predict the likelihood”. As a young translator, I was always under the impression that when I translated sentence A into sentence B, I had to be certain that sentence B conveyed the meaning expressed in sentence A. If I had ever told my project manager that my translation was likely to convey the meaning of the original, I would probably soon have found myself looking for a new job!
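To make “predict the likelihood” a little more concrete, here is a minimal sketch in Python. It assumes the model assigns a probability to each word of a candidate translation in turn, and that the likelihood of the whole sentence is the product of those probabilities. The numbers in `token_probs` are invented for illustration and do not come from any real system.

```python
import math

# Hypothetical probabilities a model might assign to each successive word of a
# candidate translation, given the source sentence and the words produced so far.
# These values are made up purely for illustration.
token_probs = [0.9, 0.8, 0.95, 0.7]


def sequence_log_likelihood(probs):
    """Add up the log-probability of each word (chain rule of probability)."""
    return sum(math.log(p) for p in probs)


log_likelihood = sequence_log_likelihood(token_probs)
likelihood = math.exp(log_likelihood)  # same as multiplying the probabilities

print(f"log-likelihood: {log_likelihood:.4f}")
print(f"likelihood:     {likelihood:.4f}")
```

Even when the model is fairly sure of every individual word, the likelihood of the whole sentence is noticeably lower than that of any single word, which is one reason systems usually work with log-probabilities and compare candidates rather than looking for certainty.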

In the early days of machine translation, the translation was produced through the application of a very large number of rules. The greater the complexity of the languages involved, the more granular the rules that governed the translation process. In theory, if all the necessary rules were applied and all the words in the source text were contained in a bilingual custom dictionary and in a bilingual general dictionary, you could be reasonably confident that the rule-based MT system would produce an appropriate, if somewhat wooden, translation.

Neural machine translation does not deploy a huge number of hand-crafted rules. Instead, the rules enabling the model to translate from A to B, or to predict an output from an input sequence of tokens, are learned by the model from the data itself. Having sufficient “clean data” in the domains for which the neural MT system is used is half the battle when it comes to being confident of an accurate translation. Of course, the model cannot be expected to generalise to vocabulary and phrasing unlike anything it has seen during training. The inability of a model trained solely on the Bible and other religious texts to translate simple sentences like “My child is sick, I need to see a doctor” underscores the indispensability of data pertaining to the domains or fields of human experience to which we wish to apply the model. For developers working with “low-resource” languages, a lack of real-world data poses a challenge which is being met with a variety of innovative approaches.

Over the years since the appearance of rule-based MT in the early 1950s, various metrics have been developed to measure the accuracy of machine translation systems, the best-known being the Bilingual Evaluation Understudy (BLEU) algorithm, which is probably the main starting point for developers seeking to establish just how good their systems are. When I was a schoolboy, our knowledge of Latin and Greek was put to the test by having us translate “unseen” passages from the works of classical authors. Nobody could memorize the translations of every classical author, so the test challenged us to generalise from our experience of the works of the authors on our syllabus and make a fair fist of rendering our text into English. We were expected to exercise creativity to deal with the odd unknown word in our text.

A well-trained neural machine translation model, one that has not “over-fitted” or simply memorized the training data, will likewise translate an unseen test set with varying degrees of success, and a BLEU score commonly accepted as good is evidence that it has generalised well. Byte-pair encoding and other sub-word techniques reduce the number of unknown words by splitting rare words into smaller units that the model has already seen. Automatic evaluation metrics such as BLEU, NIST, METEOR, WER, PER, GTM, TER and CDER help researchers and developers to determine how successfully their model has been trained. Taking an NMT model into production in domains for which it has been trained is therefore definitely not a leap in the dark.

Then again, professional translators sometimes make mistakes, and translation software makes different kinds of mistakes. A critical eye is always needed, however the translation is produced. To go back to our original question, “Can we have confidence in neural machine translation?”, the answer is that, with reservations, we probably can.
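For readers curious about what a BLEU score actually measures, the following is a simplified sketch in Python, not a reference implementation. It assumes a single reference translation, whitespace tokenisation and no smoothing, and the function name `simple_bleu` is my own. Real evaluations normally use an established tool and score a whole test set rather than one sentence.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, without smoothing."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by how often it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, one empty n-gram order gives zero
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: candidates shorter than the reference are penalised.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "my child is sick and I need to see a doctor"
candidate = "my child is ill and I need to see a doctor"
print(f"BLEU: {simple_bleu(candidate, reference):.2f}")
```

Note that the candidate here loses marks for choosing “ill” rather than “sick”, although a human reader would accept either. That is a useful reminder that BLEU measures overlap with a reference translation, not meaning, which is why the critical eye mentioned above is still needed.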