Hugging Face was founded in 2016 by Clément Delangue, Julien Chaumond and Thomas Wolf. It began as a company developing AI-based social chatbot applications, but soon moved into the wider field of Natural Language Processing (NLP). The company boasts a community of more than 100,000 members dedicated to advancing and democratizing AI through open source and open science. Members of this community contribute their work to what has become a huge library of NLP models. This library is now the home of the Opus-MT translation models trained by the NLP team at the University of Helsinki under the direction of Jörg Tiedemann. These models were trained primarily on OPUS, a collection of translated texts gathered from the web. The developers of the OPUS project convert and align free online data, add linguistic annotation, and provide the community with a publicly available parallel corpus. OPUS is based on open source products, and the corpus itself is delivered as an open content package. Many of these translation models, particularly those for African languages, are served on our free online translation site, which contributes to Hugging Face’s aim of democratizing AI. Together with the developers behind Facebook’s remarkable NLLB (“No Language Left Behind”) project, we are proud to be part of this movement to extend the benefits of advanced language technology to the hundreds of millions of people in Africa, Asia and the Americas whose languages, many of them ancient and sophisticated, have until now been overlooked by MT developers.
The Opus-MT models downloaded from Hugging Face are what are known as “pre-trained” models. They can translate simple sentences, and their range depends heavily on the coverage of the data on which they were trained. They benefit from “fine-tuning”, which means training them further on specialist datasets. A major obstacle is that such datasets are lacking for many so-called “low-resource” languages, even though these languages are spoken by many millions of people. For many of these languages, religious texts have been the sole resource for developers. At MyDutchPal we devote considerable time to tracking down data in the form of parallel texts which we can use to fine-tune our models. Amharic, Igbo and Luganda are three of the languages for which we have been able to acquire additional data, but many of the African language models available on our site are still only able to translate simple sentences. Despite these limitations we have decided to make them available on our free translation site in the hope that they will be of some use to speakers of these languages and to those wishing to communicate with them. Our online translation site is called “NMT Gateway”, and it is intended to be a gateway to understanding. The aims of MyDutchPal and Hugging Face are not that different.
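For readers who would like to try a pre-trained Opus-MT model themselves, the following is a minimal sketch in Python using the Hugging Face transformers library. It is an illustration only and not a description of how NMT Gateway is built. The helper names opus_mt_model_id and translate are our own inventions for this example, and the "Helsinki-NLP/opus-mt-{source}-{target}" naming pattern is an assumption: many Opus-MT models follow it, but not every language pair is published under that exact form, so the model name should be checked on the Hugging Face site before use.

```python
from typing import List


def opus_mt_model_id(src: str, tgt: str) -> str:
    """Build a model name for a language pair.

    Assumption: the model follows the common
    "Helsinki-NLP/opus-mt-<source>-<target>" pattern. Not every
    pair is published under this exact name.
    """
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translate(sentences: List[str], src: str, tgt: str) -> List[str]:
    """Translate sentences with a pre-trained (not fine-tuned) Opus-MT model."""
    # Imported here so that the helper above can be used even when
    # transformers is not installed. The first call downloads the model.
    from transformers import MarianMTModel, MarianTokenizer

    model_id = opus_mt_model_id(src, tgt)
    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Tokenize the whole batch, padding shorter sentences to equal length.
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling translate(["Good morning."], "en", "ig") would attempt to download an English-to-Igbo model and return its translation. As described above, a pre-trained model of this kind can be expected to cope with simple sentences; more specialised text generally calls for fine-tuning on additional parallel data.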