The Deduplicator program provides a user-friendly way to remove duplicates from a parallel corpus. Duplicates can badly skew training and evaluation results in Neural Machine Translation and other language processing tasks. The program removes each duplicate from the source and target data at the same time, so the two sides of the corpus stay aligned.
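
To illustrate what removing duplicates simultaneously from both sides means, here is a minimal Python sketch. It is not the Deduplicator's actual implementation: the function name `dedupe_parallel` is hypothetical, and it assumes that a duplicate is an exact repeat of a (source, target) line pair after trimming surrounding whitespace. The real program may use a different definition.

```python
from typing import Iterable, List, Tuple


def dedupe_parallel(
    source_lines: Iterable[str], target_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Drop repeated (source, target) pairs, keeping the first occurrence.

    Hypothetical helper for illustration only. Assumes line i of the source
    corresponds to line i of the target.
    """
    seen = set()
    kept_source: List[str] = []
    kept_target: List[str] = []
    # zip stops at the shorter input, so unpaired trailing lines are ignored.
    for src, tgt in zip(source_lines, target_lines):
        pair = (src.strip(), tgt.strip())
        if pair in seen:
            continue  # Skip both sides together so alignment is preserved.
        seen.add(pair)
        kept_source.append(pair[0])
        kept_target.append(pair[1])
    return kept_source, kept_target


if __name__ == "__main__":
    source = ["Hello", "Hello", "Goodbye", "Hello"]
    target = ["Hallo", "Hallo", "Auf Wiedersehen", "Guten Tag"]
    for src, tgt in zip(*dedupe_parallel(source, target)):
        print(f"{src}\t{tgt}")
```

Because the pair is the unit of comparison, a source sentence that appears with two different translations is kept both times; only exact repeats of the whole pair are dropped.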


