The Deduplicator program provides a user-friendly way to remove duplicates from a parallel corpus. Duplicates can badly skew training and evaluation results in Neural Machine Translation and other language processing tasks. The program removes each duplicate from the source and target data at the same time, so the two sides stay aligned.
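To illustrate the general idea of removing duplicates from both sides at once, here is a minimal Python sketch. It is not the Deduplicator's own code: the function name `deduplicate_parallel` is hypothetical, and it assumes that a duplicate means an exact repeat of a (source, target) pair after surrounding whitespace is stripped. The actual program may use different matching criteria.

```python
def deduplicate_parallel(source_lines, target_lines):
    """Drop repeated (source, target) pairs, keeping the first occurrence.

    Assumption: a duplicate is an exact repeat of the whole pair after
    stripping surrounding whitespace. Hypothetical helper for illustration.
    """
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")

    seen = set()
    kept_source, kept_target = [], []
    # Walk both sides in lockstep so a removal always affects both files.
    for src, tgt in zip(source_lines, target_lines):
        key = (src.strip(), tgt.strip())
        if key in seen:
            continue  # pair already kept once; skip it on both sides
        seen.add(key)
        kept_source.append(src)
        kept_target.append(tgt)
    return kept_source, kept_target


source = ["Hello", "Goodbye", "Hello", "Hello"]
target = ["Bonjour", "Au revoir", "Bonjour", "Salut"]
clean_source, clean_target = deduplicate_parallel(source, target)
print(clean_source)
print(clean_target)
```

Because the key is the whole pair, the same source sentence with a different translation ("Hello" / "Salut") is kept. Whether such cases should count as duplicates is a design choice that depends on the corpus.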

