This tool selects sentences from a parallel corpus based on minimum and maximum lengths. Too many very short or very long sentences can badly skew the training of neural machine translation (NMT) models. The tool also removes duplicates from the corpus.
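The filtering described above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation. The function name `filter_parallel`, the default thresholds, the use of whitespace-separated tokens as the length measure, the requirement that both sides of a pair fall within the limits, and exact-match deduplication are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple


def filter_parallel(
    pairs: Iterable[Tuple[str, str]],
    min_len: int = 1,
    max_len: int = 80,
) -> List[Tuple[str, str]]:
    """Keep sentence pairs whose lengths are within bounds, dropping duplicates.

    Hypothetical helper: length is counted in whitespace-separated tokens,
    and a pair is kept only if BOTH sides are within [min_len, max_len].
    """
    seen = set()  # pairs already kept, used for exact-match deduplication
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        n_src, n_tgt = len(src.split()), len(tgt.split())
        # Drop the pair if either side is too short or too long.
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        # Drop exact duplicates, keeping the first occurrence.
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


# Small made-up corpus for demonstration only.
corpus = [
    ("Hello world", "Bonjour le monde"),
    ("Hello world", "Bonjour le monde"),  # duplicate
    ("Hi", "Salut"),                      # too short when min_len=2
    ("a b c d e f", "a b c d e f g"),     # too long when max_len=5
]
filtered = filter_parallel(corpus, min_len=2, max_len=5)
print(filtered)
```

With these assumed settings, only the first pair survives: the second is a duplicate, the third is below the minimum length, and the fourth exceeds the maximum.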