Mined bitexts can contain imperfect translations that yield unreliable training signals for Neural Machine Translation (NMT). Filtering out such pairs is known to improve final model quality. However, it is suboptimal in low-resource conditions, where even mined data can be scarce. Can we do better? How can we improve machine translation by refining its training data?
Bitext Refinement
Publications
BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation
NAACL-Findings 2022
Can Synthetic Translations Improve Bitext Quality?
ACL 2022
Eleftheria Briakou, Marine Carpuat