Bitext Refinement

Mined bitexts can contain imperfect translations that yield unreliable training signals for Neural Machine Translation. While filtering such pairs out is known to improve final model quality, it is suboptimal in low-resource conditions where even mined data can be limited. Can we do better? How can we improve machine translation by refining its data?

Avatar
Eleftheria Briakou
Eleftheria Briakou

I research Multilingual NLP and Machine Translation.

Publications