Semantics

Bitext Refinement

Mined bitexts can contain imperfect translations that yield unreliable training signals for neural machine translation. While filtering out such pairs is known to improve final model quality, it is suboptimal in low-resource conditions, where even mined data can be scarce. Can we do better? How can we improve machine translation by refining its data?
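
For concreteness, the filtering baseline can look like the sketch below: score each mined pair with a multilingual sentence encoder and keep only pairs whose cross-lingual similarity clears a threshold. The LaBSE encoder and the 0.8 cutoff are illustrative assumptions, not the approach developed in this work.

    # Minimal sketch of similarity-based bitext filtering.
    # Assumptions: the sentence-transformers LaBSE encoder and a fixed
    # 0.8 threshold; both are illustrative choices.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("sentence-transformers/LaBSE")

    def filter_bitext(pairs, threshold=0.8):
        """Keep (source, target) pairs whose cross-lingual cosine
        similarity clears the threshold; drop the rest."""
        sources = [s for s, _ in pairs]
        targets = [t for _, t in pairs]
        src_emb = model.encode(sources, convert_to_tensor=True)
        tgt_emb = model.encode(targets, convert_to_tensor=True)
        # Similarity of each source with its own paired target.
        scores = util.cos_sim(src_emb, tgt_emb).diag()
        return [p for p, s in zip(pairs, scores) if s.item() >= threshold]

Note how this view makes the low-resource tension explicit: raising the threshold trades away training data for cleanliness, which motivates refining pairs rather than discarding them.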

Detecting Semantic Divergences at Scale

Quantifying fine-grained cross-lingual semantic divergences at scale requires computational models that do not rely on human-labeled supervision. How can we draw on linguistics and translation studies to compensate for the lack of gold supervision?
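
One way to do without gold labels, sketched below under toy assumptions, is to synthesize divergent pairs by perturbing one side of equivalent bitext and train a detector on the result; the span deletion and duplication here are hypothetical stand-ins for linguistically informed perturbations.

    # Hypothetical sketch: manufacture divergent pairs from equivalent
    # bitext so a divergence detector can train without human labels.
    # The perturbations are toy placeholders, not the ones used in this work.
    import random

    def synthesize_divergence(source, target, rng=random.Random(0)):
        """Perturb the target side so the pair is no longer
        meaning-equivalent; return (source, new_target, label)."""
        tokens = target.split()
        if len(tokens) < 4:
            return source, target, "equivalent"
        i = rng.randrange(len(tokens) - 2)
        j = i + 1 + rng.randrange(2)  # span of one or two tokens
        if rng.random() < 0.5:
            tokens = tokens[:i] + tokens[j:]                # delete the span
        else:
            tokens = tokens[:j] + tokens[i:j] + tokens[j:]  # duplicate it
        return source, " ".join(tokens), "divergent"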

Divergences in Machine Translation

Parallel texts, a source text paired with its human translation, are routinely used to train machine translation systems under the assumption that the two sides are equivalent in meaning. Yet parallel texts can contain semantic divergences. How do these divergences interact with neural machine translation training and evaluation? How can we calibrate our assumptions to model parallel texts better?

Rationalized Semantic Divergences

Detecting fine-grained semantic divergences, i.e., small meaning differences in segments treated as exact translation equivalents, is hard even for humans. How can we prime annotators to notice meaning mismatches at such a fine granularity?