Parallel texts, a source paired with its (human) translation, are routinely used to train machine translation systems under the assumption that the two sides are equivalent in meaning. Yet parallel texts may contain semantic divergences. How do such divergences interact with neural machine translation training and evaluation? And how can we calibrate our assumptions to better model parallel texts?