Multilingual Evaluation

As a community, we have overfitted to the characteristics of English-language data when modeling various tasks. Does the same hold for our evaluation metrics? How can we evaluate natural language generation when moving to multilingual settings?

Eleftheria Briakou

I research Multilingual NLP and Machine Translation.

Publications

GEM (@ACL) 2021