
Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages

Mullov, Carlos 1; Pham, Quan; Waibel, Alexander 1
1 Karlsruher Institut für Technologie (KIT)

Abstract:

Multilingual neural machine translation systems learn to map sentences of different languages into a common representation space. Intuitively, with a growing number of seen languages, the encoder sentence representation grows more flexible and easily adaptable to new languages. In this work, we test this hypothesis by zero-shot translating from unseen languages. To deal with unknown vocabularies from unknown languages, we propose a setup where we decouple learning of vocabulary and syntax, i.e., for each language we learn word representations in a separate step (using cross-lingual word embeddings), and then train to translate while keeping those word representations frozen. We demonstrate that this setup enables zero-shot translation from entirely unseen languages. Zero-shot translating with a model trained on Germanic and Romance languages, we achieve scores of 42.6 BLEU for Portuguese-English and 20.7 BLEU for Russian-English on the TED domain. We explore how this zero-shot translation capability develops with a varying number of languages seen by the encoder. Lastly, we explore the effectiveness of our decoupled learning strat- ...
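
The decoupling described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the vectors, the two-word vocabularies, the linear "translator", and the names `SHARED_SPACE`, `train`, and `translate` are all assumptions made up for illustration. It only shows the general idea that if word vectors of all languages live in one shared space and stay frozen during training, a model trained on one source language can be applied to the word vectors of a language it never saw.

```python
import copy

# Made-up 2-d vectors standing in for cross-lingual word embeddings:
# words with the same meaning lie close together across languages.
SHARED_SPACE = {
    "de": {"hund": [1.0, 0.0], "katze": [0.0, 1.0]},      # seen in training
    "pt": {"cachorro": [0.9, 0.1], "gato": [0.2, 0.8]},   # never seen in training
}
TARGET_WORDS = ["dog", "cat"]


def score(weights, vector):
    """Apply the linear map: one score per target word."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def train(lang, pairs, epochs=50, lr=0.5):
    """Fit the linear map by SGD; the embedding table is only read (frozen)."""
    weights = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(epochs):
        for source_word, target_word in pairs:
            vector = SHARED_SPACE[lang][source_word]  # read-only lookup
            predicted = score(weights, vector)
            for i, word in enumerate(TARGET_WORDS):
                error = predicted[i] - (1.0 if word == target_word else 0.0)
                for j, value in enumerate(vector):
                    weights[i][j] -= lr * error * value  # only the map is updated
    return weights


def translate(weights, lang, word):
    """Pick the target word with the highest score."""
    scores = score(weights, SHARED_SPACE[lang][word])
    return TARGET_WORDS[scores.index(max(scores))]


snapshot = copy.deepcopy(SHARED_SPACE)
weights = train("de", [("hund", "dog"), ("katze", "cat")])
```

Calling `translate(weights, "pt", "gato")` then uses the map trained only on the German pairs; the embedding table compares equal to `snapshot`, since training never writes to it.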


Published version
DOI: 10.5445/IR/1000174872
Published on: 9 October 2024
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings contribution
Publication month/year: 08.2024
Language: English
Identifier: ISBN: 979-889176094-3
KITopen-ID: 1000174872
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Ed.: L. Ku, A. Martins, V. Srikumar
Event: 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024), Bangkok, Thailand, 11–16 August 2024
Publisher: Association for Computational Linguistics (ACL)
Pages: 6693–6709
Series: 1
Indexed in: Scopus, Dimensions