
Kit-Multi: A translation-oriented multilingual embedding corpus

Ha, T.-L.; Niehues, J. 1; Sperber, M. 1; Pham, N. Q. 1; Waibel, A. 1
1 Karlsruher Institut für Technologie (KIT)

Abstract:

Cross-lingual word embeddings are representations of words across languages in a shared continuous vector space. Cross-lingual
word embeddings have been shown to be helpful in the development of cross-lingual natural language processing tools. When more
than two languages are involved, we call them multilingual word embeddings. In this work, we introduce a multilingual word embedding
corpus which is acquired by using neural machine translation. Unlike other cross-lingual embedding corpora, the embeddings can be
learned from significantly smaller portions of data and for multiple languages at once. An intrinsic evaluation on monolingual tasks
shows that our method is fairly competitive with the prevalent methods, while on the cross-lingual document classification task it obtains the
best figures. We are in the process of producing the embeddings for more languages, especially languages which belong to the same
family or are semantically close to each other, such as Japanese-Korean, Chinese-Vietnamese, German-Dutch, or Latin-based languages.
Furthermore, the corpus is being analyzed regarding its usage and usefulness in other cross-lingual tasks.


Publisher's version
DOI: 10.5445/IR/1000090649
Published on 02.06.2025
Scopus citations: 1
Affiliated institution(s) at KIT: Fakultät für Informatik – Institut für Anthropomatik (IFA)
Publication type: Proceedings paper
Publication month/year: 05.2018
Language: English
Identifier: ISBN: 979-1-09-554600-9
KITopen-ID: 1000090649
Published in: 11th International Conference on Language Resources and Evaluation, LREC 2018; Phoenix Seagaia Conference Center, Miyazaki, Japan; 7 May 2018 through 12 May 2018. Ed.: H. Isahara
Event: 11th Language Resources and Evaluation Conference (LREC 2018), Miyazaki, Japan, 07.05.2018 – 12.05.2018
Publisher: European Language Resources Association (ELRA)
Pages: 3904-3907
External relations: Abstract/Full text
Indexed in: Scopus