
Exploring Crosslingual Word Embeddings for Semantic Classification in Text and Dialogue

Vsesviatska, Oleksandra

Abstract:

Current approaches to learning crosslingual word embeddings perform well when they can draw on large amounts of parallel data. Most languages, however, are under-resourced and lack structured lexical resources, which makes it difficult to apply such methods to them and, in turn, to include them in human language technologies. In this thesis we explore whether a crosslingual mapping between two separately trained sets of monolingual word embeddings is strong enough to yield competitive results on semantic classification tasks. Our experiment learns a crosslingual transfer between German and French word vectors by combining an adversarial approach with the Procrustes algorithm. We evaluate the embeddings on topic classification, sentiment analysis, and humour detection tasks, using the German subset of a multilingual data set to train our models and the French subset to test them. The results across German and French show that word vectors mapped into a shared vector space can capture semantic information and transfer it from one language to another. We also show that the crosslingual mapping does not weaken the monolingual connections between words within a language.
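
The abstract names the Procrustes algorithm as one half of the mapping method but gives no implementation details. The following is a minimal sketch of the orthogonal Procrustes step only, assuming a seed dictionary is already available as two row-aligned matrices of paired source and target vectors. The function name `procrustes_mapping`, the toy data, and the dimensions are illustrative assumptions, not taken from the thesis, and the adversarial step that would normally produce the seed dictionary is omitted.

```python
import numpy as np


def procrustes_mapping(src, tgt):
    """Return the orthogonal matrix W that minimises ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim); row i of src is assumed to be
    the source-language vector paired with row i of tgt (hypothetical seed
    dictionary, not the thesis data).
    """
    # Closed-form solution: SVD of the cross-covariance matrix src^T tgt,
    # then W = U V^T.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


# Toy data standing in for German and French vectors: the "target" space
# is an exact rotation of the "source" space, so the mapping is recoverable.
rng = np.random.default_rng(0)
dim = 5
src = rng.normal(size=(20, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
tgt = src @ true_rotation

w = procrustes_mapping(src, tgt)
mapped = src @ w  # source vectors expressed in the target space
```

Because `w` is orthogonal, it preserves dot products and norms among the source vectors, so similarities within the source language are unchanged by the mapping. This is consistent with the abstract's observation about monolingual connections, although the thesis may impose its constraints differently.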


DOI: 10.5445/IR/1000117950
Published on: 31.03.2020
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Thesis
Publication date: 09.07.2019
Language: English
Identifier: KITopen-ID: 1000117950
Publisher: Universität
Extent: 78 pp.
Type of thesis: Final thesis, Master
Examination details: Bielefeld, Univ., Master's thesis, 2019