Exploring Crosslingual Word Embeddings for Semantic Classification in Text and Dialogue

Vsesviatska, Oleksandra

doi:10.5445/IR/1000117950

Abstract:

Current approaches to learning crosslingual word embeddings perform well when they are trained on large amounts of parallel data. However, most languages are under-resourced and lack structured lexical resources, which makes it difficult to apply such methods to them and, consequently, to include them in human language technologies. In this thesis, we explore whether a crosslingual mapping between two sets of separately trained monolingual word embeddings is strong enough to yield competitive results on semantic classification tasks. Our experiment involves learning a crosslingual transfer between German and French word vectors using a combination of an adversarial approach and the Procrustes algorithm. We evaluate the embeddings on topic classification, sentiment analysis, and humour detection tasks. We use the German subset of a multilingual data set to train our models and the French subset to test them. The results across German and French show that word vectors mapped into a shared vector space can capture semantic information and transfer it successfully from one language to the other. We also show that the crosslingual mapping does not weaken the monolingual connections between words within a language.

Affiliated institution(s) at KIT	Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type	Thesis
Publication date	9 July 2019
Language	English
Identifier	KITopen-ID: 1000117950
Publisher	University
Extent	78 pp.
Type of thesis	Final thesis - Master's
Examination data	Bielefeld, Univ., Master's thesis, 2019
Indexed in	OpenAlex

Published on 31 March 2020
