On the Various Semantics of Similarity in Word Embedding Models

Elekes, Ábel; Schäler, Martin; Böhm, Klemens

doi:10.5445/IR/1000065330

Abstract:

Finding similar words with the help of word embedding models has yielded meaningful results in many cases. However, the notion of similarity has remained ambiguous. In this paper, we examine when exactly similarity values in word embedding models are meaningful. To do so, we systematically analyze the statistical distribution of similarity values in two series of experiments. The first one examines how the distribution of similarity values depends on the different embedding-model algorithms and parameters. The second one starts by showing that intuitive similarity thresholds do not exist. We then propose a method that determines which similarity values are actually meaningful for a given embedding model. In more abstract terms, our insights should pave the way for a better understanding of the notion of similarity in embedding models and for more reliable evaluations of such models.
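To make the idea of a "distribution of similarity values" concrete, the following is a minimal sketch and not the method of the report. It assumes cosine similarity as the similarity measure (the abstract does not name one) and uses random vectors as a hypothetical stand-in for a trained embedding model. The helper names `cosine_similarity`, `similarity_distribution`, and `summarize` are introduced here purely for illustration.

```python
import math
import random
import statistics


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_distribution(vectors):
    """Cosine similarity of every unordered pair of distinct vectors."""
    return [
        cosine_similarity(vectors[i], vectors[j])
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    ]


def summarize(values):
    """Mean, population standard deviation, and an index-based 99th percentile."""
    ordered = sorted(values)
    p99 = ordered[int(0.99 * (len(ordered) - 1))]
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "p99": p99,
    }


# Assumption: 200 random 50-dimensional vectors replace real word vectors.
rng = random.Random(0)
toy_vectors = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(200)]

sims = similarity_distribution(toy_vectors)
summary = summarize(sims)
print(summary)
```

With a real model, `toy_vectors` would be replaced by the learned word vectors. A summary like this one shows why a fixed cut-off is hard to justify: the same similarity value can be ordinary under one distribution and extreme under another.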

Affiliated institution(s) at KIT	Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type	Research report/Preprint
Publication year	2017
Language	English
Identifier	ISSN: 2190-4782 urn:nbn:de:swb:90-653309 KITopen-ID: 1000065330
Publisher	Karlsruher Institut für Technologie (KIT)
Series	Karlsruhe Reports in Informatics ; 2017,3
Keywords	Word embedding models; similarity values; semantic similarity

Repository KITopen