What’s the Matter? Knowledge Acquisition by Unsupervised Multi-Topic Labeling for Spoken Utterances

Weigelt, Sebastian; Keim, Jan; Hey, Tobias; Tichy, Walter F.

doi:10.35708/HCC1868-126364

Abstract (English):

Systems such as Alexa, Cortana, and Siri appear rather smart. However, they only react to predefined wordings and do not actually grasp the user's intent. To overcome this limitation, a system must understand the topics the user is talking about. Therefore, we apply unsupervised multi-topic labeling to spoken utterances. Although topic labeling is a well-studied task for textual documents, its potential for spoken input is almost unexplored. Our approach to topic labeling is tailored to spoken utterances; it copes with short and ungrammatical input.
The approach is two-tiered. First, we disambiguate word senses. We utilize Wikipedia as a pre-labeled corpus to train a naïve Bayes classifier. Second, we build topic graphs based on DBpedia relations. We use two strategies to determine central terms in the graphs, i.e., the shared topics. One focuses on the dominant senses in the utterance, and the other covers as many distinct senses as possible. Our approach creates multiple distinct topics per utterance and ranks the results.
The evaluation shows that the approach is feasible; the word sense disambiguation achieves a recall of 0.799. ...
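The two tiers described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation and rests on several assumptions: the training pairs are hand-made stand-ins for Wikipedia link contexts, the `relations` dictionary is a hypothetical stand-in for DBpedia relations, and the two ranking functions (`dominant_topics`, `covering_topics`) are hypothetical interpretations of the "dominant senses" and "as many distinct senses as possible" strategies. All class, function, and category names are illustrative.

```python
import math
from collections import Counter, defaultdict, deque


class NaiveBayesSenseClassifier:
    """Tier 1 (sketch): multinomial naive Bayes over context words of one ambiguous word."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # add-alpha (Laplace) smoothing
        self.sense_counts = Counter()            # training examples per sense (prior)
        self.word_counts = defaultdict(Counter)  # context-word counts per sense
        self.vocab = set()

    def fit(self, examples):
        for context, sense in examples:
            self.sense_counts[sense] += 1
            self.word_counts[sense].update(context)
            self.vocab.update(context)
        return self

    def predict(self, context):
        total, v = sum(self.sense_counts.values()), len(self.vocab)

        def log_posterior(sense):
            n = sum(self.word_counts[sense].values())
            score = math.log(self.sense_counts[sense] / total)
            for w in context:  # unseen words get the smoothed probability alpha / (n + alpha * v)
                score += math.log((self.word_counts[sense][w] + self.alpha) / (n + self.alpha * v))
            return score

        return max(self.sense_counts, key=log_posterior)


def build_topic_graph(senses, relations, depth=2):
    """Tier 2 (sketch): undirected graph of all nodes within `depth` hops of each sense."""
    graph, reached_by = defaultdict(set), defaultdict(set)  # reached_by: node -> senses reaching it
    for sense in senses:
        seen, queue = {sense}, deque([(sense, 0)])
        while queue:  # breadth-first search bounded by `depth`
            node, d = queue.popleft()
            reached_by[node].add(sense)
            if d == depth:
                continue
            for nb in relations.get(node, ()):
                graph[node].add(nb)
                graph[nb].add(node)
                if nb not in seen:
                    seen.add(nb)
                    queue.append((nb, d + 1))
    return graph, reached_by


def dominant_topics(graph, reached_by, senses, k=2):
    """Assumed strategy A: nodes linking the most senses, ties broken by degree centrality."""
    denom = max(len(graph) - 1, 1)
    candidates = [v for v in graph if v not in senses]
    return sorted(candidates, key=lambda v: (-len(reached_by[v]), -len(graph[v]) / denom, v))[:k]


def covering_topics(reached_by, senses, k=2):
    """Assumed strategy B: greedily cover as many distinct senses as possible."""
    uncovered, chosen = set(senses), []
    candidates = sorted(v for v in reached_by if v not in senses)
    while uncovered and candidates and len(chosen) < k:
        best = max(candidates, key=lambda v: len(reached_by[v] & uncovered))
        if not reached_by[best] & uncovered:
            break  # no remaining candidate covers an uncovered sense
        chosen.append(best)
        uncovered -= reached_by[best]
    return chosen


# Hypothetical training contexts standing in for Wikipedia link anchors.
wsd = NaiveBayesSenseClassifier().fit([
    (["eat", "juice", "tree", "cut"], "Apple_(fruit)"),
    (["pie", "oven", "bake"], "Apple_(fruit)"),
    (["iphone", "company", "stock"], "Apple_Inc."),
])
# Utterance: "cut the apple, put it in the oven, and play some jazz"
apple = wsd.predict(["cut", "oven"])

# Hypothetical DBpedia-like category relations (hand-made, not real DBpedia data).
relations = {
    "Apple_(fruit)": ["Category:Fruits"],
    "Category:Fruits": ["Category:Food_and_drink"],
    "Oven": ["Category:Cooking_appliances"],
    "Category:Cooking_appliances": ["Category:Food_and_drink"],
    "Jazz": ["Category:Music_genres"],
    "Category:Music_genres": ["Category:Music"],
}
senses = [apple, "Oven", "Jazz"]  # "Oven" and "Jazz" assumed already disambiguated
graph, reached_by = build_topic_graph(senses, relations)
print(apple)                                       # Apple_(fruit)
print(dominant_topics(graph, reached_by, senses))  # ['Category:Food_and_drink', 'Category:Cooking_appliances']
print(covering_topics(reached_by, senses))         # ['Category:Food_and_drink', 'Category:Music']
```

On this toy input, the dominant-sense ranking returns two food-related topics, while the greedy coverage ranking swaps the second food topic for `Category:Music` so that every sense in the utterance is represented. The scoring rules here are simple illustrative choices; other graph centrality measures could be substituted.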

Affiliated institution(s) at KIT	Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type	Journal article
Publication month/year	08.2020
Language	English
Identifier	ISSN: 2641-953X; KITopen ID: 1000140308
Published in	International Journal of Humanized Computing and Communication
Publisher	Institute for Semantic Computing Foundation
Volume	1
Issue	1
Pages	43–66
Published online in advance on	01.08.2020
Keywords	Topic Labeling, Topic Modeling, Unsupervised Machine Learning, Graph Centrality Measures, Word Sense Disambiguation, Ontology Selection, DBpedia, Wikipedia, Semantic Annotation, Spoken Language Interfaces, Spoken Language Understanding, Natural Language Processing
Indexed in	Dimensions, OpenAlex

KITopen download

Publisher's version

DOI: 10.5445/IR/1000140308

Published on 29.11.2021

External links

Original publication
DOI: 10.35708/HCC1868-126364

Statistics

Page views: 181
since 26.11.2021

Downloads: 131
since 29.11.2021
