Unsupervised Multi-Topic Labeling for Spoken Utterances

Weigelt, Sebastian; Keim, Jan; Hey, Tobias; Tichy, Walter F.

doi:10.1109/HCC46620.2019.00014

Weigelt, Sebastian ¹; Keim, Jan ¹; Hey, Tobias ¹; Tichy, Walter F. ¹
¹ Institut für Programmstrukturen und Datenorganisation (IPD), Karlsruher Institut für Technologie (KIT)

Abstract:

Systems such as Alexa, Cortana, and Siri appear rather smart. However, they only react to predefined wordings and do not actually grasp the user's intent. To overcome this limitation, a system must grasp the topics the user is talking about. Therefore, we apply unsupervised multi-topic labeling to spoken utterances. Although topic labeling is a well-studied task on textual documents, its potential for spoken input is almost unexplored. Our approach for topic labeling is tailored to spoken utterances; it copes with short and ungrammatical input. The approach is two-tiered. First, we disambiguate word senses. We utilize Wikipedia as a pre-labeled corpus to train a naïve Bayes classifier. Second, we build topic graphs based on DBpedia relations. We use two strategies to determine central terms in the graphs, i.e., the shared topics. One focuses on the dominant senses in the utterance and the other covers as many distinct senses as possible. Our approach creates multiple distinct topics per utterance and ranks results. The evaluation shows that the approach is feasible; the word sense disambiguation achieves a recall of 0.799. Concerning topic labeling, in a user study subjects assessed that in 90.9% of the cases at least one proposed topic label among the first four is a good fit.
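
The two tiers described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the training examples, the sense labels, the `BROADER` relation table, and all function names are invented for illustration, and simple reachability counts stand in for whatever graph centrality measures the paper actually uses. The first part shows a naïve Bayes classifier with add-one smoothing that picks a sense from context words; the second part builds a tiny topic graph and selects topics with two simplified strategies, one that favors topics reached by many sense mentions and one that greedily covers all distinct senses.

```python
import math
from collections import Counter, defaultdict

# --- Tier 1: toy naive Bayes word sense disambiguation ---------------------
# Hypothetical training data (context words, sense label). The abstract says
# the real labels come from Wikipedia; these four examples are invented.
TRAINING = [
    (["open", "fridge", "door"], "Refrigerator"),
    (["put", "milk", "fridge"], "Refrigerator"),
    (["wash", "cup", "dishwasher"], "Cup_(drinkware)"),
    (["win", "cup", "final"], "Cup_(trophy)"),
]

def train(examples):
    """Count senses and the context words seen with each sense."""
    sense_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, candidates, model):
    """Return the candidate sense with the highest log posterior."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def log_posterior(sense):
        score = math.log(sense_counts[sense] / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[sense][word] + 1) / denom)
        return score

    return max(candidates, key=log_posterior)

# --- Tier 2: toy topic graph ------------------------------------------------
# Hypothetical "broader concept" relations standing in for DBpedia relations.
BROADER = {
    "Refrigerator": ["Home_appliance", "Food_storage"],
    "Dishwasher": ["Home_appliance", "Cleaning"],
    "Cup_(drinkware)": ["Drinkware"],
    "Drinkware": ["Kitchenware"],
    "Home_appliance": ["Household"],
    "Kitchenware": ["Household"],
}

def reachable(sense, max_hops=2):
    """Concepts reachable from a sense within max_hops relation steps."""
    frontier, seen = {sense}, set()
    for _ in range(max_hops):
        frontier = {t for s in frontier for t in BROADER.get(s, [])} - seen
        seen |= frontier
    return seen

def rank_topics(senses, max_hops=2):
    """Strategy A (simplified): topics reached by the most sense mentions."""
    votes = Counter()
    for sense in senses:  # a sense mentioned twice votes twice
        votes.update(reachable(sense, max_hops))
    # Sort by vote count, then by name, so ties are resolved deterministically.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

def cover_topics(senses, max_hops=2):
    """Strategy B (simplified): greedily cover every distinct sense."""
    reach = {s: reachable(s, max_hops) for s in set(senses)}
    uncovered, chosen = set(reach), []
    while uncovered:
        candidates = Counter(t for s in uncovered for t in reach[s])
        if not candidates:
            break  # remaining senses have no related concept in the graph
        topic = min(candidates, key=lambda t: (-candidates[t], t))
        chosen.append(topic)
        uncovered -= {s for s in uncovered if topic in reach[s]}
    return chosen

model = train(TRAINING)
sense = disambiguate(["wash", "cup", "dishwasher"],
                     ["Cup_(drinkware)", "Cup_(trophy)"], model)
senses = ["Refrigerator", "Dishwasher", sense]
ranked = rank_topics(senses)
covering = cover_topics(senses)
```

With this toy data, the ranking strategy puts `Home_appliance` first because two of the three senses reach it, while the covering strategy adds `Drinkware` so that the cup sense is also represented. Raising `max_hops` to 3 lets all three senses meet at `Household`, which shows how graph depth trades specific labels for more general ones.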

KITopen download

Postprint

DOI: 10.5445/IR/1000105171

Published on 24.05.2024

External links

Original publication
DOI: 10.1109/HCC46620.2019.00014

Scopus
Citations: 1

Dimensions
Citations: 1


Affiliated institution(s) at KIT	Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type	Proceedings contribution
Publication month/year	09.2019
Language	English
Identifier	ISBN: 978-1-72814-125-1; KITopen-ID: 1000105171
Published in	2019 IEEE International Conference on Humanized Computing and Communication (HCC)
Event	IEEE International Conference on Humanized Computing and Communication (HCC 2019), Laguna Hills, CA, USA, 25.09.2019 – 27.09.2019
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Pages	38-45
Keywords	Topic Labeling, Topic Modeling, Unsupervised Machine Learning, Graph Centrality Measures, Word Sense Disambiguation, DBpedia, Wikipedia, Semantic Annotation, Spoken Language Interfaces, Spoken Language Understanding, Natural Language Processing, Natural Language Understanding
Indexed in	Dimensions, Scopus, OpenAlex