A Corpus of Spontaneous Speech in Lectures: The KIT Lecture Corpus for Spoken Language Processing and Translation

Cho, E.; Fünfer, S.; Stüker, S.; Waibel, A.

Abstract:

With the growing number of applications that handle spontaneous speech, the need to process spoken language is becoming stronger. Speech disfluencies are among the most challenging phenomena to deal with in automatic speech processing. Because most applications are trained on well-formed written text, many issues arise when they process spontaneous speech, owing to its distinctive characteristics. More data with annotated speech disfluencies will therefore help the adaptation of natural language processing applications, such as machine translation systems. To support this, we have annotated speech disfluencies in German lectures at KIT. In this paper we describe how we annotated the disfluencies in the data and provide detailed statistics on the size of the corpus and the speakers. Moreover, we compare machine translation performance on source text that includes disfluencies with performance on source text from which certain types of disfluencies, or all disfluencies, have been removed.


Publisher's version
DOI: 10.5445/IR/1000045402
Published on: June 11, 2025
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings paper
Publication year: 2014
Language: English
Identifier: ISBN 978-2-9517408-8-4
KITopen-ID: 1000045402
Published in: 9th International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 26-31, 2014. Ed.: N. Calzolari
Event: 9th Language Resources and Evaluation Conference (LREC 2014), Reykjavík, Iceland, May 26-31, 2014
Publisher: European Language Resources Association (ELRA)
Pages: 1554–1559