A Corpus of Spontaneous Speech in Lectures: The KIT Lecture Corpus for Spoken Language Processing and Translation

Cho, E.; Fünfer, S.; Stüker, S.; Waibel, A.

Abstract:

With the growing number of applications that handle spontaneous speech, the need to process spoken language is becoming stronger. Speech disfluencies are among the most challenging phenomena to deal with in automatic speech processing. Because most applications are trained on well-formed written text, many issues arise when processing spontaneous speech, owing to its distinctive characteristics. More data with annotated speech disfluencies will therefore help in adapting natural language processing applications, such as machine translation systems. To support this, we have annotated speech disfluencies in German lectures at KIT. In this paper we describe how we annotated the disfluencies in the data and provide detailed statistics on the size of the corpus and the speakers. Moreover, machine translation performance on a source text including disfluencies is compared with the results of translating a source text from which different sorts of disfluencies, or all disfluencies, have been removed.

DOI: 10.5445/IR/1000045402

Published on 11.06.2025

Affiliated institution(s) at KIT	Institut für Anthropomatik und Robotik (IAR)
Publication type	Proceedings contribution
Publication year	2014
Language	English
Identifier	ISBN: 978-2-9517408-8-4 KITopen-ID: 1000045402
Published in	9th International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 26-31, 2014. Ed.: N. Calzolari
Event	9th Language Resources and Evaluation Conference (LREC 2014), Reykjavík, Iceland, 26.05.2014 – 31.05.2014
Publisher	European Language Resources Association (ELRA)
Pages	1554–1559