
Tight Integration of Speech Disfluency Removal into SMT

Cho, E. 1; Niehues, J. 1; Waibel, A. 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Speech disfluencies are one of the main challenges of spoken language processing. Conventional disfluency detection systems deploy a hard decision, which can have a negative influence on subsequent applications such as machine translation. In this paper we suggest a novel approach in which disfluency detection is integrated into the translation process. We train a CRF model to obtain a disfluency probability for each word. The SMT decoder will then skip the potentially disfluent word based on its disfluency probability. Using the suggested scheme, the translation score of both the manual transcript and ASR output is improved by around 0.35 BLEU points compared to the CRF hard decision system.
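
The contrast between a hard decision and a probability-based skip can be illustrated with a small sketch. This is not the authors' implementation: the per-word probabilities are assumed to come from some tagger such as a CRF, the function names are hypothetical, and `toy_score` is a stand-in for whatever score a real decoder would assign to a candidate. The exhaustive enumeration is exponential in sentence length and is used only to keep the example short.

```python
import math
from itertools import product


def hard_decision(words, probs, threshold=0.5):
    """Baseline: drop every word whose disfluency probability exceeds the threshold."""
    return [w for w, p in zip(words, probs) if p <= threshold]


def skip_candidates(words, probs, floor=1e-6):
    """Enumerate every keep/skip pattern with its log-probability under the tagger."""
    candidates = []
    for mask in product([False, True], repeat=len(words)):
        log_prob = 0.0
        for skip, p in zip(mask, probs):
            q = min(max(p, floor), 1.0 - floor)  # avoid log(0)
            log_prob += math.log(q if skip else 1.0 - q)
        kept = [w for w, skip in zip(words, mask) if not skip]
        candidates.append((kept, log_prob))
    return candidates


def soft_decision(words, probs, downstream_score, weight=1.0):
    """Pick the candidate that maximizes downstream score plus weighted skip log-probability."""
    best = max(
        skip_candidates(words, probs),
        key=lambda c: downstream_score(c[0]) + weight * c[1],
    )
    return best[0]


def toy_score(tokens):
    """Hypothetical downstream score: penalize immediately repeated words."""
    return -5.0 * sum(a == b for a, b in zip(tokens, tokens[1:]))


# Assumed example input; the probabilities are invented for illustration.
words = ["i", "uh", "i", "want", "tea"]
probs = [0.45, 0.95, 0.05, 0.02, 0.01]

print(hard_decision(words, probs))
print(soft_decision(words, probs, toy_score))
```

In this toy input the first "i" falls just below the 0.5 threshold, so the hard decision keeps the repetition, while the soft variant can still drop it because the downstream score prefers the candidate without the repeated word.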


Publisher's version
DOI: 10.5445/IR/1000045406
Published on 11 June 2025
Original publication
DOI: 10.3115/v1/E14-4009
Scopus citations: 6
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings paper
Publication year: 2014
Language: English
Identifier: ISBN: 978-1-937284-99-2
KITopen-ID: 1000045406
Published in: EACL 2014 14th Conference of the European Chapter of the Association for Computational Linguistics : Proceedings of the Conference, April 26-30, 2014, Gothenburg, Sweden. Vol.: 2. Ed.: S. Wintner,
Event: 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014), Gothenburg, Sweden, 26 April 2014 – 30 April 2014
Publisher: Association for Computational Linguistics (ACL)
Pages: 43-47
Indexed in: Dimensions, OpenAlex, Scopus
Sustainable Development Goals: Goal 16 – Peace, Justice and Strong Institutions