Audio Segmentation for Robust Real-Time Speech Recognition Based on Neural Networks

Wetzel, Micha; Sperber, Matthias; Waibel, Alexander

Abstract:

Speech that contains multimedia content can pose a serious challenge for real-time automatic speech recognition (ASR) for two reasons: (1) the ASR produces meaningless output, hurting the readability of the transcript; (2) the search space of the ASR is blown up when multimedia content is encountered, resulting in large delays that compromise real-time requirements. This paper introduces a segmenter that aims to remove these problems by detecting music and noise segments in real time and replacing them with silence. We propose a two-step approach consisting of frame classification and smoothing. First, a classifier detects speech and multimedia at the frame level. In the second step, the smoothing algorithm considers the temporal context to prevent rapid class fluctuations. We investigate frame classification and smoothing settings to obtain an appealing accuracy-latency trade-off. The proposed segmenter increases the transcript quality of an ASR system by removing on average 39% of the errors caused by non-speech in the audio stream, while maintaining a real-time applicable delay of 270 milliseconds.
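
The two-step idea described in the abstract (frame-level labels followed by temporal smoothing, then replacing non-speech with silence) can be illustrated with a minimal sketch. The abstract does not specify the classifier or the smoothing algorithm, so everything below is an assumption for illustration only: the labels are taken as already given, the smoothing is a simple causal majority vote, and the names `smooth_labels`, `silence_non_speech`, `SPEECH`, and `MULTIMEDIA` are hypothetical and not taken from the paper.

```python
from collections import deque

# Hypothetical class labels; the paper's actual label set is not given in the abstract.
SPEECH = "speech"
MULTIMEDIA = "multimedia"


def smooth_labels(frame_labels, window=3):
    """Causal majority vote over the most recent `window` frame labels.

    This is an assumed stand-in for the paper's smoothing step: it only
    looks at past frames, so it adds no look-ahead delay.
    """
    history = deque(maxlen=window)
    smoothed = []
    for label in frame_labels:
        history.append(label)
        speech_votes = sum(1 for item in history if item == SPEECH)
        # Ties are resolved in favor of speech (an assumption), so that
        # speech is not discarded on weak evidence.
        if 2 * speech_votes >= len(history):
            smoothed.append(SPEECH)
        else:
            smoothed.append(MULTIMEDIA)
    return smoothed


def silence_non_speech(frames, labels):
    """Replace every frame not labeled as speech with zeros of equal length."""
    return [
        frame if label == SPEECH else [0.0] * len(frame)
        for frame, label in zip(frames, labels)
    ]


if __name__ == "__main__":
    S, M = SPEECH, MULTIMEDIA
    raw = [S, S, M, S, S, M, M, M, M, S]
    print(smooth_labels(raw, window=3))
```

A larger `window` suppresses more of the rapid class fluctuations but reacts later to a genuine change of class, which is the kind of accuracy-latency trade-off the abstract refers to.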


Published version
DOI: 10.5445/IR/1000166278
Published on: 22 January 2024
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings paper
Publication year: 2016
Language: English
Identifier: KITopen-ID: 1000166278
Published in: Proceedings of the 13th International Conference on Spoken Language Translation. Ed.: M. Cettolo, J. Niehues, S. Stüker, L. Bentivogli, R. Cattoni, M. Federico
Event: 13th International Conference on Spoken Language Translation (IWSLT 2016), Seattle, WA, USA, 8–9 December 2016
Publisher: Association for Computational Linguistics (ACL)