The 2014 KIT IWSLT Speech-to-Text Systems for English, German and Italian

Kilgour, Kevin; Heck, Michael; Müller, Markus; Sperber, Matthias; Stüker, Sebastian; Waibel, Alexander


This paper describes our German, Italian and English Speech-to-Text (STT) systems for the 2014 IWSLT TED ASR track. Our setup combines the outputs of various subsystems using ROVER and confusion network combination to achieve good overall performance. The individual subsystems differ in their front-ends (e.g., MVDR-MFCC or lMel), acoustic models (GMM or modular DNN) and phone sets, and are trained on different subsets of the training data. Decoding is performed in two stages: the GMM systems are adapted in an unsupervised manner on the combined first-stage outputs using VTLN, MLLR and cMLLR. The combination setup produces a final hypothesis with a significantly lower word error rate (WER) than any of the individual subsystems.
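The abstract credits the WER gain to system combination without giving details. The following is a minimal, hypothetical sketch of frequency-only ROVER-style voting, not the KIT implementation. It assumes the subsystem hypotheses have already been aligned slot by slot (real ROVER builds this word transition network with iterative dynamic-programming alignment), and it ignores confidence scores and confusion network combination entirely. The names `rover_vote`, `wer` and `NULL` are illustrative only. WER is computed as the word-level Levenshtein distance divided by the reference length.

```python
from collections import Counter
from typing import Optional, Sequence

NULL = None  # hypothetical marker for an empty (deletion) arc in an aligned slot


def rover_vote(aligned: Sequence[Sequence[Optional[str]]]) -> list[str]:
    """Frequency-only ROVER-style voting over pre-aligned hypotheses.

    aligned[i][j] is the word that system i proposes at slot j (NULL = no word).
    Assumes all systems are already aligned to the same number of slots.
    """
    n_slots = len(aligned[0])
    if any(len(h) != n_slots for h in aligned):
        raise ValueError("all hypotheses must be aligned to the same length")
    output = []
    for j in range(n_slots):
        counts = Counter(h[j] for h in aligned)
        # most_common orders equal counts by first occurrence, so ties
        # go to the earliest-listed system's word
        word, _ = counts.most_common(1)[0]
        if word is not NULL:
            output.append(word)
    return output


def wer(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word error rate: (substitutions + deletions + insertions) / len(reference)."""
    r, h = len(reference), len(hypothesis)
    if r == 0:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = edit distance between reference[:i] and hypothesis[:j]
    d = [[0] * (h + 1) for _ in range(r + 1)]
    for i in range(r + 1):
        d[i][0] = i
    for j in range(h + 1):
        d[0][j] = j
    for i in range(1, r + 1):
        for j in range(1, h + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[r][h] / r
```

On a toy input where each of three aligned systems makes a different error (for example `["the", "cat", "sad", NULL]`, `["the", "bat", "sat", "down"]` and `["a", "cat", "sat", NULL]` against the reference `the cat sat`), the vote recovers the reference even though no single system does. This illustrates, in simplified form, why combining diverse subsystems can yield a lower WER than any one of them.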

Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings contribution
Year of publication: 2014
Language: English
Identifier: KITopen-ID: 1000166289
Published in: Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign. Ed.: M. Federico, S. Stüker, F. Yvon
Event: 11th International Workshop on Spoken Language Translation (IWSLT 2014), Lake Tahoe, NV, USA, 04.12.2014 – 05.12.2014
Publisher: Association for Computational Linguistics (ACL)
Pages: 73–79

DOI: 10.5445/IR/1000166289
Published on: 06.02.2024
