Eliciting natural speech from non-native users: collecting speech data for LVCSR

Tomokiyo Mayfield, Laura; Burger, Susanne

Abstract:

In this paper, we discuss the design of a database of recorded and transcribed read and spontaneous speech of semi-fluent, strongly accented non-native speakers of English. While many speech applications work best with a recognizer that expects native-like usage, others could benefit from a speech recognition component that is forgiving of the sorts of errors that are not a barrier to communication; in order to train such a recognizer, a database of non-native speech is needed. We examine how collecting data from non-native speakers must necessarily differ from collection from native speakers, and describe work we did to develop an appropriate scenario, recording setup, and optimal surroundings during recording.


Postprint
DOI: 10.5445/IR/336699
Published on: 28.07.2025
Affiliated institution(s) at KIT: Fakultät für Informatik – Institut für Logik, Komplexität und Deduktionssysteme (ILKD)
Publication type: Proceedings contribution
Publication year: 1999
Language: English
Identifier: KITopen-ID: 336699
Published in: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, ACL 1999, College Park, Maryland, USA, June 20-26, 1999
Event: 37th Annual Meeting of the Association for Computational Linguistics (ACL 1999), College Park, MD, USA, 20.06.1999 – 26.06.1999
Publisher: Association for Computational Linguistics (ACL)
Publication note: In: Proceedings of ACL-99 Workshop on Computer Mediated Language Assessment and Evaluation in NLP, College Park, MD 1999.