
DECM: Evaluating Bilingual ASR Performance on a Code-switching/mixing Benchmark

Ugan, Enes Yavuz 1; Pham, Ngoc-Quan 1; Waibel, Alexander
1 Karlsruher Institut für Technologie (KIT)

Abstract:

Automatic speech recognition (ASR) has made significant progress, but challenges persist. Code-switched (CSW) speech
presents one such challenge, involving the mixing of multiple languages by a speaker. Even when multilingual ASR
models are trained, each training utterance usually remains monolingual. We introduce an evaluation dataset for
German-English CSW, with German as the matrix language and English as the embedded language. The dataset
comprises spontaneous speech from diverse domains, enabling realistic CSW evaluation in German-English. It
includes splits with varying degrees of CSW to facilitate specialized model analysis. As it is difficult to collect CSW
data for all language pairs, the provision of such evaluation data is crucial for developing and analyzing ASR models
capable of generalizing across unseen pairs. Detailed data statistics are presented, and state-of-the-art (SOTA)
multilingual models are evaluated, showing the challenges of CSW speech.


Publisher's version
DOI: 10.5445/IR/1000172045
Published on 27.06.2024
Affiliated institution(s) at KIT Zentrum für Mediales Lernen (ZML)
Publication type Proceedings contribution
Publication month/year 05.2024
Language English
Identifier ISBN: 978-2-493-81410-4
KITopen-ID: 1000172045
Published in 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, Hybrid, Torino, 20th-25th May 2024
Event Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20.05.2024 – 25.05.2024
Publisher European Language Resources Association (ELRA)
Pages 4468 – 4475
Indexed in Scopus