
Machine Translation from Standard German to Alemannic Dialects

Lambrecht, L.¹; Schneider, F.²; Waibel, A.²
¹ Karlsruher Institut für Technologie (KIT)
² Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

In recent years, machine translation has been studied using deep neural networks. These networks require large amounts of data to learn abstract representations of the input, stored as continuous vectors. Dialect translation has become more important since the advent of social media, and machine translation is of particular concern where dialect speakers and standard-language speakers no longer understand each other. Dialect translation is typically a low-resource setting that suffers from data scarcity. In addition, spelling inconsistencies caused by varying pronunciations and the lack of spelling rules complicate translation. This paper presents the best-performing approaches to these problems for Alemannic dialects. The results show that back-translation and conditioning on dialectal manifestations yield the largest improvements over the baseline. Back-translation achieves a significant gain of +4.5 BLEU points over the strong Transformer baseline of 37.3 BLEU (i.e., 41.8 BLEU). Distinguishing between several Alemannic dialects instead of treating Alemannic as a single dialect leads to substantial improvements: multi-dialectal translation surpasses the baseline on the dialectal test sets. ...
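The abstract names back-translation and conditioning on the dialect as the most effective techniques but does not describe how they are implemented. The Python sketch below shows one common way to combine them when preparing training data. It rests on assumptions that are not taken from the paper: the target dialect is signaled by a tag token prepended to the source sentence (the `<2gsw-ZH>` tag format and labels such as `gsw-ZH` are hypothetical), `reverse_model` stands in for any dialect-to-German model, and the example sentences are toy data. It is an illustration, not the authors' implementation.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (tagged Standard German source, Alemannic target)


def tag_source(sentence: str, dialect: str) -> str:
    # Hypothetical tag format: "<2" + dialect label + ">" prepended to the source,
    # telling a single model which Alemannic variety to generate.
    return f"<2{dialect}> {sentence}"


def back_translate(
    monolingual: Iterable[Tuple[str, str]],
    reverse_model: Callable[[str], str],
) -> List[Pair]:
    """Turn monolingual dialect text into synthetic German->dialect pairs.

    `monolingual` yields (dialect_label, dialect_sentence); `reverse_model`
    is an assumed dialect->German translator. The machine-generated German
    goes on the source side; the genuine dialect text stays on the target side.
    """
    pairs: List[Pair] = []
    for dialect, target in monolingual:
        synthetic_source = reverse_model(target)
        pairs.append((tag_source(synthetic_source, dialect), target))
    return pairs


def build_corpus(
    parallel: Iterable[Tuple[str, str, str]],
    synthetic: List[Pair],
) -> List[Pair]:
    # Genuine (dialect_label, german, dialect) triples get the same tagging,
    # so real and back-translated pairs share one format.
    tagged = [(tag_source(de, dialect), gsw) for dialect, de, gsw in parallel]
    return tagged + synthetic


if __name__ == "__main__":
    # Toy stand-in for a trained reverse model: a lookup table that returns
    # its input unchanged when the sentence is unknown.
    toy_reverse = {"Grüezi mitenand": "Hallo zusammen"}
    synthetic = back_translate(
        [("gsw-ZH", "Grüezi mitenand")],
        lambda s: toy_reverse.get(s, s),
    )
    corpus = build_corpus([("gsw-BS", "Guten Morgen", "Guete Morge")], synthetic)
    for src, tgt in corpus:
        print(src, "->", tgt)
```

Tagging every pair, genuine or synthetic, lets one multi-dialectal model serve several Alemannic varieties from a single training corpus. A variant known as tagged back-translation additionally marks the synthetic sources so the model can tell them apart from genuine data; this sketch omits that step.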


Publisher's version
DOI: 10.5445/IR/1000155240
Published on 27 January 2023
Scopus citations: 4
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Conference proceedings paper
Publication year: 2022
Language: English
Identifier: KITopen-ID: 1000155240
Published in: Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Event: 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022), Marseille, France, 24–25 June 2022
Publisher: Association for Computational Linguistics (ACL)
Pages: 129–136
Keywords: machine translation, low-resource languages, dialect
Indexed in: Scopus