
Contrastive Learning for Task-Independent SpeechLLM-Pretraining

Züfle, Maike 1; Niehues, Jan 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Large language models (LLMs) excel at natural language processing, but adapting them to speech processing tasks efficiently is not straightforward. Direct task-specific fine-tuning is limited by overfitting risks, data requirements, and computational costs. To address these challenges, we propose a scalable, two-stage training approach: (1) a task-independent speech pretraining stage using contrastive learning to align text and speech representations over all layers, followed by (2) a task-specific fine-tuning stage requiring minimal data. This approach outperforms traditional ASR pretraining and enables the model to surpass models specialized in speech translation and question answering while being trained on only 10% of the task-specific data.
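As a rough illustration of the layer-wise contrastive alignment described above, the sketch below computes an InfoNCE-style loss between pooled speech and text representations at every layer and averages it across layers. The function name, mean-pooling, temperature value, symmetric cross-entropy form, and equal per-layer weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def layerwise_contrastive_loss(speech_hidden, text_hidden, temperature=0.07):
    """Hypothetical sketch of a layer-wise contrastive alignment loss.

    speech_hidden, text_hidden: lists of per-layer hidden states, each of
    shape (batch, seq_len, dim), for paired speech utterances and transcripts.
    """
    total = 0.0
    for s, t in zip(speech_hidden, text_hidden):
        # Mean-pool over the sequence dimension, then L2-normalise (assumption).
        s_vec = F.normalize(s.mean(dim=1), dim=-1)   # (batch, dim)
        t_vec = F.normalize(t.mean(dim=1), dim=-1)   # (batch, dim)
        # Similarity of every speech utterance to every transcript in the batch.
        logits = s_vec @ t_vec.T / temperature       # (batch, batch)
        labels = torch.arange(logits.size(0), device=logits.device)
        # Symmetric InfoNCE: matched speech-text pairs lie on the diagonal.
        total = total + 0.5 * (F.cross_entropy(logits, labels) +
                               F.cross_entropy(logits.T, labels))
    # Average the loss over all layers (assumed equal weighting).
    return total / len(speech_hidden)
```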


Publisher's version
DOI: 10.5445/IR/1000184961
Published on 18.09.2025
Original publication
DOI: 10.18653/v1/2025.findings-acl.445
Citations (Dimensions): 1
Affiliated KIT institution(s): Institut für Anthropomatik und Robotik (IAR)
Publication type: Conference proceedings contribution
Publication year: 2025
Language: English
Identifier: ISBN 979-8-89176-256-5
KITopen-ID: 1000184961
Published in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). Ed.: W. Che
Event: Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, 27.07.2025 – 01.08.2025
Publisher: Association for Computational Linguistics (ACL)
Pages: 8469–8490
Indexed in: OpenAlex, Dimensions