
Contrastive Learning for Task-Independent SpeechLLM-Pretraining

Züfle, Maike 1; Niehues, Jan 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Large language models (LLMs) excel at natural language processing, but adapting them to speech processing tasks efficiently is not straightforward. Direct task-specific fine-tuning is limited by overfitting risks, data requirements, and computational costs. To address these challenges, we propose a scalable, two-stage training approach: (1) a task-independent speech pretraining stage using contrastive learning to align text and speech representations over all layers, followed by (2) a task-specific fine-tuning stage requiring minimal data. This approach outperforms traditional ASR pretraining and enables the model to surpass models specialized in speech translation and question answering while being trained on only 10% of the task-specific data.
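As a rough illustration of the layer-wise contrastive alignment described above, the sketch below computes an InfoNCE-style loss between pooled speech and text representations at every layer and averages it across layers. The function name, mean-pooling, temperature value, symmetric cross-entropy form, and equal per-layer weighting are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def layerwise_contrastive_loss(speech_hidden, text_hidden, temperature=0.07):
    """Hypothetical sketch of a layer-wise contrastive alignment loss.

    speech_hidden, text_hidden: lists of per-layer hidden states, each of
    shape (batch, seq_len, dim), for paired speech utterances and transcripts.
    """
    total = 0.0
    for s, t in zip(speech_hidden, text_hidden):
        # Mean-pool over the sequence dimension, then L2-normalise (assumption).
        s_vec = F.normalize(s.mean(dim=1), dim=-1)   # (batch, dim)
        t_vec = F.normalize(t.mean(dim=1), dim=-1)   # (batch, dim)
        # Similarity of every speech utterance to every transcript in the batch.
        logits = s_vec @ t_vec.T / temperature       # (batch, batch)
        labels = torch.arange(logits.size(0), device=logits.device)
        # Symmetric InfoNCE: matched speech-text pairs lie on the diagonal.
        total = total + 0.5 * (F.cross_entropy(logits, labels) +
                               F.cross_entropy(logits.T, labels))
    # Average the loss over all layers (assumed equal weighting).
    return total / len(speech_hidden)
```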


Publisher's version
DOI: 10.5445/IR/1000184961
Published on 18.09.2025
Original publication
DOI: 10.18653/v1/2025.findings-acl.445
Citations (Dimensions): 1
Affiliated KIT institution(s): Institut für Anthropomatik und Robotik (IAR)
Publication type: Conference proceedings contribution
Publication year: 2025
Language: English
Identifier: ISBN 979-8-89176-256-5
KITopen-ID: 1000184961
Published in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025). Ed.: W. Che
Event: Findings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, 27.07.2025 – 01.08.2025
Publisher: Association for Computational Linguistics (ACL)
Pages: 8469–8490
Indexed in: OpenAlex, Dimensions