
Data Quality Challenges in Retrieval-Augmented Generation

Müller, Leopold; Holstein, Joshua 1; Bause, Sarah 1; Satzger, Gerhard 1; Kühl, Niklas
1 Karlsruhe Service Research Institute (KSRI), Karlsruher Institut für Technologie (KIT)

Abstract:

Organizations increasingly adopt Retrieval-Augmented Generation (RAG) to enhance Large Language Models with enterprise-specific knowledge. However, current data quality (DQ) frameworks were developed primarily for static datasets and inadequately address the dynamic, multi-stage nature of RAG systems. This study aims to develop DQ dimensions for this new type of AI-based system. We conduct 16 semi-structured interviews with practitioners at leading IT service companies. Through qualitative content analysis, we inductively derive 15 distinct DQ dimensions across the four processing stages of RAG systems: data extraction, data transformation, prompt & search, and generation. Our findings reveal that (1) new dimensions must be added to traditional DQ frameworks to cover RAG contexts; (2) these new dimensions are concentrated in early RAG steps, suggesting the need for front-loaded quality management strategies; and (3) DQ issues transform and propagate through the RAG pipeline, necessitating a dynamic, step-aware approach to quality management.


Affiliated institution(s) at KIT: Institut für Wirtschaftsinformatik (WIN); Karlsruhe Service Research Institute (KSRI)
Publication type: Proceedings paper
Publication year: 2025
Language: English
Identifier: KITopen-ID: 1000185403
Published in: International Conference on Information Systems (ICIS), 14th-17th December, Nashville, TN, USA
Event: 46th International Conference on Information Systems (ICIS 2025), Nashville, TN, USA, 14–17 December 2025
Publisher: Association for Information Systems (AIS)
Keywords: Retrieval-Augmented Generation, Data Quality, Large Language Models