
Auditing Biases: From Bias to Unfairness in LLM-Generated Explanations

Bairy, Akhila 1; Schwammberger, Maike 1
1 Institute of Information Security and Dependability (KASTEL), Karlsruhe Institute of Technology (KIT)

Abstract:

Large language models (LLMs) have rapidly become central to user-facing systems across various domains. Generating understandable and trustworthy explanations for these systems is therefore of utmost importance. Explanations are critical not only for transparency and trust, but also for users' decision-making, emotional response, and perceived fairness. While a growing body of research has explored bias in LLM-generated responses, relatively little is known about how such biases manifest specifically in explanations, that is, in the justifications or clarifications that models offer to users. This paper defines the notions of explanation bias and explanation unfairness. We propose a vision for auditing demographic and social biases in LLM-generated explanations using controlled scenario design, multi-dimensional evaluation, and cross-model comparison. We argue that bias may sometimes be desirable in explanations, whereas biased explanations that lead to explanation unfairness are undesirable.


Affiliated institution(s) at KIT: Institute of Information Security and Dependability (KASTEL)
Publication type: Proceedings contribution
Publication year: 2026
Language: English
Identifier: KITopen-ID: 1000193369
Published in: International Conference on Bridging the Gap between AI and Reality, AISoLA 2024, Crete, Greece, October 30 – November 3, 2024
Event: International Conference on Bridging the Gap between AI and Reality (AISoLA 2025), Rhodes, Greece, November 1 – November 5, 2025
Keywords: LLM-Generated Explanations, LLM Bias, Explanation Bias, Explanation Unfairness