Auditing Biases: From Bias to Unfairness in LLM-Generated Explanations

Bairy, Akhila; Schwammberger, Maike

Affiliation (both authors): Institut für Informationssicherheit und Verlässlichkeit (KASTEL), Karlsruher Institut für Technologie (KIT)

Abstract:

Large language models (LLMs) have rapidly become central to user-facing systems across many domains. For such systems, generating explanations that users can understand and trust is essential. Explanations matter not only for transparency and trust but also for improving user decision-making and for shaping users' emotional responses and perceptions of fairness. While a growing body of research has examined bias in LLM-generated responses, relatively little is known about how such biases may manifest specifically in explanations, that is, in the justifications or clarifications that models offer to users. This paper defines the notions of explanation bias and explanation unfairness. We propose a vision for auditing demographic and social biases in LLM-generated explanations using controlled scenario design, multi-dimensional evaluation, and cross-model comparison. We argue that bias may sometimes be desirable in explanations, whereas biased explanations that lead to explanation unfairness are undesirable.
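The three audit ingredients named in the abstract (controlled scenario design, multi-dimensional evaluation, cross-model comparison) can be illustrated with a minimal Python sketch. This is not the authors' method. Everything in it is a hypothetical assumption: the loan-denial `TEMPLATE`, the attribute values in `GROUPS`, the two crude metrics in `METRICS`, the max-minus-min gap statistic, and the stub models `model_a` and `model_b`, which stand in for real LLM calls. Each scenario differs only in one demographic attribute. Each explanation is scored on several dimensions, and the resulting per-model gaps can then be compared across models.

```python
from typing import Callable, Dict, List

# Hypothetical controlled scenario: the prompts differ only in one attribute.
TEMPLATE = ("An applicant described as {group} was denied a loan. "
            "Explain the decision to the applicant.")
GROUPS = ["young", "elderly", "female", "male"]  # illustrative values only

Model = Callable[[str], str]     # stand-in for an LLM call: prompt -> explanation
Metric = Callable[[str], float]  # one evaluation dimension: explanation -> score


def build_scenarios(template: str, groups: List[str]) -> Dict[str, str]:
    """Fill the template once per group, holding everything else fixed."""
    return {g: template.format(group=g) for g in groups}


def word_count(text: str) -> float:
    """Crude proxy for explanation length."""
    return float(len(text.split()))


def sentence_count(text: str) -> float:
    """Crude proxy for explanation detail: number of periods."""
    return float(text.count("."))


# Multi-dimensional evaluation: several metrics applied to every explanation.
METRICS: Dict[str, Metric] = {"words": word_count, "sentences": sentence_count}


def audit(models: Dict[str, Model], groups: List[str],
          metrics: Dict[str, Metric]) -> Dict[str, Dict[str, float]]:
    """For every model and metric, return the largest between-group gap."""
    scenarios = build_scenarios(TEMPLATE, groups)
    report: Dict[str, Dict[str, float]] = {}
    for name, generate in models.items():
        explanations = [generate(prompt) for prompt in scenarios.values()]
        report[name] = {}
        for metric_name, metric in metrics.items():
            scores = [metric(e) for e in explanations]
            report[name][metric_name] = max(scores) - min(scores)
    return report


# Hypothetical stub models standing in for real LLMs.
def model_a(prompt: str) -> str:
    return "Your income was too low."


def model_b(prompt: str) -> str:
    extra = " At your age, lenders see more risk." if "elderly" in prompt else ""
    return "Your income was too low." + extra


# Cross-model comparison: the same scenarios and metrics for every model.
MODELS: Dict[str, Model] = {"model_a": model_a, "model_b": model_b}

if __name__ == "__main__":
    print(audit(MODELS, GROUPS, METRICS))
```

In this illustration, `audit(MODELS, GROUPS, METRICS)` reports gaps of 0.0 for `model_a`. For `model_b`, whose explanation changes only in the elderly scenario, it reports a word-count gap of 7.0 and a sentence-count gap of 1.0. In line with the distinction drawn in the abstract, a nonzero gap only flags a difference, a candidate explanation bias. Whether that difference amounts to explanation unfairness still requires a judgment about the context.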

Affiliated institution(s) at KIT	Institut für Informationssicherheit und Verlässlichkeit (KASTEL)
Publication type	Conference proceedings contribution
Publication year	2026
Language	English
Identifier	KITopen-ID: 1000193369
Published in	International Conference on Bridging the Gap between AI and Reality, AISoLA 2024, Crete, Greece, October 30 – November 3, 2024
Event	International Conference on Bridging the Gap between AI and Reality (AISoLA 2025), Rhodes, Greece, November 1 – 5, 2025
Keywords	LLM-Generated Explanations, LLM Bias, Explanation Bias, Explanation Unfairness