
Can a Large Language Model Keep My Secrets? A Study on LLM-Controlled Agents

Hemken, Niklas 1; Koneru, Sai 2; Jacob, Florian 1; Hartenstein, Hannes 1; Niehues, Jan 2
1 Institut für Informationssicherheit und Verlässlichkeit (KASTEL), Karlsruher Institut für Technologie (KIT)
2 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Agents controlled by Large Language Models (LLMs) can assist with natural language tasks across domains and applications when given access to confidential data. When such digital assistants interact with their potentially adversarial environment, confidentiality of the data is at stake. We investigated whether an LLM-controlled agent can, in a manner similar to humans, consider confidentiality when responding to natural language requests involving internal data. For evaluation, we created a synthetic dataset consisting of confidentiality-aware planning and deduction tasks in organizational access control. The dataset was developed from human input, LLM-generated content, and existing datasets. It includes various everyday scenarios in which access to confidential or private information is requested. We utilized our dataset to evaluate the ability to infer confidentiality-aware behavior in such scenarios by differentiating between legitimate and illegitimate access requests. We compared a prompting-based and a fine-tuning-based approach to evaluate the performance of Llama 3 and GPT-4o-mini in this domain. In addition, we conducted a user study to establish a baseline for human evaluation performance in these tasks.
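The prompting-based evaluation described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual setup: the prompt wording, function names, and label parsing are assumptions, and the LLM call itself is left abstract.

```python
# Hypothetical sketch of a prompting-based confidentiality evaluation:
# an LLM is asked to classify an access request in an organizational
# access-control scenario as legitimate or illegitimate, and the
# resulting labels are scored against gold annotations.
# All names and the prompt template are illustrative assumptions.

def build_prompt(scenario: str, request: str) -> str:
    """Compose a confidentiality-aware classification prompt."""
    return (
        "You are a digital assistant with access to confidential data.\n"
        f"Scenario: {scenario}\n"
        f"Request: {request}\n"
        "Should this request be granted? Answer exactly "
        "'legitimate' or 'illegitimate'."
    )

def parse_decision(model_output: str) -> str:
    """Map a raw model response to one of the two labels."""
    text = model_output.strip().lower()
    return "legitimate" if text.startswith("legitimate") else "illegitimate"

def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of access requests classified correctly."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

In such a setup, each dataset item (scenario, request, gold label) would be rendered with `build_prompt`, sent to Llama 3 or GPT-4o-mini, parsed with `parse_decision`, and aggregated with `accuracy`; the same scoring would apply to the fine-tuned models and the human baseline.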


Postprint
DOI: 10.5445/IR/1000183438
Published on 11.11.2025
Original publication
DOI: 10.18653/v1/2025.acl-srw.49
Associated KIT institution(s): Institut für Anthropomatik und Robotik (IAR)
Institut für Informationssicherheit und Verlässlichkeit (KASTEL)
Publication type: Proceedings contribution
Publication month/year: 07.2025
Language: English
Identifier: ISBN: 979-8-89176-254-1
KITopen-ID: 1000183438
HGF programme: 46.23.01 (POF IV, LK 01) Methods for Engineering Secure Systems
Published in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics. Vol 4: Student Research Workshop. Ed.: J. Zhao
Event: 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, 27.07.2025 – 01.08.2025
Publisher: Association for Computational Linguistics (ACL)
Pages: 746–759
Project information: JuBot (ZEISS-STFG, 44727 (internal))
Meetween (EU, EU 9th Framework Programme, 101135798)
Indexed in: Scopus