Diffusion-based Cumulative Adversarial Purification for Vision Language Models

Fu, Jia; Wu, Yongtao; Chen, Yihang; Peng, Kunyu 1; Zhang, Xiao; Cevher, Volkan; Pashami, Sepideh; Holst, Anders
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Vision Language Models (VLMs) have shown remarkable capabilities in multimodal understanding, yet their susceptibility to adversarial perturbations poses a significant threat to their reliability in real-world applications. Despite often being imperceptible to humans, these perturbations can drastically alter model outputs, leading to erroneous interpretations and decisions. This paper introduces DiffCAP, a novel diffusion-based purification strategy that can effectively neutralize adversarial corruptions in VLMs. We theoretically establish a provable recovery region in the forward diffusion process and quantify the convergence rate of semantic variation with respect to VLMs. These findings show that adversarial effects fade monotonically as diffusion unfolds. Guided by this principle, DiffCAP uses noise injection, with a similarity threshold on VLM embeddings as an adaptive criterion, before reverse diffusion restores a clean and reliable representation for VLM inference. Through extensive experiments across six datasets with three VLMs under varying attack strengths in three task scenarios, we show that DiffCAP outperforms existing defense techniques by a substantial margin. ...


Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Journal article
Publication year: 2026
Language: English
Identifier: ISSN 2835-8856
KITopen-ID: 1000194674
Published in: Transactions on Machine Learning Research
Publisher: OpenReview.net
Volume: 2026-June
Pages: 1
Publication note: in press
External relations: See also
Indexed in: Scopus