
@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology

Jiang, Xin 1; Zheng, Junwei 1; Liu, Ruiping 1; Li, Jiahang 1; Zhang, Jiaming 2; Matthiesen, Sven 3; Stiefelhagen, Rainer 2
1 Karlsruher Institut für Technologie (KIT)
2 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)
3 Institut für Produktentwicklung (IPEK), Karlsruher Institut für Technologie (KIT)

Abstract (English):

As Vision-Language Models (VLMs) advance, human-centered Assistive Technologies (ATs) for helping People with Visual Impairments (PVIs) are evolving into generalists, capable of performing multiple tasks simultaneously. However, benchmarking VLMs for ATs remains under-explored. To bridge this gap, we first create a novel AT benchmark (@BENCH). Guided by a pre-design user study with PVIs, our benchmark includes the five most crucial vision-language tasks: Panoptic Segmentation, Depth Estimation, Optical Character Recognition (OCR), Image Captioning, and Visual Question Answering (VQA). In addition, we propose a novel AT model (@MODEL) that addresses all tasks simultaneously and can be expanded to more assistive functions for helping PVIs. Our framework exhibits outstanding performance across tasks by integrating multi-modal information, and it offers PVIs more comprehensive assistance. Extensive experiments demonstrate the effectiveness and generalizability of our framework.


Original publication
DOI: 10.1109/WACV61041.2025.00387
Scopus citations: 1
Dimensions citations: 1
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR); Institut für Produktentwicklung (IPEK)
Publication type: Proceedings paper
Publication date: 26 February 2025
Language: English
Identifier: ISBN 979-83-315-1083-1
KITopen-ID: 1000182093
Published in: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Event: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2025), Tucson, AZ, USA, 28 February 2025 to 4 March 2025
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 3934–3943
Keywords: vlm; assistive technology; panoptic segmentation; depth estimation; ocr; image captioning; vqa
Indexed in: Dimensions; OpenAlex; Scopus