
An Evaluation of Large Language Models for Procedural Action Anticipation

Zhong, Zeyun

Abstract (English):

This study evaluates large language models (LLMs) for their effectiveness in long-term action anticipation. Traditional approaches primarily depend on representation learning from extensive video data to understand human activities, a process fraught with challenges due to the intricate nature and variability of these activities. A significant limitation of this method is the difficulty of obtaining effective video representations. Moreover, relying solely on video-based learning can restrict a model's ability to generalize in scenarios involving long-tail classes and out-of-distribution examples. In contrast, the zero-shot and few-shot capabilities of LLMs such as ChatGPT offer a novel approach to tackling the complexity of long-term activity understanding without extensive training. We propose three prompting strategies: a plain prompt, a chain-of-thought prompt, and an in-context learning prompt. Our experiments on the procedural Breakfast dataset indicate that LLMs can deliver promising results without task-specific fine-tuning.
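
The three prompting strategies can be illustrated with a minimal sketch. The action names, example sequence, prompt wording, and function names below are illustrative assumptions, not the templates used in the paper.

```python
# Minimal sketch of the three prompting strategies described in the abstract.
# All prompt wording and action labels are hypothetical, for illustration only.

OBSERVED = ["take_bowl", "pour_cereals", "pour_milk"]  # observed action sequence
DEMO = (["crack_egg", "stir_egg"], ["fry_egg", "put_egg2plate"])  # (observed, future) demo pair


def plain_prompt(observed):
    """Plain prompt: directly ask for the future actions."""
    return (
        "A person is making breakfast and has performed these actions: "
        + ", ".join(observed)
        + ". Predict the next 3 actions as a comma-separated list."
    )


def cot_prompt(observed):
    """Chain-of-thought prompt: ask the model to reason before answering."""
    return (
        plain_prompt(observed)
        + " First reason step by step about which dish is being prepared, "
          "then give the final answer."
    )


def in_context_prompt(observed, demo):
    """In-context learning prompt: prepend a worked example before the query."""
    demo_obs, demo_future = demo
    return (
        "Observed: " + ", ".join(demo_obs) + "\n"
        "Future: " + ", ".join(demo_future) + "\n\n"
        "Observed: " + ", ".join(observed) + "\n"
        "Future:"
    )


if __name__ == "__main__":
    for name, prompt in [
        ("plain", plain_prompt(OBSERVED)),
        ("chain-of-thought", cot_prompt(OBSERVED)),
        ("in-context", in_context_prompt(OBSERVED, DEMO)),
    ]:
        print(f"--- {name} ---\n{prompt}\n")
```

In a pipeline along the lines described in the abstract, prompts of this kind would be sent to an LLM in zero- or few-shot mode and the returned action names matched against the dataset's action vocabulary; the exact parsing and evaluation protocol would follow the paper.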


Publisher's version
DOI: 10.5445/IR/1000190333
Published on 06.02.2026
Associated KIT institution(s): Institut für Anthropomatik und Robotik (IAR)
Publication type: Conference proceedings contribution
Year of publication: 2024
Language: English
Identifier: ISBN 978-3-7315-1351-3
KITopen-ID: 1000190333
Published in: Proceedings of the 2023 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory. Ed.: J. Beyerer; T. Zander
Event: Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory (2023), Triberg, Germany, 30.07.2023 – 04.08.2023
Publisher: Karlsruher Institut für Technologie (KIT)
Pages: 109–118