
Scalable Video Action Anticipation with Cross Linear Attentive Memory

Zhong, Zeyun 1; Martin, Manuel 1; Schneider, David 1; Lerch, David J. 2; Wu, Chengzhi 1; Diederichs, Frederik; Gall, Juergen; Beyerer, Jürgen 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)
2 Lichttechnisches Institut (LTI), Karlsruher Institut für Technologie (KIT)

Abstract:

Recent advances in action anticipation rely heavily on Transformer architectures to learn discriminative representations of the past observation, incurring high computational and memory overhead that limits their applicability to long videos. While temporal processors with linear complexity like RNNs and state-space models offer efficient alternatives, their sequential nature risks overlooking subtle cues in observed frames that could enhance future anticipation. We address this limitation with Cross Linear Attentive Memory (CLAM), a memory module that selectively retrieves complementary context cues from frame features. By reformulating linear attention to replace traditional cross-attention, CLAM achieves linear computation complexity and constant memory usage relative to input length. Finally, by fusing the outputs of the temporal processor and CLAM, a non-autoregressive Transformer decoder generates future actions in one shot with high accuracy. Experiments on egocentric (EpicKitchens100 and Ego4D) and third-person (Thumos14) benchmarks demonstrate our model’s superior anticipation accuracy and scalability, processing longer sequences with significantly less latency growth than alternatives.
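
The constant-memory property mentioned in the abstract can be illustrated with generic kernelized linear attention. The following is a minimal sketch under stated assumptions, not the authors' CLAM implementation: the class name LinearCrossAttentionMemory, the elu(x) + 1 feature map, and the per-frame write/read interface are all hypothetical choices made for illustration. It shows only the general idea that frame keys and values can be folded into a fixed-size state that queries later read from, so the stored state does not grow with the number of observed frames.

```python
import math


def feature_map(x):
    # elu(x) + 1: a positive feature map often used in linear attention.
    # Assumption for this sketch; the paper's choice is not stated in the abstract.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


class LinearCrossAttentionMemory:
    """Hypothetical fixed-size memory based on generic linear attention."""

    def __init__(self, key_dim, value_dim):
        # S accumulates phi(k) v^T over frames; z accumulates phi(k).
        # Both have a size that is independent of the number of frames.
        self.S = [[0.0] * value_dim for _ in range(key_dim)]
        self.z = [0.0] * key_dim

    def write(self, key, value):
        # Fold one frame into the state: O(key_dim * value_dim) work per frame.
        phi_k = feature_map(key)
        for i, pk in enumerate(phi_k):
            self.z[i] += pk
            row = self.S[i]
            for j, v in enumerate(value):
                row[j] += pk * v

    def read(self, query):
        # Normalized readout: phi(q)^T S / (phi(q)^T z).
        phi_q = feature_map(query)
        denom = sum(q * z for q, z in zip(phi_q, self.z))
        if denom == 0.0:
            raise ValueError("memory is empty; call write() first")
        value_dim = len(self.S[0])
        return [
            sum(phi_q[i] * self.S[i][j] for i in range(len(phi_q))) / denom
            for j in range(value_dim)
        ]


# Usage: stream made-up frame features into the memory, then query it once.
memory = LinearCrossAttentionMemory(key_dim=2, value_dim=2)
frames = [([0.5, -1.0], [1.0, 0.0]), ([1.5, 0.2], [0.0, 1.0])]
for key, value in frames:
    memory.write(key, value)
context = memory.read([1.0, 0.0])
```

The readout equals a weighted average of the stored values with weights phi(q)·phi(k_t), which is why it can stand in for cross-attention without keeping every frame in memory. The total cost is linear in the number of frames, and the state keeps its key_dim × value_dim size throughout.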


Original publication
DOI: 10.1109/WACV61042.2026.00783
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR); Lichttechnisches Institut (LTI)
Publication type: Proceedings paper
Publication date: 6 March 2026
Language: English
Identifiers: ISBN 979-8-3315-5511-5; KITopen-ID 1000194426
Published in: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Event: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026), Tucson, AZ, USA, 6-10 March 2026
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 8113-8123
Indexed in: Scopus; OpenAlex