Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments

Wen, Di 1; Qi, Lei; Peng, Kunyu 1; Yang, Kailun 1; Teng, Fei; Luo, Ao; Fu, Jia; Chen, Yufan; Liu, Ruiping; Shi, Yitian 2; Sarfraz, M. Saquib; Stiefelhagen, Rainer 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)
2 Institut für Fördertechnik und Logistiksysteme (IFL), Karlsruher Institut für Technologie (KIT)

Abstract:

Despite substantial progress in video understanding, most existing datasets are limited to Earth's gravitational conditions. However, microgravity alters human motion, interactions, and visual semantics, revealing a critical gap for real-world vision systems. This presents a challenge for domain-robust video understanding in safety-critical space applications. To address this, we introduce MicroG-4M, the first benchmark for spatio-temporal and semantic understanding of human activities in microgravity. Constructed from real-world space missions and cinematic simulations, the dataset includes 4,759 clips covering 50 actions, 1,238 context-rich captions, and over 7,000 question-answer pairs on astronaut activities and scene understanding. MicroG-4M supports three core tasks: fine-grained multi-label action recognition, temporal video captioning, and visual question answering, enabling a comprehensive evaluation of both spatial localization and semantic reasoning in microgravity contexts. We establish baselines using state-of-the-art models. All data, annotations, and code are available at https://github.com/LEI-QI-233/HAR-in-Space.


DOI: 10.5445/IR/1000189821
Published on: 21.01.2026
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR); Institut für Fördertechnik und Logistiksysteme (IFL)
Publication type: Research report/Preprint
Publication date: 13.10.2025
Language: English
Identifier: KITopen-ID: 1000189821
Publisher: arXiv
Series: Computer Science - Computer Vision and Pattern Recognition
Published online in advance on: 03.06.2025
Keywords: Computer Vision and Pattern Recognition (cs.CV)
Indexed in: OpenAlex; arXiv; Dimensions