
Visual Imitation Learning of Manipulation Tasks for Humanoid Robots

Gao, Jianfeng 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract (English):

Observational learning is a fundamental mechanism by which humans acquire new skills by watching others and understanding the consequences of their actions. This capability allows for skill acquisition through demonstration, thereby reducing the need for costly trial-and-error processes. Cognitive development research has shown that infants can learn complex skills and make inductive generalizations from sparse samples by observing caregivers and peers; they leverage statistical evidence that models the covariation of task features, all without direct physical interaction or explicit linguistic instructions. By identifying invariant task features -- such as keypoints associated with an object's functional parts -- from high-dimensional visual inputs, it is possible to derive effective and transferable task representations. These insights have motivated significant research in robotics to develop Visual Imitation Learning (VIL) systems that emulate human observational learning mechanisms. Nevertheless, acquiring generalizable task representations solely from sparse human demonstration videos remains a significant challenge.

In this thesis, we adopt a bottom-up approach that extracts essential invariant task features from demonstrations without relying on ground-truth labels, direct physical interaction, or linguistic bootstrapping commonly employed in top-down methodologies. ...


DOI: 10.5445/IR/1000190480
Published on: 23 February 2026
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Thesis
Publication date: 23 February 2026
Language: English
Identifier: KITopen-ID: 1000190480
Publisher: Karlsruher Institut für Technologie (KIT)
Extent: XVI, 246 pp.
Type of thesis: Dissertation
Faculty: Fakultät für Informatik (INFORMATIK)
Institute: Institut für Anthropomatik und Robotik (IAR)
Examination date: 8 May 2025
Keywords: visual imitation learning; humanoid robot; bimanual manipulation; geometric constraints; keypoints; coordination; motion segmentation
Referee/Supervisor: Asfour, Tamim; Toussaint, Marc