
Scaffolding Dexterous Manipulation with Vision-Language Models

de Bakker, Vincent; Hejna, Joey; Lum, Tyler Ga Wei; Celik, Onur; Taranovic, Aleksandar 1; Blessing, Denis; Neumann, Gerhard 1; Bohg, Jeannette; Sadigh, Dorsa
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Dexterous robotic hands are essential for performing complex manipulation tasks, yet remain difficult to train due to the challenges of demonstration collection and high-dimensional control. While reinforcement learning (RL) can alleviate the data bottleneck by generating experience in simulation, it typically relies on carefully designed, task-specific reward functions, which hinder scalability and generalization. Thus, contemporary works in dexterous manipulation have often bootstrapped from reference trajectories. These trajectories specify target hand poses that guide the exploration of RL policies and object poses that enable dense, task-agnostic rewards. However, sourcing suitable trajectories, particularly for dexterous hands, remains a significant challenge. Yet, the precise details in explicit reference trajectories are often unnecessary, as RL ultimately refines the motion. Our key insight is that modern vision-language models (VLMs) already encode the commonsense spatial and semantic knowledge needed to specify tasks and guide exploration effectively. Given a task description (e.g., "open the cabinet") and a visual scene, our method uses an off-the-shelf VLM to first identify task-relevant keypoints (e.g., handles, buttons) and then synthesize 3D trajectories for hand motion and object motion. ...


DOI: 10.5445/IR/1000189669
Published on: 2026-01-15
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Research report/Preprint
Publication date: 2026-01-11
Language: English
Identifier: KITopen-ID 1000189669
Publisher: arXiv
Extent: 29 pp.
Keywords: Robotics (cs.RO)
Indexed in: OpenAlex, arXiv, Dimensions