
PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning

Jia, Xiaogang; Wang, Qian; Wang, Anrui; Wang, Han A.; Gyenes, Balázs; Gospodinov, Emiliyan; Jiang, Xinkai; Li, Ge 1; Zhou, Hongyi; Liao, Weiran; Huang, Xi; Beck, Maximilian 2; Reuss, Moritz 1; Lioutikov, Rudolf; Neumann, Gerhard 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)
2 Institut für Technik der Informationsverarbeitung (ITIV), Karlsruher Institut für Technologie (KIT)

Abstract:

Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, while RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations, and can be transformed between reference frames. Because the points are arranged in a regular grid, established computer vision techniques can be applied directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks.
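
To make the idea of a structured grid of points concrete, the following is a minimal sketch, not the authors' implementation. It assumes a point map can be obtained by unprojecting a depth image with a pinhole camera model; the abstract does not state how the point maps are built. The function names `depth_to_point_map` and `transform_point_map`, the intrinsics `fx, fy, cx, cy`, and the transform `T_world_cam` are hypothetical names chosen for illustration.

```python
import numpy as np

def depth_to_point_map(depth, fx, fy, cx, cy):
    """Unproject an (H, W) depth image into an (H, W, 3) point map.

    Assumption: a pinhole camera model; the result is in the camera frame.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel, each (H, W)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    # One 3D point per pixel, so the image grid layout is preserved
    return np.stack([x, y, depth], axis=-1)

def transform_point_map(point_map, T):
    """Apply a 4x4 homogeneous transform to every point, keeping the grid."""
    R, t = T[:3, :3], T[:3, 3]
    return point_map @ R.T + t

# Toy example: a flat surface 2 m in front of the camera
depth = np.full((4, 6), 2.0)
pm_cam = depth_to_point_map(depth, fx=100.0, fy=100.0, cx=2.5, cy=1.5)

# Hypothetical camera-to-world transform: shift 1 m along z
T_world_cam = np.eye(4)
T_world_cam[:3, 3] = [0.0, 0.0, 1.0]
pm_world = transform_point_map(pm_cam, T_world_cam)
```

Because `pm_world` keeps the `(H, W, 3)` image layout, it can in principle be fed to the same convolutional or patch-based encoders used for RGB images, which is the property the abstract highlights.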


DOI: 10.5445/IR/1000189672
Published on 15.01.2026
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR); Institut für Technik der Informationsverarbeitung (ITIV)
Publication type: Research report/Preprint
Publication date: 26.11.2025
Language: English
Identifier: KITopen-ID: 1000189672
Publisher: arXiv
Extent: 23 pp.
Keywords: Robotics (cs.RO), Machine Learning (cs.LG)
Indexed in: OpenAlex, arXiv, Dimensions