
Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach

Wu, Chengzhi 1; Pfrommer, Julius; Zhou, Mingyuan 2; Beyerer, Jürgen 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)
2 Karlsruher Institut für Technologie (KIT)

Abstract:

We propose a combined generative and contrastive neural architecture for learning latent representations of 3D volumetric shapes. The architecture uses two encoder branches, one for voxel grids and one for multi-view images of the same underlying shape. The main idea is to combine a contrastive loss between the two resulting latent representations with an additional reconstruction loss. The reconstruction loss helps prevent the latent representations from collapsing, which would otherwise be a trivial way to minimize the contrastive loss. A novel dynamic switching approach is used to cross-train the two encoders with a shared decoder. The switching approach also enables a stop-gradient operation on a randomly chosen branch. Classification experiments further show that the latent representations learned with our self-supervised method implicitly integrate more useful information from the additional input data, leading to better reconstruction and classification performance.
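The loss combination and branch switching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the loss forms (one minus cosine similarity, mean squared error), the weight parameter, and all function names are assumptions made for illustration, and plain Python has no autograd, so the stop-gradient branch is only reported, not enforced.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(z_voxel, z_views):
    """Assumed form: 1 - cosine similarity, so aligned latents give 0."""
    return 1.0 - cosine_similarity(z_voxel, z_views)


def reconstruction_loss(target, recon):
    """Assumed form: mean squared error over voxel occupancies."""
    return sum((t - r) ** 2 for t, r in zip(target, recon)) / len(target)


def switching_step(z_voxel, z_views, target, decode, rng, weight=1.0):
    """One hypothetical training step with random branch switching.

    The randomly chosen 'active' branch feeds the shared decoder; the other
    branch is the one an autograd framework would detach (stop gradient).
    """
    active = rng.choice(["voxel", "views"])
    frozen = "views" if active == "voxel" else "voxel"
    z_active = z_voxel if active == "voxel" else z_views
    recon = decode(z_active)
    total = reconstruction_loss(target, recon) + weight * contrastive_loss(z_voxel, z_views)
    return {"active": active, "stop_gradient": frozen, "loss": total}


# Toy usage: an identity function stands in for the shared decoder.
rng = random.Random(0)
result = switching_step([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], lambda z: list(z), rng)
```

In this toy setting, two encoders that both emit the same constant latent for every shape would drive the contrastive term to nearly zero while the reconstruction term stays large, which is the collapse the reconstruction loss is meant to penalize.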


Publisher's version
DOI: 10.5445/IR/1000167015
Published on 10 January 2024
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Journal article
Publication year: 2023
Language: English
ISSN: 1520-9210, 1941-0077
KITopen-ID: 1000167015
Published in: IEEE Transactions on Multimedia
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Published online in advance on 5 December 2023
Keywords: Self-supervised learning, contrastive learning, multi-modal input, 3D shapes, dynamic switching
Indexed in: Dimensions, Scopus