
Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach

Wu, Chengzhi 1; Pfrommer, Julius; Zhou, Mingyuan 2; Beyerer, Jürgen 1
1 Institute for Anthropomatics and Robotics (IAR), Karlsruhe Institute of Technology (KIT)
2 Karlsruhe Institute of Technology (KIT)

Abstract:

We propose a combined generative and contrastive neural architecture for learning latent representations of 3D volumetric shapes. The architecture uses two encoder branches, one for voxel grids and one for multi-view images of the same underlying shape. The main idea is to combine a contrastive loss between the resulting latent representations with an additional reconstruction loss. The reconstruction loss helps prevent the latent representations from collapsing, which would otherwise be a trivial solution for minimizing the contrastive loss. A novel dynamic switching approach is used to cross-train the two encoders with a shared decoder. The switching approach also enables a stop-gradient operation on a randomly chosen branch. Further classification experiments show that the latent representations learned with our self-supervised method implicitly integrate more useful information from the additional input data, leading to better reconstruction and classification performance.
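
The abstract does not state the loss formulas or the switching rule, so the following is only a toy sketch in plain Python of the general idea: a contrastive term between the two latents, plus a reconstruction term computed through a shared decoder that is fed by one randomly chosen branch per step. Every name here (`cosine_distance`, `pick_branches`, `combined_loss`, `toy_decoder`, the `weight` parameter) is hypothetical, and so are the choice of cosine distance and mean squared error and the rule that the non-decoded branch is the one receiving the stop gradient. None of it is taken from the paper.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two latent vectors (0 when aligned)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mse(prediction, target):
    """Mean squared error, used here as a stand-in reconstruction loss."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def pick_branches(rng):
    """Randomly pick the branch that feeds the shared decoder this step.

    Assumption: the other branch is the one held fixed (stop gradient).
    """
    decoded = rng.choice(["voxel", "image"])
    stopped = "image" if decoded == "voxel" else "voxel"
    return decoded, stopped


def combined_loss(latents, decoded, decode, target, weight=1.0):
    """Contrastive term between both latents plus weighted reconstruction."""
    contrastive = cosine_distance(latents["voxel"], latents["image"])
    reconstruction = mse(decode(latents[decoded]), target)
    return contrastive + weight * reconstruction, contrastive, reconstruction


def toy_decoder(z):
    """Hypothetical shared decoder: identity map, for illustration only."""
    return list(z)


if __name__ == "__main__":
    rng = random.Random(0)
    target = [0.2, 0.9]  # toy "shape" to reconstruct
    # Collapsed encoders emit the same constant latent for every shape.
    collapsed = {"voxel": [1.0, 1.0], "image": [1.0, 1.0]}
    # Informative encoders emit latents that actually describe the shape.
    informative = {"voxel": [0.2, 0.9], "image": [0.2, 0.9]}
    decoded, stopped = pick_branches(rng)
    for name, latents in [("collapsed", collapsed), ("informative", informative)]:
        total, con, rec = combined_loss(latents, decoded, toy_decoder, target)
        print(name, decoded, round(con, 6), round(rec, 6), round(total, 6))
```

In this toy setting the collapsed latents drive the contrastive term to zero but still incur a reconstruction error, whereas the informative latents minimize both terms. That is the intuition behind adding a reconstruction loss. A real implementation would apply the stop gradient through an autodiff framework, which this sketch does not model.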

Affiliated institution(s) at KIT: Institute for Anthropomatics and Robotics (IAR)
Publication type: Journal article
Publication year: 2024
Language: English
Identifier: ISSN: 1520-9210, 1941-0077
KITopen-ID: 1000167015
Published in: IEEE Transactions on Multimedia
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Volume: 26
Pages: 8432–8441
Published online in advance: 5 December 2023
Keywords: Self-supervised learning, contrastive learning, multi-modal input, 3D shapes, dynamic switching
Indexed in: OpenAlex, Dimensions, Scopus
Sustainable Development Goals: Goal 11 – Sustainable Cities and Communities

Publisher's version
DOI: 10.5445/IR/1000167015
Published on: 10 January 2024
Original publication
DOI: 10.1109/TMM.2023.3338079