Self-Supervised Generative-Contrastive Learning of Multi-Modal Euclidean Input for 3D Shape Latent Representations: A Dynamic Switching Approach

Wu, Chengzhi; Pfrommer, Julius; Zhou, Mingyuan; Beyerer, Jürgen

doi:10.1109/TMM.2023.3338079

Wu, Chengzhi ¹; Pfrommer, Julius; Zhou, Mingyuan ²; Beyerer, Jürgen ¹
¹ Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)
² Karlsruher Institut für Technologie (KIT)

Abstract:

We propose a combined generative and contrastive neural architecture for learning latent representations of 3D volumetric shapes. The architecture uses two encoder branches, one for voxel grids and one for multi-view images of the same underlying shape. The main idea is to combine a contrastive loss between the resulting latent representations with an additional reconstruction loss. The reconstruction loss helps prevent the latent representations from collapsing, which would otherwise be a trivial way to minimize the contrastive loss. A novel dynamic switching approach is used to cross-train the two encoders with a shared decoder. The switching approach also enables a stop-gradient operation on a randomly chosen branch. Classification experiments further show that the latent representations learned with our self-supervised method implicitly integrate more useful information from the additional input data, leading to better reconstruction and classification performance.
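
The loss composition described in the abstract can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the function names, the cosine-based alignment term standing in for the contrastive loss, the mean-squared reconstruction error, the `weight` parameter, and the rule that the non-selected branch is the one marked for stop-gradient are all assumptions made for illustration.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two latent vectors (assumed alignment term)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mse(pred, target):
    """Mean squared error, used here as a stand-in reconstruction loss."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def combined_loss(z_voxel, z_image, decode, target, rng, weight=1.0):
    """Combine a reconstruction term and an alignment term for one training step.

    z_voxel, z_image: latents from the two (hypothetical) encoder branches.
    decode: the shared decoder, passed in as a plain callable.
    """
    # Dynamic switching (assumed form): pick one branch at random per step.
    active = rng.choice(["voxel", "image"])
    z_active = z_voxel if active == "voxel" else z_image

    # Only the active latent is passed through the shared decoder.
    reconstruction = mse(decode(z_active), target)

    # Alignment between the two latents. In an autodiff framework the
    # non-active latent would be treated as a constant here (stop-gradient);
    # which branch is stopped is an assumption of this sketch.
    alignment = cosine_distance(z_voxel, z_image)

    return reconstruction + weight * alignment, active
```

For example, `combined_loss([1.0, 0.0], [1.0, 0.0], lambda z: z, [1.0, 0.0], random.Random(0))` returns a loss of zero because the latents agree and the toy identity decoder reproduces the target. An encoder pair that emitted the same latent for every shape would also zero the alignment term, but the reconstruction term would stay large for most shapes, which is the collapse-avoidance argument made in the abstract.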

Affiliated institution(s) at KIT	Institut für Anthropomatik und Robotik (IAR)
Publication type	Journal article
Publication year	2024
Language	English
Identifier	ISSN: 1520-9210, 1941-0077; KITopen-ID: 1000167015
Published in	IEEE Transactions on Multimedia
Publisher	Institute of Electrical and Electronics Engineers (IEEE)
Volume	26
Pages	8432–8441
Published online in advance on	5 December 2023
Keywords	Self-supervised learning, contrastive learning, multi-modal input, 3D shapes, dynamic switching
Indexed in	OpenAlex, Dimensions, Scopus

KITopen download

Publisher's version

DOI: 10.5445/IR/1000167015

Published on 10 January 2024

External links

Original publication
DOI: 10.1109/TMM.2023.3338079

Scopus
Citations: 2

Dimensions
Citations: 1
