
Hyper-Dimensional Fingerprints as Molecular Representations

Teufel, Jonas 1; Torresi, Luca; Eberhard, André; Friederich, Pascal 1
1 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)

Abstract:

Computational molecular representations underpin virtual screening, property prediction, and materials discovery. Conventional fingerprints are efficient and deterministic but lose structural information through hash-based compression, particularly at low dimensionalities. Learned representations from graph neural networks recover this expressiveness but require task-specific training and substantial computational resources. Here we introduce hyperdimensional fingerprints (HDF), which replace the learned transformations of message-passing neural networks with algebraic operations on high-dimensional vectors, producing deterministic molecular representations without any training. Across diverse property prediction benchmarks, HDF outperforms conventional fingerprints in the majority of tasks while exhibiting greater consistency across datasets and models. Crucially, HDF embeddings preserve molecular similarity faithfully: at 32 dimensions, distances in HDF space achieve a 0.9 Pearson correlation with graph edit distance, compared to 0.55 for Morgan fingerprints at equivalent size. This structural fidelity persists at low dimensions where hash-based methods degrade, allowing simple nearest-neighbor regression to remain predictive with as few as 64 components. ...
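The idea of swapping learned message-passing layers for fixed algebraic operations can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify the encoding, so everything here is an assumption chosen for illustration, including the bipolar (+1/-1) vectors, the elementwise-product binding, the elementwise-sum bundling, the two message-passing layers, and the names `atom_vector`, `bind`, `bundle`, and `fingerprint`. Molecules are reduced to heavy-atom symbols and bond index pairs.

```python
import math
import random

DIM = 64  # hypothetical embedding size, chosen only for this sketch


def atom_vector(symbol, dim=DIM):
    """Deterministic bipolar (+1/-1) hypervector for an element symbol."""
    rng = random.Random(f"atom:{symbol}")  # string seed, reproducible across runs
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Binding (assumed here): elementwise product of two hypervectors."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors):
    """Bundling (assumed here): elementwise sum of a non-empty list of hypervectors."""
    return [sum(column) for column in zip(*vectors)]


def sign(v):
    """Map each component back to +1/-1 (ties go to +1)."""
    return [1 if x >= 0 else -1 for x in v]


def fingerprint(atoms, bonds, layers=2, dim=DIM):
    """Training-free graph embedding via bind/bundle message passing."""
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    states = [atom_vector(symbol, dim) for symbol in atoms]
    readout = list(states)  # collect node states from every layer
    for _ in range(layers):
        updated = []
        for i, state in enumerate(states):
            if neighbors[i]:
                # Aggregate neighbor states, then combine with the node's own state.
                message = sign(bundle([states[j] for j in neighbors[i]]))
                state = bind(state, message)
            updated.append(state)
        states = updated
        readout.extend(states)
    return bundle(readout)  # order-independent sum over all nodes and layers


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
ethanol_reordered = fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
propane = fingerprint(["C", "C", "C"], [(0, 1), (1, 2)])
print(cosine(ethanol, ethanol_reordered), cosine(ethanol, propane))
```

Because every step is a fixed function of the atom symbols and the bond list, the same molecule always yields the same vector, and because aggregation uses commutative sums, renumbering the atoms leaves the result unchanged. How closely such a toy encoding tracks graph edit distance is not something this sketch demonstrates.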


DOI: 10.5445/IR/1000193403
Published on: 20 May 2026
Original publication
DOI: 10.48550/arXiv.2604.27810
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Research report/Preprint
Publication year: 2026
Language: English
Identifier: KITopen-ID: 1000193403
HGF program: 43.31.01 (POF IV, LK 01) Multifunctionality Molecular Design & Material Architecture
Publisher: arXiv
Length: 20 pp.
Published online in advance on: 30 April 2026
Keywords: Machine Learning (cs.LG), Molecular Property Prediction, Molecular Fingerprints, Hyperdimensional Computing
Indexed in: arXiv, OpenAlex