
SELFIES and the future of molecular string representations

Krenn, Mario; Ai, Qianxiang; Barthel, Senja; Carson, Nessa; Frei, Angelo; Frey, Nathan C.; Friederich, Pascal 1,2; Gaudin, Théophile; Gayle, Alberto Alexander; Jablonka, Kevin Maik; Lameiro, Rafael F.; Lemm, Dominik; Lo, Alston; Moosavi, Seyed Mohamad; Nápoles-Duarte, José Manuel; Nigam, AkshatKumar; Pollice, Robert; Rajan, Kohulan; Schatzschneider, Ulrich; ...

Abstract:

Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings. Most pertinently, most combinations of symbols have no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELFIES (SELF-referencIng Embedded Strings). SELFIES has since simplified and enabled numerous new applications in chemistry. In this manuscript, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete Future Projects for robust molecular representations. ...
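The abstract's point about SMILES fragility can be illustrated with a small sketch. The code below is not part of the paper and is not a SMILES parser. It checks only two syntactic rules (balanced branch parentheses and paired single-digit ring closures) and ignores valence, bracket atoms, and everything else. The function names, the sampling alphabet, and the seed are assumptions made for illustration.

```python
import random


def passes_basic_smiles_syntax(s: str) -> bool:
    """Toy check (hypothetical helper): balanced parentheses and paired ring digits only."""
    depth = 0
    open_rings = set()
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a branch was closed before it was opened
                return False
        elif ch.isdigit():
            open_rings ^= {ch}  # first occurrence opens a ring, second closes it
    return depth == 0 and not open_rings


def fraction_passing(n_samples: int, length: int,
                     alphabet: str = "CNO()=12", seed: int = 0) -> float:
    """Share of random strings over an assumed toy alphabet that pass the check."""
    rng = random.Random(seed)
    hits = sum(
        passes_basic_smiles_syntax(
            "".join(rng.choice(alphabet) for _ in range(length))
        )
        for _ in range(n_samples)
    )
    return hits / n_samples


if __name__ == "__main__":
    print(passes_basic_smiles_syntax("C1=CC=CC=C1"))  # benzene: ring 1 opened and closed
    print(passes_basic_smiles_syntax("C1=CC=CC=C"))   # ring 1 never closed
    print(fraction_passing(1000, 10))
```

Even this lenient check rejects strings with an unclosed ring or branch. A real SMILES parser applies many more rules, so randomly assembled strings fail more often still. That is the failure mode SELFIES is designed to rule out.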


Full text
DOI: 10.5445/IR/1000152111
Published on 28 October 2022
Original publication
DOI: 10.48550/arXiv.2204.00056
Citations (Dimensions): 1
Affiliated institution(s) at KIT: Institut für Nanotechnologie (INT)
Institut für Theoretische Informatik (ITI)
Publication type: Research report/Preprint
Publication year: 2022
Language: English
Identifier: KITopen-ID: 1000152111
Edition: 34 pp.
Published online in advance on 31 March 2022
Indexed in: Dimensions, arXiv