
SELFIES and the future of molecular string representations

Krenn, Mario; Ai, Qianxiang; Barthel, Senja; Carson, Nessa; Frei, Angelo; Frey, Nathan C.; Friederich, Pascal 1,2; Gaudin, Théophile; Gayle, Alberto Alexander; Jablonka, Kevin Maik; Lameiro, Rafael F.; Lemm, Dominik; Lo, Alston; Moosavi, Seyed Mohamad; Nápoles-Duarte, José Manuel; Nigam, AkshatKumar; Pollice, Robert; Rajan, Kohulan; Schatzschneider, Ulrich; ...

Abstract:

Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, and the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings; most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. ...
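The abstract's point that most symbol combinations are not valid SMILES can be illustrated with a small enumeration. The sketch below is not a SMILES parser and is not taken from the paper: the seven-symbol alphabet and the helper names `is_plausible_smiles` and `count_plausible` are assumptions made for illustration, and the checks cover only a few necessary syntax rules (leading atom, balanced branches, paired ring-closure digit, no dangling bond). Valence is ignored, so the count it produces overestimates how many strings are chemically valid.

```python
from itertools import product

# Hypothetical toy alphabet: three atoms, branch parentheses,
# one ring-closure digit, and the double-bond symbol.
ATOMS = {"C", "N", "O"}
ALPHABET = ["C", "N", "O", "(", ")", "1", "="]


def is_plausible_smiles(s):
    """Apply a few necessary (not sufficient) SMILES syntax checks."""
    if not s or s[0] not in ATOMS:
        return False  # a string must start with an atom
    depth = 0          # current branch nesting level
    ring_open = False  # True while ring-closure digit "1" is unpaired
    prev = ""
    for ch in s:
        if ch == "(":
            if prev in ("(", "="):
                return False  # a branch cannot follow "(" or a bond
            depth += 1
        elif ch == ")":
            if prev in ("(", "="):
                return False  # empty branch or dangling bond
            depth -= 1
            if depth < 0:
                return False  # closing a branch that was never opened
        elif ch == "1":
            if prev == "(":
                return False  # a branch cannot start with a ring digit
            ring_open = not ring_open
        elif ch == "=":
            if prev == "=":
                return False  # two bond symbols in a row
        elif ch not in ATOMS:
            return False  # symbol outside the toy alphabet
        prev = ch
    return depth == 0 and not ring_open and prev != "="


def count_plausible(length):
    """Count strings of a given length that pass the checks."""
    total = valid = 0
    for symbols in product(ALPHABET, repeat=length):
        total += 1
        valid += is_plausible_smiles("".join(symbols))
    return valid, total


if __name__ == "__main__":
    valid, total = count_plausible(4)
    print(f"{valid} of {total} length-4 strings pass the syntax checks")
```

Even under these lenient rules, fewer than half of the enumerated strings pass, because only three of the seven symbols may start a string. A representation in which every symbol sequence decodes to a molecule, as the abstract describes for SELFIES, avoids this rejection step entirely.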


Publisher's version
DOI: 10.5445/IR/1000152067
Published on 28 October 2022
Original publication
DOI: 10.1016/j.patter.2022.100588
Scopus citations: 90
Dimensions citations: 113
Affiliated institution(s) at KIT: Institut für Nanotechnologie (INT); Institut für Theoretische Informatik (ITI)
Publication type: Journal article
Publication year: 2022
Language: English
Identifiers: ISSN: 2666-3899; KITopen-ID: 1000152067
HGF program: 43.31.01 (POF IV, LK 01) Multifunctionality Molecular Design & Material Architecture
Published in: Patterns
Publisher: Elsevier
Volume: 3
Issue: 10
Pages: Article no. 100588
Published online ahead of print on 14 October 2022
Indexed in: Scopus; Dimensions