
(Semi-) Automatic Review Process for Common Compound Characterization Data in Organic Synthesis

Huang, Yu-Chieh; Tremouilhac, Pierre 1; Kuhn, Stefan; Huang, Pei-Chi; Lin, Chia-Lin; Schlörer, Nils; Taubert, Oskar 2; Götz, Markus 2; Jung, Nicole 3; Bräse, Stefan 3
1 Institut für Biologische und Chemische Systeme (IBCS), Karlsruher Institut für Technologie (KIT)
2 Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)
3 Institut für Organische Chemie (IOC), Karlsruher Institut für Technologie (KIT)

Abstract:

A method for data review in the chemical sciences is described, with a focus on data for the characterization of synthetic molecules. As current procedures for data curation in chemistry rely almost exclusively on manual checking or peer review, a (semi-)automatic procedure for the evaluation of data assigned to molecular structures is proposed and demonstrated. The information usually required for the identification of isolated compounds is used to determine whether the data are complete with respect to the available data types and metadata, whether they are consistent with the proposed structure, and whether they are plausible in comparison to simulated data. Spectra prediction and automatic signal comparison are applied to NMR evaluation, mass spectrometry data are evaluated by signal extraction, and machine learning is used for IR analysis. The proposed protocol shows how integrating different data analysis tools can help to overcome the challenges of the currently purely manual review and curation of data in synthetic chemistry.
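The completeness and consistency checks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Submission` container, the set of required data types, the choice of the [M+H]+ adduct, and the 10 ppm tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical set of data types expected for a characterized compound.
REQUIRED_DATA_TYPES = {"1H NMR", "13C NMR", "MS", "IR"}

PROTON_MASS = 1.007276  # mass of a proton in Da, used for the [M+H]+ adduct


@dataclass
class Submission:
    """Hypothetical container for the data attached to one compound."""
    monoisotopic_mass: float                      # of the proposed structure, in Da
    datasets: dict = field(default_factory=dict)  # data type -> list of values


def missing_data_types(sub):
    """Completeness: list the expected data types that are absent or empty."""
    return sorted(t for t in REQUIRED_DATA_TYPES if not sub.datasets.get(t))


def has_molecular_ion(sub, tol_ppm=10.0):
    """Consistency: check the MS peak list for an [M+H]+ signal within tol_ppm."""
    expected = sub.monoisotopic_mass + PROTON_MASS
    tol = expected * tol_ppm / 1e6
    return any(abs(mz - expected) <= tol for mz in sub.datasets.get("MS", []))
```

For example, a submission for caffeine (monoisotopic mass 194.0804 Da) with an MS peak at m/z 195.0877 passes the molecular-ion check, while a submission lacking a 13C NMR dataset is reported as incomplete. A plausibility check against simulated spectra, as the abstract describes for NMR, would need a prediction tool and is outside the scope of this sketch.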


Full text
DOI: 10.5445/IR/1000172676
Published on 22 July 2024
Original publication
DOI: 10.26434/chemrxiv-2024-1r9tb
Affiliated institution(s) at KIT: Institut für Biologische und Chemische Systeme (IBCS); Institut für Organische Chemie (IOC); Scientific Computing Center (SCC)
Publication type: Research report/Preprint
Publication year: 2024
Language: English
Identifier: KITopen-ID: 1000172676
Additional HGF programs: 46.21.04 (POF IV, LK 01) HAICU
Publisher: ChemRxiv
Published online in advance: 28 February 2024
Keywords: data curation, repositories, electronic lab notebooks, chemistry data, analytics
Indexed in: Dimensions