DOI: 10.5445/IR/1000090640
Published on 8 February 2019
DOI: 10.1186/s12859-018-2576-5

iRODS metadata management for a cancer genome analysis workflow

Nieroda, L.; Maas, L.; Thiebes, S.; Lang, U.; Sunyaev, A.; Achter, V.; Peifer, M.

Background: The massive amounts of data from next generation sequencing (NGS) methods pose various challenges with respect to data security, storage and metadata management. While there is a broad range of data analysis pipelines, these challenges remain largely unaddressed to date.
Results: We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high performance computing environment. The system allows for customized metadata attributes as well as fine-grained protection rules and is augmented by a user-friendly front-end for metadata input. This results in a robust, efficient end-to-end workflow that accounts for data security, central storage and unified metadata information.
Conclusions: Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.

Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Journal article
Year: 2019
Language: English
Identifier: ISSN: 1471-2105
URN: urn:nbn:de:swb:90-906400
KITopen-ID: 1000090640
Published in: BMC Bioinformatics
Volume: 20
Issue: 1
Pages: Art. no. 29
Keywords: Next generation sequencing (NGS), Genome analysis, iRODS, Workflow integration, High performance computing (HPC), Data security, Data consistency, Metadata management
Indexed in: Scopus