iRODS metadata management for a cancer genome analysis workflow

Nieroda, L.; Maas, L.; Thiebes, S.; Lang, U.; Sunyaev, A.; Achter, V.; Peifer, M.

Background: The massive amounts of data generated by next-generation sequencing (NGS) methods pose challenges for data security, storage and metadata management. Although a broad range of data analysis pipelines exists, these challenges remain largely unaddressed to date.
Results: We describe the integration of the open-source metadata management system iRODS (Integrated Rule-Oriented Data System) with a cancer genome analysis pipeline in a high-performance computing environment. The system allows for customized metadata attributes as well as fine-grained protection rules and is augmented by a user-friendly front-end for metadata input. The result is a robust, efficient end-to-end workflow that accounts for data security, central storage and unified metadata information.
Conclusions: Integrating iRODS with an NGS data analysis pipeline is a suitable method for addressing the challenges of data security, storage and metadata management in NGS environments.

DOI: 10.5445/IR/1000090640
Published on: 8 February 2019
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Journal article
Year: 2019
Language: English
Identifier: ISSN 1471-2105
KITopen-ID: 1000090640
Published in: BMC Bioinformatics
Volume: 20
Issue: 1
Pages: Article no. 29
Keywords: Next generation sequencing (NGS), Genome analysis, iRODS, Workflow integration, High performance computing (HPC), Data security, Data consistency, Metadata management
