
Text Corpus in Collaboration - A balance between customized and standardized approach

Jha, Vandana 1; Tögel, Philipp 1; Tonne, Danah 1; Elwert, Frederik; Gebhard, Henning; Fedorov, Makar; Dipper, Stefanie; Henny-Krahmer, Ulrike [Ed.]; Theise, Antje [Ed.]; Helling, Patrick [Ed.]
1 Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)

Abstract:

Extensible Markup Language (XML) that complies with the Text Encoding Initiative (TEI) guidelines is the predominant standard for creating, encoding and managing digital textual data in the field of Digital Humanities (DH). The Collaborative Research Center (CRC) 1475, "Metaphors of Religion," uses this robust and flexible standard to develop a shared infrastructure that facilitates metaphor analysis across diverse religious traditions, languages and time periods ranging from 2,000 BCE to the present day. DH projects often begin with an emphasis on data reusability and select widely accepted standards and licenses accordingly. Many research contexts, however, start from a scholarly interest and must integrate heterogeneous sources that were not originally intended for reuse. The information infrastructure subproject (INF) uses TEI-XML to harmonize and integrate texts from the various subprojects, accommodating variations in language, editorial processes, and file formats within a flexible yet standardized schema. This method establishes a core framework that ensures structural and semantic consistency while allowing extensions tailored to specific subproject needs. ...


Publisher's version
DOI: 10.5445/IR/1000185245
Published on 1 October 2025
Original publication
DOI: 10.5281/zenodo.17178219
Affiliated institution(s) at KIT: Scientific Computing Center (SCC)
Publication type: Proceedings contribution
Publication date: 22 September 2025
Language: German
Identifier: KITopen-ID: 1000185245
HGF program: 46.21.02 (POF IV, LK 01) Cross-Domain ATMLs and Research Groups
Published in: FORGE 2025 - Forschungsdaten in den Geisteswissenschaften: Daten neu denken. Konferenzabstracts, edited by Antje Theise, Patrick Helling and Ulrike Henny-Krahmer
Event: Forschungsdaten in den Geisteswissenschaften: Daten neu denken (FORGE 2025), Rostock, Germany, 24–26 September 2025
Publisher: Zenodo
Pages: 101-107
Project information: SFB 1475, 441126958 (DFG, DFG KOORD, SFB 1475/INF)
Keywords: Infrastructure, Heterogeneous, Standardization, Text Encoding Initiative, FORGE2025
Indexed in: OpenAlex
Sustainable Development Goals: Goal 9 – Industry, Innovation and Infrastructure; Goal 17 – Partnerships for the Goals