
Named Entity Recognition for digitised archival documents in German

Garay, Nele; Vafaie, Mahsa¹; Sack, Harald¹
¹ Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT)

Abstract:

This paper presents an experiment that evaluates how effectively two Named Entity Recognition (NER) tools extract entities directly from the output of an Optical Character Recognition (OCR) workflow. The authors first built a test dataset containing both raw and corrected OCR output, manually annotated with the tags Person, Location, and Organisation. They then applied each NER tool to both the raw and the corrected OCR output and evaluated its performance by computing precision, recall, and F1 scores against the manual annotations.
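The evaluation step described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' evaluation code. It assumes that each entity is represented as a hypothetical (start, end, label) character-offset triple and that a prediction counts as correct only when both its span and its label exactly match a manual annotation; the paper may use a different matching criterion (for example, partial overlap). The tag names PER, LOC and ORG are likewise assumed.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation of one annotated entity:
# (start character offset, end character offset, label)
Entity = Tuple[int, int, str]


def prf1(gold: List[Entity], predicted: List[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under an assumed exact-match criterion."""
    # Multiset intersection: a prediction is a true positive only if span AND label match
    tp = sum((Counter(gold) & Counter(predicted)).values())
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def per_label(
    gold: List[Entity],
    predicted: List[Entity],
    labels: Sequence[str] = ("PER", "LOC", "ORG"),  # assumed tag names
) -> Dict[str, Tuple[float, float, float]]:
    """Compute precision, recall and F1 separately for each entity type."""
    return {
        label: prf1(
            [e for e in gold if e[2] == label],
            [e for e in predicted if e[2] == label],
        )
        for label in labels
    }


# Toy example (invented offsets, not data from the paper).
# The predicted LOC span (10, 18) does not exactly match the gold span (10, 20),
# and (40, 45, "PER") is a spurious prediction: precision = 2/4, recall = 2/3.
gold = [(0, 5, "PER"), (10, 20, "LOC"), (25, 30, "ORG")]
pred = [(0, 5, "PER"), (10, 18, "LOC"), (25, 30, "ORG"), (40, 45, "PER")]
print(prf1(gold, pred))
print(per_label(gold, pred))
```

Under this setup, the same functions would be called once per tool and per OCR variant (raw and corrected), giving four score triples that can be compared directly. Exact matching is a deliberately strict choice. OCR errors that shift an entity's boundaries count as full misses, which is one reason raw and corrected OCR output can score differently.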


Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Conference proceedings contribution
Publication date: 24.05.2025
Language: English
Identifier: ISSN 1613-0073
KITopen-ID: 1000183749
Published in: Joint Proceedings of Posters, Demos, Workshops, and Tutorials of the 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW-PDWT 2024) co-located with 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2024)
Event: 24th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2024), Amsterdam, Netherlands, 26.11.2024 to 28.11.2024
Publisher: CEUR-WS
Pages: Article no. 183
Series: CEUR Workshop Proceedings, vol. 3967
Keywords: Named Entity Recognition (NER), Optical Character Recognition (OCR), Digital Cultural Heritage
Indexed in: Scopus