
Improvements in Handwritten and Printed Text Separation in Historical Archival Documents

Vafaie, Mahsa 1; Waitelonis, Jörg 1; Sack, Harald 1
1 Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT)

Abstract:

The presence of handwritten text and annotations alongside typewritten and machine-printed text makes historical archival records visually complex, posing challenges for OCR systems in accurately transcribing their content. This paper extends [1] and reports improvements in separating handwritten text from machine-printed text (including typewritten text), using FCN-based models trained on datasets created with different data synthesis pipelines. Results show a significant increase of about 20% in the intrinsic evaluation on artificial test sets and an 8% improvement in the extrinsic evaluation on a subsequent OCR task on real archival documents.


Original publication
DOI: 10.2352/issn.2168-3204.2023.20.1.7
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Proceedings paper
Publication month/year: 06/2023
Language: English
Identifier: KITopen-ID 1000169282
Published in: Archiving Conference
Event: Archiving Conference (2023), Oslo, Norway, 19–23 June 2023
Pages: 36–41
Series: 20