Handwritten and printed text separation in historical documents

Prikhodina, Anastasia

Abstract (English):

Historical documents present many challenges for Optical Character Recognition (OCR)
systems, especially documents of poor quality that contain handwritten annotations,
stamps, signatures, and historical fonts. Because most OCR systems recognize either
machine-printed or handwritten text, the printed and handwritten parts have to be
separated before the respective recognition system is applied. This thesis addresses
the segmentation of handwritten and printed text in historical Latin text documents.
Data in which handwritten and machine-printed components appear on the same page, or
even overlap each other, together with pixel-wise annotations, is scarce. To alleviate
this lack of data, the data synthesis method proposed in [12] was applied and new
datasets were generated. The newly created images and their pixel-level labels were
used to train the fully convolutional network (FCN) introduced in [5]. The newly
trained model showed better results in the separation of machine-printed and
handwritten text in historical documents.


DOI: 10.5445/IR/1000141960
Published on 19 January 2022
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Thesis
Publication year: 2021
Language: English
Identifier: KITopen-ID: 1000141960
Publisher: Karlsruher Institut für Technologie (KIT)
Extent: 57 pp.
Thesis type: Bachelor's thesis
Examination date: 14 October 2021
Reviewers/Supervisors: Sack, Harald; Zöllner, J. Marius