Volume: 20 | Article ID: 7
Improvements in Handwritten and Printed Text Separation in Historical Archival Documents
DOI: 10.2352/issn.2168-3204.2023.20.1.7 | Published Online: June 2023
Abstract

The presence of handwritten text and annotations alongside typewritten and machine-printed text makes historical archival records visually complex, which makes it difficult for OCR systems to transcribe their content accurately. This paper extends [1] and reports on improvements in separating handwritten text from machine-printed text (including typewritten text) using FCN-based models trained on datasets created with different data synthesis pipelines. Results show a significant increase of about 20% in the intrinsic evaluation on artificial test sets, and an 8% improvement in the extrinsic evaluation on a subsequent OCR task on real archival documents.

  Cite this article 

Mahsa Vafaie, Jörg Waitelonis, Harald Sack, "Improvements in Handwritten and Printed Text Separation in Historical Archival Documents," in Archiving Conference, 2023, pp. 36-41, https://doi.org/10.2352/issn.2168-3204.2023.20.1.7

  Copyright statement 
Copyright © 2023 Society for Imaging Science and Technology
Archiving Conference
2161-8798
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA