The presence of handwritten text and annotations alongside typewritten and machine-printed text makes historical archival records visually complex, which poses challenges for OCR systems in accurately transcribing their content. This paper extends [1], reporting improvements in the separation of handwritten text from machine-printed text (including typewritten text) through the use of FCN-based models trained on datasets created from different data synthesis pipelines. Results show a significant increase of about 20% in the intrinsic evaluation on artificial test sets and an 8% improvement in the extrinsic evaluation on a subsequent OCR task on real archival documents.
Mahsa Vafaie, Jörg Waitelonis, Harald Sack, "Improvements in Handwritten and Printed Text Separation in Historical Archival Documents," in Archiving Conference, 2023, pp. 36-41, https://doi.org/10.2352/issn.2168-3204.2023.20.1.7