Volume: 16 | Article ID: art00017
Creating artificial ground-truth data for document image page segmentation
DOI: 10.2352/issn.2168-3204.2019.1.0.17 | Published Online: May 2019
Abstract

We propose a framework for creating artificial ground-truth data for document images. The resulting data can then be used to train machine-learning systems to perform page segmentation tasks. The system focuses mainly on images of historical documents. The framework creates document images with headlines of differing sizes, multiple-column layouts, pictures and decorative elements. To improve the resemblance to historical document images, a set of backgrounds is created manually by extracting background textures from real historical documents. The fading and curling typical of old manuscripts are also simulated. Experiments with a neural network, trained on data generated using the proposed framework and applied to real-world images, show promising results with robust segmentation of text and non-text image areas.
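As a rough illustration of the idea of synthetic ground truth, the sketch below lays out a headline and several columns of blocks on a blank page and records a per-pixel label mask alongside the layout. It is not the authors' framework: the function name `make_page_mask`, the label constants, the page dimensions, and the 25% chance of a non-text block are all assumptions made for this example, and it omits background textures, fading, and curling entirely.

```python
import random

# Hypothetical label values for this sketch (not taken from the paper).
BACKGROUND, TEXT, NON_TEXT = 0, 1, 2


def make_page_mask(width=200, height=300, columns=2, margin=10, gutter=8, seed=0):
    """Return (mask, regions) for one synthetic page layout.

    mask is a height x width list of lists of labels; regions lists the
    placed rectangles as (label, x0, y0, x1, y1) with exclusive x1/y1.
    """
    rng = random.Random(seed)
    mask = [[BACKGROUND] * width for _ in range(height)]
    regions = []

    def fill(x0, y0, x1, y1, label):
        # Paint the rectangle into the mask and remember it as a region.
        for y in range(y0, y1):
            row = mask[y]
            for x in range(x0, x1):
                row[x] = label
        regions.append((label, x0, y0, x1, y1))

    # Headline of random height spanning the full text width.
    headline_h = rng.randint(12, 30)
    fill(margin, margin, width - margin, margin + headline_h, TEXT)

    # Columns below the headline, each filled with stacked blocks.
    top = margin + headline_h + gutter
    col_w = (width - 2 * margin - (columns - 1) * gutter) // columns
    for c in range(columns):
        x0 = margin + c * (col_w + gutter)
        y = top
        while y < height - margin:
            y1 = min(y + rng.randint(20, 60), height - margin)
            # Assumed ratio: roughly one block in four is a picture/ornament.
            label = NON_TEXT if rng.random() < 0.25 else TEXT
            fill(x0, y, x0 + col_w, y1, label)
            y = y1 + gutter
    return mask, regions


if __name__ == "__main__":
    mask, regions = make_page_mask(seed=1)
    print(len(regions), "regions placed on a", len(mask[0]), "x", len(mask), "page")
```

In a fuller pipeline, each region would also be rendered into an image (text glyphs, pictures, a scanned background texture), so that the rendered page and this mask form one training pair for a segmentation network.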

  Cite this article 

Oliver Paetzel, Hauke Bluhm, "Creating artificial ground-truth data for document image page segmentation," in Proc. IS&T Archiving 2019, 2019, pp. 76-80, https://doi.org/10.2352/issn.2168-3204.2019.1.0.17

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2019
Archiving Conference
2161-8798
Society for Imaging Science and Technology