Volume: 29 | Article ID: art00015
Document Image Classification on the Basis of Layout Information
DOI: 10.2352/ISSN.2470-1173.2017.2.VIPC-412 | Published Online: January 2017

In this paper, we propose a document image classification framework based on layout information. Our method does not use OCR and is therefore completely language independent. We are still able to exploit text data by extracting text regions with a novel MSER-based approach. Our MSER formulation is considerably more robust to text distortions than the existing one. We introduce two types of novel image descriptors, supplemented with Fisher vectors based on a Bernoulli mixture model. Classifiers built on these descriptors are assembled into a meta-classification system that is able to classify documents in complex cases where the accuracy of an individual classifier is poor. The processing time of our meta-classification system is low, comparable to that of a single classifier. We show that our method outperforms existing ones in terms of classification accuracy for a wide range of documents from both well-known and machine-generated document datasets.
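
The abstract does not give the exact formulation, so the following is only a generic sketch of how a Fisher vector can be derived from a Bernoulli mixture model. It assumes binary local descriptors and a mixture that has already been fitted. The function names `bmm_posteriors` and `bmm_fisher_vector` are hypothetical, the Fisher-information normalization is omitted, and the final L2 normalization is common practice rather than something stated in the source.

```python
import numpy as np

def bmm_posteriors(X, weights, means, eps=1e-9):
    """Soft-assign binary descriptors X (N, D) to K Bernoulli components."""
    mu = np.clip(means, eps, 1.0 - eps)  # (K, D), kept away from 0 and 1
    # log p_k(x) = sum_d [ x_d * log(mu_kd) + (1 - x_d) * log(1 - mu_kd) ]
    log_lik = X @ np.log(mu).T + (1.0 - X) @ np.log1p(-mu).T  # (N, K)
    log_post = log_lik + np.log(weights)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def bmm_fisher_vector(X, weights, means, eps=1e-9):
    """Average gradient of the log-likelihood w.r.t. the component means.

    Illustrative assumption: only the mean parameters are used, and the
    result is L2-normalized instead of whitened by the Fisher information.
    """
    mu = np.clip(means, eps, 1.0 - eps)
    gamma = bmm_posteriors(X, weights, mu, eps)  # (N, K)
    # d log p(x) / d mu_kd = gamma_k(x) * (x_d - mu_kd) / (mu_kd * (1 - mu_kd))
    diff = X[:, None, :] - mu[None, :, :]  # (N, K, D)
    grad = gamma[:, :, None] * diff / (mu * (1.0 - mu))[None, :, :]
    fv = grad.mean(axis=0).ravel()  # (K * D,)
    norm = np.linalg.norm(fv)
    return fv / norm if norm > eps else fv
```

For N descriptors of dimension D and a mixture with K components, the result is a single vector of length K * D per image, which can then be passed to any standard classifier.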

  Cite this article 

Sergey Zavalishin, Andrey Bout, Ilya Kurilin, Michail Rychagov, "Document Image Classification on the Basis of Layout Information" in Proc. IS&T Int’l. Symp. on Electronic Imaging: Visual Information Processing and Communication VIII, 2017, pp 78 - 86.

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2017
Electronic Imaging
Society for Imaging Science and Technology