Volume: 29 | Article ID: art00026
Highlighted Document Image Classification
DOI: 10.2352/issn.2169-2629.2021.29.154 | Published Online: November 2021
Abstract

Although many document image classification methods exist, most are not designed for devices with constrained computing resources, such as printers, nor do they focus on documents with highlighter pen marks. To enable printers to better discriminate highlighted documents, we designed a set of features in the CIE LCh(a*b*) color space for use with a support vector machine (SVM) classifier. The features include two gamut-based features and six low-level color features. The gamut-based features are obtained by first identifying the highlight pixels and then computing the distance from those pixels to the boundary of the printer gamut. The low-level color features are built on the color distribution of the image blocks. The best subset of the existing and new features is selected by sequential forward floating selection (SFFS). Leave-one-out cross-validation on a dataset of 400 document images is used to evaluate the effectiveness of the classification model. The cross-validation results indicate significant improvements over the baseline highlighted document classification model.
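The abstract names sequential forward floating selection (SFFS) without describing it. SFFS alternates a forward step, which adds the single feature that most improves a selection criterion, with a conditional backward step, which removes a feature only if the resulting smaller subset beats the best subset of that size found so far. The Python below is a minimal, generic sketch of that procedure under stated assumptions, not the authors' implementation. The function name `sffs`, the criterion callable `score`, and the toy criterion in the usage block are all hypothetical; the abstract does not say which criterion drove the selection (an SVM accuracy estimate would be one plausible choice).

```python
from typing import Callable, Dict, FrozenSet, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int,
         score: Callable[[Subset], float],
         k_max: int) -> Tuple[Subset, float]:
    """Hypothetical sketch of sequential forward floating selection.

    Features are indices 0..n_features-1. `score` (an assumed, user-supplied
    criterion) maps a subset to a value where higher is better. Returns the
    best-scoring subset seen at any size, together with its score.
    """
    k_max = min(k_max, n_features)
    best: Dict[int, Tuple[float, Subset]] = {}  # size -> (best score, subset)
    current: Subset = frozenset()

    while len(current) < k_max:
        # Inclusion: add the one feature that maximizes the criterion.
        candidates = [current | {f} for f in range(n_features) if f not in current]
        current = max(candidates, key=score)
        s = score(current)
        if len(current) not in best or s > best[len(current)][0]:
            best[len(current)] = (s, current)

        # Conditional exclusion: drop a feature only if the smaller subset
        # strictly beats the best subset of that size recorded so far.
        while len(current) > 2:
            reduced = max((current - {f} for f in current), key=score)
            s_r = score(reduced)
            if s_r > best[len(reduced)][0]:
                current = reduced
                best[len(reduced)] = (s_r, reduced)
            else:
                break

    best_score, best_subset = max(best.values(), key=lambda t: t[0])
    return best_subset, best_score


def toy_score(subset: Subset) -> float:
    # Hypothetical criterion: features 0 and 2 are informative,
    # and every selected feature costs 0.1.
    return len(subset & {0, 2}) - 0.1 * len(subset)


if __name__ == "__main__":
    subset, s = sffs(n_features=4, score=toy_score, k_max=3)
    print(sorted(subset), round(s, 2))  # → [0, 2] 1.8
```

Recording the best score for each subset size is what keeps the backtracking from cycling: a removal is accepted only when it strictly improves that record, so the search always terminates.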

  Cite this article 

Yafei Mao, Yufang Sun, Peter Bauer, Todd Harris, Mark Shaw, Lixia Li, Jan Allebach, "Highlighted Document Image Classification," in Proc. IS&T 29th Color and Imaging Conf., 2021, pp. 154-159, https://doi.org/10.2352/issn.2169-2629.2021.29.154

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2021
Color and Imaging Conference
2166-9635
Society for Imaging Science and Technology
7003 Kilworth Lane, Springfield, VA 22151 USA