Volume: 28 | Article ID: art00002
Training a calligraphy style classifier on a non-representative training set
DOI: 10.2352/ISSN.2470-1173.2016.17.DRR-052 | Published Online: February 2016

Calligraphy collections are being scanned into document images for preservation and accessibility. The digitization technology is mature and calligraphy character recognition is well underway, but automatic calligraphy style classification is lagging. Special style features are developed to measure the style similarity of calligraphy character images with different stroke configurations and GB (or Unicode) labels. Recognizing the five main styles is easiest when a style-labeled sample of the same character (i.e., same GB code) from the same work and scribe is available. Even samples of characters with different GB codes from the same work help. Style classification is most difficult when the training data has no comparable characters from the same work. These distinctions are quantified by distance statistics between the underlying feature distributions. Style classification is more accurate when several character samples from the same work are available. In adverse practical scenarios, when labeled versions of unknown works are not available for training the classifier, Borda Count voting and adaptive classification of style-sensitive feature vectors from seven characters of the same work raise the ~70% single-sample baseline accuracy to ~90%.
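The Borda Count step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each character sample from a work already has a per-style score from some classifier; the function name `borda_count`, the style labels, and the scores are hypothetical and are not taken from the paper, which may combine votes differently.

```python
from collections import defaultdict


def borda_count(score_lists):
    """Combine per-sample style scores with a Borda Count vote.

    score_lists: one dict per character sample, mapping a style label to a
    classifier score (higher means more likely). Hypothetical input format.
    Returns the winning style label and the point totals per style.
    """
    totals = defaultdict(int)
    for scores in score_lists:
        # Rank the styles for this sample from most to least likely.
        ranked = sorted(scores, key=scores.get, reverse=True)
        n = len(ranked)
        for rank, style in enumerate(ranked):
            # Top-ranked style earns n - 1 points, the last earns 0.
            totals[style] += n - 1 - rank
    # Ties are broken in favor of the style encountered first.
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Illustrative scores for three samples from the same work (made-up values).
samples = [
    {"A": 0.5, "B": 0.3, "C": 0.2},
    {"A": 0.3, "B": 0.5, "C": 0.2},
    {"A": 0.45, "B": 0.35, "C": 0.2},
]
winner, totals = borda_count(samples)
print(winner, totals)
```

Because the vote uses only the rank order of styles within each sample, a single sample with an unusually confident but wrong score cannot dominate the decision for the whole work.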

  Cite this article 

George Nagy, "Training a calligraphy style classifier on a non-representative training set," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Document Recognition and Retrieval XXIII, 2016.

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2016
Electronic Imaging
Society for Imaging Science and Technology