Volume: 28 | Article ID: art00006
Cuckoos among Your Data: A Quality Control Method to Retrieve Mislabeled Writer Identities from Handwriting Datasets
DOI: 10.2352/ISSN.2470-1173.2016.17.DRR-056
Published Online: February 2016
Abstract

Motivation: Handwriting datasets may contain specimens assigned to the wrong writer. Such misclassifications, or "cuckoos", are a little-discussed problem that can bias recognition, retrieval, identification, and other expertise systems, with serious consequences in biometric and forensic applications. Indeed, misclassification research has been claimed to be the most important topic in pattern recognition.

Objective: We describe the design of a generic semi-automatic method for detecting possible misclassifications and illustrate it with an exemplary combination of classification criterion (writer identity), measurement feature (contour orientation), and document distance metrics.

Method: The core of the method consists of the automated ranking of writer classes by stylistic variability, using the open-source software Alphonse, followed by visual inspection of a limited number of top-ranking classes, using Rex, an interactive visualization tool for handwriting datasets. The method is independent of dataset producers and requires no training. It is the result of empirical and theoretical research, and its performance is demonstrated on the Swiss IAM offline handwriting dataset.

Findings: We show that evaluating the performance of a quality control method requires considering the interdependency between system sensitivity and task difficulty. We propose a dataset-independent measure of the scrambling severity of a dataset and its proneness to misclassification. We find that in a broad writer population the variability of the contour orientation approaches a log-normal distribution, increasing the number of genuine outliers.
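
The ranking step described under Method can be illustrated with a minimal sketch. It is not the Alphonse implementation: the function names, the toy feature vectors, the Euclidean distance, and the use of mean pairwise distance as the variability score are all assumptions made for illustration. The idea shown is only that writer classes are ordered by within-class variability, so that the most variable classes are handed to a human for visual inspection.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_writers_by_variability(features_by_writer, distance=euclidean):
    """Return (writer, variability) pairs, most variable writer first.

    Variability is taken here as the mean pairwise distance between the
    documents attributed to one writer. This is a hypothetical stand-in
    for whatever stylistic variability measure the actual tool uses.
    """
    scores = {}
    for writer, docs in features_by_writer.items():
        pairs = list(combinations(docs, 2))
        # A writer with a single document has no pairwise distance: score 0.
        scores[writer] = mean(distance(a, b) for a, b in pairs) if pairs else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: each document is reduced to a two-bin orientation histogram.
features = {
    "writer_A": [[0.50, 0.50], [0.52, 0.48], [0.49, 0.51]],
    "writer_B": [[0.20, 0.80], [0.21, 0.79], [0.90, 0.10]],  # last one looks foreign
    "writer_C": [[0.70, 0.30]],
}

ranking = rank_writers_by_variability(features)
# Only the top-ranking classes are passed on for visual inspection.
top_for_inspection = [writer for writer, _ in ranking[:1]]
print(top_for_inspection)
```

In this toy setting the class containing the foreign-looking document ranks first. As the Findings note, a high rank only marks a candidate: genuine outliers also occur, which is why the ranking is followed by visual inspection rather than automatic relabeling.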

  Cite this article 

Vlad Atanasiu, "Cuckoos among Your Data: A Quality Control Method to Retrieve Mislabeled Writer Identities from Handwriting Datasets," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Document Recognition and Retrieval XXIII, 2016, https://doi.org/10.2352/ISSN.2470-1173.2016.17.DRR-056

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2016
Electronic Imaging
2470-1173
Society for Imaging Science and Technology