Calligraphy collections are being scanned into document images for preservation and accessibility. The digitization technology is mature, and calligraphy character recognition is well underway, but automatic calligraphy style classification is lagging. Special style features are developed to measure the style similarity of calligraphy character images with different stroke configurations and GB (or Unicode) labels. Recognizing the five main styles is easiest when a style-labeled sample of the same character (i.e., same GB code) from the same work and scribe is available. Even samples of characters with different GB codes from the same work help. Style classification is most difficult when the training data has no comparable characters from the same work. These distinctions are quantified by distance statistics between the underlying feature distributions. Style classification is more accurate when several character samples from the same work are available. In adverse practical scenarios, when labeled versions of unknown works are not available for training the classifier, Borda Count voting and adaptive classification of the style-sensitive feature vectors of seven characters from the same work raise the single-sample baseline accuracy from ~70% to ~90%.
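The Borda Count step can be illustrated with a minimal sketch. It assumes that a single-sample classifier returns a full ranking of the five styles for each character sample. The style names, the example rankings, and the `borda_count` helper are hypothetical, and the adaptive-classification part of the method is not shown.

```python
from collections import defaultdict

# Illustrative labels for the five main styles (an assumption for this sketch).
STYLES = ["seal", "clerical", "regular", "running", "cursive"]


def borda_count(rankings):
    """Fuse per-sample style rankings into a single decision.

    Each ranking lists all styles from most to least likely. In a ranking
    of n styles, the style at position i earns n - 1 - i points; the style
    with the highest total over all samples wins.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, style in enumerate(ranking):
            scores[style] += n - 1 - position
    # Iterate over STYLES so ties are broken deterministically by list order.
    winner = max(STYLES, key=lambda s: scores[s])
    return winner, dict(scores)


# Hypothetical rankings for seven characters taken from one unknown work.
rankings = [
    ["running", "cursive", "regular", "clerical", "seal"],
    ["cursive", "running", "regular", "clerical", "seal"],
    ["running", "regular", "cursive", "clerical", "seal"],
    ["regular", "running", "cursive", "clerical", "seal"],
    ["running", "cursive", "regular", "seal", "clerical"],
    ["cursive", "running", "regular", "clerical", "seal"],
    ["running", "cursive", "clerical", "regular", "seal"],
]
winner, scores = borda_count(rankings)
print(winner, scores)
```

With these rankings, "running" collects 25 points against 21 for "cursive". Samples whose top choice is wrong still contribute second- and third-place evidence, which is one plausible reason why pooling several samples from the same work helps.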
We propose a human-in-the-loop scheme for optical music recognition. Starting from the results of our recognition engine, we pose the problem as one of constrained optimization, in which the human can specify various pixel labels, while our recognition engine seeks an optimal explanation subject to the human-supplied constraints. In this way, we enable an interactive approach with a uniform communication channel from human to machine, in which both iterate their roles until the desired end is achieved. Pixel constraints may be added at various stages, including staff finding, system identification, and measure recognition. Results on a test set show a significant speedup compared with purely human-driven correction.
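One way to picture human pixel labels acting as hard constraints on an optimal explanation is a Viterbi decoder over a 1-D run of pixels, in which every user-labeled pixel is restricted to the label the user gave. This is a toy sketch, not the authors' recognition engine. The two-label model (0 = background, 1 = staff pixel), the cost tables, and the `constrained_viterbi` function are all hypothetical.

```python
import math


def constrained_viterbi(unary, transition, constraints):
    """Minimum-cost labeling of a pixel sequence under hard label constraints.

    unary[t][k]      cost of giving pixel t label k (from the engine)
    transition[j][k] cost of label j at pixel t-1 followed by label k at t
    constraints      {pixel index: required label}, e.g. supplied by a human
    Returns (list of labels, total cost).
    """
    n, num_labels = len(unary), len(unary[0])

    def cost(t, k):
        # Any label that contradicts a human constraint is ruled out.
        if t in constraints and constraints[t] != k:
            return math.inf
        return unary[t][k]

    best = [cost(0, k) for k in range(num_labels)]
    back = []  # back[t-1][k]: best previous label when pixel t has label k
    for t in range(1, n):
        prev_best, best, pointers = best, [], []
        for k in range(num_labels):
            j = min(range(num_labels), key=lambda i: prev_best[i] + transition[i][k])
            pointers.append(j)
            best.append(prev_best[j] + transition[j][k] + cost(t, k))
        back.append(pointers)

    # Trace the optimal path back from the cheapest final label.
    label = min(range(num_labels), key=lambda k: best[k])
    total, path = best[label], [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1], total


# Hypothetical engine costs: the engine mildly prefers background everywhere.
unary = [[0, 2], [0, 2], [1, 1], [0, 2], [0, 2]]
transition = [[0, 1], [1, 0]]  # switching labels costs 1

print(constrained_viterbi(unary, transition, {}))      # engine alone
print(constrained_viterbi(unary, transition, {2: 1}))  # human pins pixel 2
```

Without constraints, the cheapest explanation labels every pixel as background (total cost 1). Pinning pixel 2 to label 1 forces the decoder to re-optimize around that label, yielding `[0, 0, 1, 0, 0]` at cost 3.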
Medical images in biomedical documents tend to be complex by nature and often contain several regions that are annotated using arrows. Arrowhead detection is a critical precursor to region-of-interest (ROI) labeling and image content analysis. To detect arrowheads, images are first binarized using a fuzzy binarization technique, and a set of candidates is segmented based on connected components. To select arrow candidates, we use convexity defect-based filtering, followed by template matching via dynamic programming. The similarity score from dynamic time warping (DTW) confirms the presence of arrows in the image. Our test on biomedical images from the ImageCLEF 2010 collection demonstrates the usefulness of the technique.
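The DTW similarity step can be illustrated with the textbook dynamic-programming recurrence. This minimal sketch assumes that each arrow candidate and template have already been reduced to a 1-D numeric signature; the example values are made up. The signature, the absolute-difference local cost, and the `dtw_distance` name are assumptions rather than details from the paper.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences.

    D[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j],
    using |a[i-1] - b[j-1]| as the local cost.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            D[i][j] = step + min(
                D[i - 1][j],      # a advances while b repeats
                D[i][j - 1],      # b advances while a repeats
                D[i - 1][j - 1],  # both advance
            )
    return D[n][m]


# Hypothetical signatures: the candidate is a time-stretched template.
template = [0, 1, 2, 3, 2, 1, 0]
candidate = [0, 1, 1, 2, 3, 2, 1, 0]
print(dtw_distance(template, candidate))
```

Because DTW lets one sequence linger on an element while the other advances, a time-stretched copy of the template still scores 0.0. In practice, a candidate would presumably be accepted as an arrow when its distance falls below a tuned threshold.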
Recent advances in writer identification push the limits by using increasingly complex methods that rely on sophisticated preprocessing or on combinations of already complex descriptors. In this paper, we pursue a simpler and faster approach to writer identification, introducing novel descriptors computed from the geometrical arrangement of interest points at different scales. They capture orientation distributions and geometrical relationships of script parts such as strokes, junctions, endings, and loops. Thus, unlike standard codebook-based methods, we avoid relying on a fixed set of character appearances. The proposed descriptors significantly cut down processing time compared to existing methods, are simple and efficient, and can be applied out-of-the-box to an unseen dataset. Evaluations on widely used datasets show their potential both when applied by themselves and in combination with other descriptors. Limitations of our method relate to the amount of data needed to obtain reliable models.
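The idea of describing handwriting through the geometry of interest points, rather than through a codebook of character shapes, can be sketched with a simple pairwise-orientation histogram. This sketch is an illustrative assumption, not the descriptor proposed in the paper. It takes interest points as given (from any detector), bins the orientation of the segment joining each nearby pair of points, and compares two histograms with the chi-square distance. The function names, the radius, and the bin count are hypothetical.

```python
import math


def pair_orientation_histogram(points, radius=20.0, bins=8):
    """Normalized histogram of orientations between nearby interest points.

    points: list of (x, y) interest-point coordinates.
    Orientations are taken modulo pi because a segment p->q has the same
    undirected orientation as q->p.
    """
    hist = [0.0] * bins
    for i, (x1, y1) in enumerate(points):
        for x2, y2 in points[i + 1:]:
            dx, dy = x2 - x1, y2 - y1
            if 0 < math.hypot(dx, dy) <= radius:
                angle = math.atan2(dy, dx) % math.pi
                # min() guards against rounding that would land exactly on bins.
                hist[min(int(angle / math.pi * bins), bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def chi_square(h1, h2):
    """Chi-square distance between two histograms (0.0 means identical)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


# Hypothetical interest points: one purely horizontal, one purely vertical layout.
h_horiz = pair_orientation_histogram([(0, 0), (10, 0), (20, 0)])
h_vert = pair_orientation_histogram([(0, 0), (0, 10)])
print(chi_square(h_horiz, h_vert))
```

Because such a descriptor needs only interest-point coordinates, no codebook has to be learned. However, a stable histogram needs enough point pairs, which is in line with the data-quantity limitation mentioned above.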
Motivation: Handwriting datasets may contain specimens assigned to the wrong writer. Such misclassifications ("cuckoos"), a little-discussed problem, can bias recognition, retrieval, identification, and other expert systems, with serious consequences in biometric and forensic applications. Indeed, misclassification research has been claimed to be the most important topic in pattern recognition. Objective: We describe the design of a generic semi-automatic method for detecting possible misclassifications and illustrate it with an exemplary combination of classification criterion (writer identity), measurement feature (contour orientation), and document distance metric. Method: The core of the method consists of an automated ranking of writer classes by stylistic variability, using the open-source software Alphonse, followed by visual inspection of a limited number of top-ranking classes, using an interactive handwriting dataset visualization tool, Rex. The method is independent of dataset producers and requires no training. It is the result of empirical and theoretical research, and its performance is demonstrated on the Swiss IAM offline handwriting dataset. Findings: We show that, to evaluate the performance of a quality-control procedure, it is necessary to consider the interdependency between system sensitivity and task difficulty. We propose a dataset-independent measure of the scrambling severity of a dataset and its proneness to misclassification. We find that in a broad writer population the variability of the contour orientation approaches a log-normal distribution, increasing the number of genuine outliers.
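The ranking step can be pictured with a minimal sketch. It does not reproduce Alphonse's actual algorithm or interface. It assumes each document is already reduced to a contour-orientation histogram and measures a writer's variability as the mean pairwise L1 distance between that writer's documents. Writers are then returned from most to least variable so that the top few can be inspected visually. The function names, the L1 distance, and the example data are assumptions.

```python
from itertools import combinations


def l1(p, q):
    """L1 (city-block) distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def rank_by_variability(docs_by_writer):
    """Sort writer IDs by mean pairwise distance among their documents.

    docs_by_writer: {writer id: list of feature vectors, one per document}.
    Writers with fewer than two documents get variability 0.0.
    A high score flags a class that may contain a misattributed "cuckoo".
    """
    scores = {}
    for writer, docs in docs_by_writer.items():
        pairs = list(combinations(docs, 2))
        scores[writer] = sum(l1(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical 3-bin orientation histograms; w02's last document looks out of place.
docs = {
    "w01": [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.5, 0.4, 0.1]],
    "w02": [[0.2, 0.3, 0.5], [0.25, 0.25, 0.5], [0.9, 0.1, 0.0]],
    "w03": [[0.3, 0.3, 0.4]],
}
ranking, scores = rank_by_variability(docs)
print(ranking)
```

Only the classes at the top of such a ranking would then be handed to a human for visual checking, which keeps the manual effort bounded.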
Extracting strokes from handwriting in historical documents provides high-level features for the challenging problem of handwriting recognition. Such handwriting often contains noise, faint or incomplete strokes, strokes with gaps, overlapping ascenders and descenders, and competing lines when embedded in a table or form, making it unsuitable for local line-following algorithms or their associated binarization schemes. We introduce Intelligent Pen for piecewise-optimal stroke extraction. Extracted strokes are stitched together to provide a complete trace of the handwriting. Intelligent Pen formulates stroke extraction as a set of piecewise-optimal paths, extracted and assembled in cost order. As such, Intelligent Pen is robust to noise, gaps, faint handwriting, and even competing lines and strokes. Intelligent Pen traces closely match both the shape of the handwriting and the order in which it was written. A quantitative comparison with an ICDAR handwritten stroke dataset shows Intelligent Pen traces to be within 2.58 pixels (mean difference) of the manually created strokes.
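Extracting a single optimal stroke piece can be sketched as a minimum-cost path through a cost image in which ink pixels are cheap and background pixels are expensive. With such costs, a path can bridge a small gap or faint ink instead of stopping. This minimal illustration assumes 8-connected pixels, a hand-made cost grid, and known start and end points. It uses a plain Dijkstra search and does not reproduce Intelligent Pen's actual cost function or its cost-ordered assembly of pieces; the `min_cost_path` name is hypothetical.

```python
import heapq


def min_cost_path(cost, start, end):
    """Dijkstra shortest path on a 2-D cost grid (8-connected).

    cost[r][c] is the price of stepping onto pixel (r, c); low values
    should mark ink. Returns (path as a list of (r, c), total cost).
    """
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break
        if d > dist[(r, c)]:
            continue  # stale heap entry; a cheaper route was already found
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    nd = d + cost[nr][nc]
                    if nd < dist.get((nr, nc), float("inf")):
                        dist[(nr, nc)] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    # Walk the predecessor links back from the end point.
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[end]


# Hypothetical cost image: ink (1) along row 1 with a faint gap (5) at column 2;
# background pixels cost 9.
cost = [
    [9, 9, 9, 9, 9],
    [1, 1, 5, 1, 1],
    [9, 9, 9, 9, 9],
]
path, total = min_cost_path(cost, (1, 0), (1, 4))
print(path, total)
```

The path crosses the faint pixel at column 2 because doing so (cost 5) is cheaper than detouring through background (cost 9). Stitching such pieces into a complete trace is not shown.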
This article presents a system dedicated to automatic language identification of text regions in heterogeneous and complex documents. This system is able to process documents with mixed printed and handwritten text and various layouts. To handle such a problem, the authors propose a system that performs the following sub-tasks: writing type identification (printed/handwritten), script identification, and language identification. The methods for writing type recognition and script discrimination are based on analysis of the connected components, while the language identification approach relies on a statistical text analysis, which requires a recognition engine. The authors evaluate the system on a new public dataset and present detailed results on the three tasks. Their system outperforms the Google plug-in evaluated on ground-truth transcriptions of the same dataset. © 2016 Society for Imaging Science and Technology.
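The statistical language-identification step can be illustrated with a character-trigram model. This sketch assumes the recognition engine has already produced text for the region. For illustration only, it uses tiny made-up training strings and add-one smoothing. The paper's actual statistical model, language set, and training data are not specified here, and the function names are hypothetical.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of a lowercased, space-padded string."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """Build per-language trigram counts from {language: training text}."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def identify(text, models):
    """Return the language whose add-one-smoothed trigram model gives
    the recognized text the highest log-likelihood."""
    vocab = set().union(*models.values())
    best_lang, best_score = None, -math.inf
    for lang, counts in models.items():
        # +1 in the vocabulary size reserves probability mass for unseen trigrams.
        total = sum(counts.values()) + len(vocab) + 1
        score = sum(math.log((counts[g] + 1) / total) for g in trigrams(text))
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Hypothetical toy training data; a real system would train on large corpora.
models = train({
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "fr": "le renard brun saute par dessus le chien et le chat",
})
print(identify("the dog and the fox", models))
```

Working in log space avoids numerical underflow when many small trigram probabilities are multiplied over a long text region.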
Deep architectures based on convolutional neural networks have obtained state-of-the-art results for several recognition tasks. These architectures rely on a cascade of convolutional layers and activation functions. Beyond setting the number of layers and the number of neurons in each layer, the choices of activation function, training optimization algorithm, and regularization procedure are of great importance. In this work, we start from a deep convolutional architecture and describe the effect of recent activation functions, optimization algorithms, and regularization procedures when applied to the recognition of handwritten digits from the MNIST dataset. The network achieves a 0.38% error rate, matching and slightly improving on the best known performance of a single model trained without data augmentation at the time the experiments were performed.
Detecting overlapping text in map images is a challenging problem. Previous algorithms generally assume specific cartographic styles (e.g., road shapes and text formats) and are difficult to adjust for handling different map types. In this paper, we build on our previous text recognition work, Strabo, to develop an algorithm for detecting characters that overlap non-text symbols. We call this algorithm Overlapping Text Detection (OTD). OTD uses the recognition results and locations of detected text labels (from Strabo) to detect potential areas that contain overlapping text. Next, OTD classifies these areas as either text or non-text regions based on their shape descriptors (including the ratio of the number of foreground pixels to the area size, the number of connected components, and the number of holes). The average precision and recall of OTD in classifying text and non-text regions were 77% and 86%, respectively. We show that OTD improved the precision and recall of text detection in Strabo by 19% and 41%, respectively, and produced higher accuracy than a state-of-the-art text/graphic separation algorithm.
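The three shape descriptors named above can be computed for a binary candidate area as sketched below. This minimal illustration assumes each area is available as a NumPy boolean mask and uses SciPy's `ndimage.label` to count connected components. Holes are counted as 4-connected background components that do not touch the mask border, which is a common convention rather than a detail confirmed by the paper. The rule OTD applies to these features is not reproduced, and the `shape_features` name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def shape_features(mask):
    """Shape descriptors for one candidate area.

    mask: 2-D boolean array, True = foreground pixel.
    Returns (foreground ratio, number of connected components, number of holes).
    """
    mask = np.asarray(mask, dtype=bool)
    fg_ratio = mask.sum() / mask.size

    # Foreground components with 8-connectivity (3x3 structuring element).
    _, n_components = ndimage.label(mask, structure=np.ones((3, 3), dtype=int))

    # Background components with the default 4-connectivity; those that do not
    # touch the border are enclosed by foreground, i.e. holes.
    bg_labels, n_bg = ndimage.label(~mask)
    border = np.concatenate(
        [bg_labels[0], bg_labels[-1], bg_labels[:, 0], bg_labels[:, -1]]
    )
    n_holes = n_bg - len(set(border[border > 0].tolist()))
    return float(fg_ratio), n_components, n_holes


# Hypothetical candidate area shaped like the letter "O".
ring = np.array([
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
], dtype=bool)
print(shape_features(ring))
```

Pairing 8-connected foreground with 4-connected background avoids counting a diagonal gap in a character outline as both a break in the stroke and a leak out of a hole.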