Keying data from digital images is time-consuming, costly, and subject to human error. Archivists are often limited in their content production by their keying budget and by the cost of auditing keyed index data. One alternative that can increase both quality and production is to OCR-process machine-printed documents. Today's OCR technologies are only as good as the bitonal (black and white) documents they process, so a high-quality, high-performance binarizer (a tool that converts color or grayscale images to bitonal ones) is critical to the success of OCR-processing historical records. This paper discusses the challenges binarizers face, the methodology used to test a new binarizer, and the results of the new binarizer compared with a small sampling of other binarization technologies. The proprietary details of the new binarization algorithm are not discussed.
Donald B. Curtis, "Evaluating Binarization Techniques for Optical Character Recognition" in Proc. IS&T Archiving 2007, 2007, pp. 110-112, https://doi.org/10.2352/issn.2168-3204.2007.4.1.art00026