The rapid growth of digital libraries (DLs) worldwide poses many challenges for document image analysis (DIA) research and development. DLs promise to offer more people access to larger document collections, and at far greater speed, than physical libraries can. But DLs also tend, for many reasons, to serve poorly those documents which, although readily legible to people, are not accurately encoded in digital form. Printed and handwritten documents, for example, are widely preferred in their original physical (undigitized) ink-on-paper form over electronic displays for reading and other uses; as document images accessed through DLs, they lose many of these advantages while, of course, lacking the advantages of ‘born digital’ documents. This talk explores these issues and illustrates them with case studies arising from the author's experience as a DIA researcher in collaboration with several DL projects in the US. The pace and scale of commercial document-scanning projects have been accelerating over the last three years. Difficult open DIA technical problems in DL applications are identified in the contrasting advantages of paper and digital displays, at every stage of capture, early processing, recognition, analysis, presentation, and retrieval, and in personal and interactive applications. Discussions at the International Workshop on Document Image Analysis for Libraries (DIAL 2004), recently organized by Prof. Venu Govindaraju and the author, are summarized.
Henry S. Baird, "Digital Libraries and Document Image Analysis," in Proc. IS&T Archiving 2004, 2004, pp. 286-288, https://doi.org/10.2352/issn.2168-3204.2004.1.1.art00060