This paper presents and interprets data on digitization error gathered from four 1,000-volume random samples that represent the full range of source volumes digitized by Google and the Internet Archive over a six-year period and deposited in the HathiTrust Digital Library. The paper summarizes the project's research method and then presents summary findings on the distribution of page-image error. The findings suggest that the imperfection of digital surrogates is a transparent and nearly ubiquitous attribute of large-scale digitization, and one that introduces new complexity in preservation repositories. The paper concludes with suggestions for further research.
Paul Conway, "Page-Image Error in Large-Scale Digitization," in Proc. IS&T Archiving 2013, 2013, pp. 36-42, https://doi.org/10.2352/issn.2168-3204.2013.10.1.art00009