A continuing challenge for preservation is the lack of objective data for making informed collection decisions. In a shared national print system, this challenge bears on withdrawal and retention decisions, since catalog partners may have no data on the condition of one another’s volumes. This conundrum led to a national research initiative funded by the Mellon Foundation, “Assessing the Physical Condition of the National Collection.” The project captured and analyzed condition data from 500 “identical” volumes from five American research libraries to explore three questions: What is the condition of book collections from 1840–1940? Can condition be predicted from catalog or physical parameters? What assessment tools might indicate a book’s life expectancy? Filling gaps in knowledge about the physicality of our collections is helping to identify at-risk collections and to explain, through the impact of paper composition, why copies of the “same” volume can differ in condition. Predictive modeling and assessment tools are also being used to improve understanding of what is typical for specific eras.
Although mature, proven guidelines (FADGI) now provide solid recommendations for creating proper master files, the cultural heritage community still lacks easily consumable, flexible specifications for conducting actual projects, beyond imaging targets and the ability to measure them. Examples of FADGI-compliant Statements of Work are also scarce, which leads to much reinvention of the wheel and even to library and archival staff deciding not to use FADGI at all. This puts inexperienced users at a decided disadvantage and creates a formidable barrier to entry for new practitioners who want to apply the FADGI guidelines to their projects. The DIO (Digitization Information Object) discussed in this paper is a data model encompassing all technical parameters of a still image digitization project. At its core, the DIO schema is intrinsically tied to FADGI and enforces FADGI compliance through its use. It provides a common, machine-readable instruction set for digitization software. This allows consuming applications to be configured quickly and precisely for each project: specifying output image parameters, configuring post-processing workflows, verifying both working files and large batches of completed content at scale, and even generating plain-English text for a project's Statement of Work, all from the same DIO JSON file.
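To make the "one JSON file, many consumers" idea concrete, the following is a minimal Python sketch of how a consuming application might read a DIO-like JSON document, verify a batch of delivered files against it, and render a Statement of Work sentence from the same data. Everything specific here is an assumption for illustration: the field names (`fadgi_star_level`, `output`, `ppi`, `bit_depth`, `color_space`), the values, and the helper functions are hypothetical and are not taken from the published DIO schema or from the FADGI tables. The per-file technical metadata is assumed to have been extracted already by some other tool.

```python
import json
from typing import Any

# Hypothetical DIO-like document. Field names and values are illustrative
# only; they are NOT the actual DIO schema or FADGI-mandated values.
DIO_JSON = """
{
  "project": "Example bound-volume project",
  "fadgi_star_level": 3,
  "output": {
    "format": "TIFF",
    "ppi": 400,
    "bit_depth": 8,
    "color_space": "Adobe RGB (1998)"
  }
}
"""


def load_dio(text: str) -> dict[str, Any]:
    """Parse the JSON and confirm the (assumed) top-level keys exist."""
    dio = json.loads(text)
    for key in ("project", "fadgi_star_level", "output"):
        if key not in dio:
            raise ValueError(f"DIO is missing required key: {key}")
    return dio


def verify_image(meta: dict[str, Any], dio: dict[str, Any]) -> list[str]:
    """Compare one file's technical metadata with the DIO output spec.

    Returns human-readable problems; an empty list means the file passes.
    """
    spec = dio["output"]
    problems = []
    if meta.get("format") != spec["format"]:
        problems.append(f"format {meta.get('format')!r} != {spec['format']!r}")
    if meta.get("ppi", 0) < spec["ppi"]:
        problems.append(f"ppi {meta.get('ppi')} < {spec['ppi']}")
    if meta.get("bit_depth", 0) < spec["bit_depth"]:
        problems.append(f"bit depth {meta.get('bit_depth')} < {spec['bit_depth']}")
    if meta.get("color_space") != spec["color_space"]:
        problems.append(f"color space {meta.get('color_space')!r} != {spec['color_space']!r}")
    return problems


def verify_batch(batch: dict[str, dict[str, Any]], dio: dict[str, Any]) -> dict[str, list[str]]:
    """Check many files against the same DIO, keeping only the failures."""
    failures = {}
    for name, meta in batch.items():
        problems = verify_image(meta, dio)
        if problems:
            failures[name] = problems
    return failures


def sow_text(dio: dict[str, Any]) -> str:
    """Render a plain-English Statement of Work sentence from the same DIO."""
    spec = dio["output"]
    return (
        f"For the project \"{dio['project']}\", the contractor shall deliver "
        f"{spec['format']} master files at a minimum of {spec['ppi']} ppi, "
        f"{spec['bit_depth']} bits per channel, in the {spec['color_space']} "
        f"color space, meeting FADGI {dio['fadgi_star_level']}-star requirements."
    )


if __name__ == "__main__":
    dio = load_dio(DIO_JSON)
    batch = {
        "vol1_0001.tif": {"format": "TIFF", "ppi": 400, "bit_depth": 8,
                          "color_space": "Adobe RGB (1998)"},
        "vol1_0002.jpg": {"format": "JPEG", "ppi": 300, "bit_depth": 8,
                          "color_space": "sRGB"},
    }
    print(verify_batch(batch, dio))
    print(sow_text(dio))
```

The design point the sketch tries to capture is that verification and Statement of Work text are both derived from a single source of truth, so a change to the project parameters propagates to every consumer at once rather than being re-typed in several places.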