The Smithsonian Institution Archives (SIA) and the Rockefeller Archive Center (RAC) conducted a three-year pilot, the Collaborative Electronic Records Project (CERP), that explored the challenges of preserving email collections. This paper reviews the acquisition model and workflow used, both based on the OAIS Reference Model. Rather than focusing on individual messages, CERP settled on preserving an account as a whole, which maintains the structure and relationships within a collection and simplifies metadata management. This paper also reviews some of the challenges posed by the email collections, including a lack of organization and the inclusion of non-record and sensitive material. Both archives also addressed the importance of sound recordkeeping practices and retention schedules, and issued guidance documents for depositors.
CERP also collaborated with another research team, the EMail Collaborative Initiative (EMCAP), to develop an XML schema capable of encompassing a complete email account and its content. The E-Mail Account XML schema defines a standard XML structure for preserving an email account along with its internal organization, its messages and attachments, and the interrelationships of the messages, without sacrificing granular email message data. This paper describes the schema, its unique characteristics, and its value to the archival and digital preservation communities, in the context of and in comparison to other efforts to digitally preserve email.
The schema structure positions preserved email accounts for search and retrieval at multiple levels: individual-message, account-wide, and cross-account. This helps to expose the social networks and message interrelationships present within and across accounts.
The E-Mail Account schema has made it possible to preserve large bodies of related email in a single XML file, as demonstrated in the recent EMCAP and CERP projects.
This XML schema differs from other work on email preservation in five respects: 1) its account-based paradigm; 2) the granularity of the data captured; 3) its alignment with the email message standard RFC 2822; 4) its support for representing the account as a single XML file; and 5) its incorporation into two separately developed email preservation software applications.
Riccardo Ferrante, Lynda Schmitz Fuhrig, "Digital Preservation: Using the Email Account XML Schema," in Proc. IS&T Archiving 2009, 2009, pp. 41-46.