Designing Effective Retrieval Systems for Digital Archives of Historical Documents

Andy White

doi:10.2352/issn.2168-3204.2005.2.1.art00009

This paper uses two high-profile digitisation projects to demonstrate how effective retrieval strategies can be designed for digital resources. The main theme is the relationship between the accuracy of the natural language of the database and the effectiveness of the various search functions. It will be argued that successful retrieval strategies can only be based on ASCII text of an exceedingly high standard. Reaching this standard requires rigorous proofreading and, as such, would appear to call into question the creation of databases comprising millions of words; the projects cited by the author each contain fewer than one million. Although ICT is not a panacea for converting enormous quantities of original historical documents into easily retrievable digital archives, much smaller digital collections can nonetheless yield results. Circular media, as opposed to more traditional linear media, enable content to be designed in a multi-layered fashion. Thus, material can be catalogued and tagged with metadata in an academic way but, with the aid of additional multimedia features, can provide different points of access for people of varying abilities and interests. In this context, highly accurate retrieval systems using both controlled vocabularies and natural language can greatly aid researchers.