The usability and accessibility of digitised archival data can be improved using deep learning solutions. This paper presents the development of a named entity recognition (NER) model for digitised archival data, specifically state authority documents. The entities for the model were chosen by surveying different user groups. In addition to common entities, two new entities were created to identify businesses (FIBC) and archival documents (JON). The NER model was trained by fine-tuning an existing Finnish BERT model. The training data also included modern born-digital texts so that the model performs well on various types of input. The finished model performs fairly well on OCR-processed data, achieving an overall F1 score of 0.868, and particularly well on the new entities (F1 scores of 0.89 and 0.97 for JON and FIBC, respectively).
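For readers unfamiliar with the F1 scores reported above, the following is a minimal sketch of how an entity-level F1 score can be computed from tag sequences. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say whether scoring was done per entity or per token, nor which tagging scheme was used, so the BIO scheme and the exact-match criterion are assumptions, and the function names `extract_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        # An open span ends at "O", at any "B-" tag,
        # or at an "I-" tag of a different entity type.
        if label is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != label
        ):
            spans.add((label, start, i))
            start, label = None, None
        # Open a new span; a stray "I-" tag is leniently treated as a start.
        if tag != "O" and label is None:
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), counting only exact span matches."""
    gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example with the two new entity types; the tags are invented.
gold = ["B-FIBC", "I-FIBC", "O", "B-JON", "O"]
pred = ["B-FIBC", "I-FIBC", "O", "O", "O"]
precision, recall, f1 = entity_f1(gold, pred)
```

In this toy example the prediction finds the FIBC entity but misses the JON entity, so precision is 1.0 and recall is 0.5. Exact-match scoring is strict: a predicted span with the right type but a wrong boundary, which OCR errors can easily cause, counts as both a miss and a false alarm.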
The Smithsonian Institution Digitization Program Office’s Collection Digitization team designs and develops a “three-pronged” workflow approach to mass digitization of museum collections, consisting of the Physical, Imaging, and Virtual Workflows. This approach ensures proper handling of objects, optimizes capture throughput, and streamlines the processing and delivery of images through automation. The Physical Workflow Design defines the production space and the safe movement of objects from storage to that space; the Imaging Workflow Design defines the technical specifications, file deliverables, and the results of our ‘Item Driven Image Fidelity’ (IDIF) testing; and finally, the Virtual Workflow Design defines the lifecycle of the digital file, from creation to online access, describing the various data processes required for success.
The Hoover Institution Library & Archives (HILA) has implemented Smartsheet, a cloud-based project management tool, to manage tasks and cross-team handoffs for its new mass digitization program. By combining task-specific tools such as Capture One and LIMB Processing with the administrative flexibility of Smartsheet, HILA has succeeded in leveraging commercial project management functionality for cultural heritage purposes, resulting in improvements to our program’s efficiency, flexibility, and reporting capabilities.