Volume: 9 | Article ID: art00011
“PlaIR”: A System to Provide Full Access to Digitized Newspaper Archives
DOI: 10.2352/issn.2168-3204.2012.9.1.art00011 | Published online: January 2012
Abstract

This paper presents a platform dedicated to the analysis and online consultation of historical newspaper archives. The platform is designed to make the user experience as intuitive as possible while relying on mature open-source tools, and all of its features are implemented with the Spring framework. To this end, we built a system that displays tiled high-resolution images without a browser plug-in, based on the open-source image server IIPImage. The platform also supports full-text search through the Java search library Apache Lucene and presents the results as newspaper articles. In addition, we provide collaborative features that let users correct the content automatically generated by our document processing workflow and accessed through the browsing platform. All user corrections are stored using Hibernate with a MySQL database. The aim is to improve both content quality and search accuracy continuously by exploiting the users' ability to recognize significant errors, and thereby to enrich the digital objects representing the newspaper issues.

The proposed system generates metadata describing not only the physical layout but also the logical structure of newspaper documents. Our article segmentation analyzes a newspaper issue and recognizes articles, even when they straddle more than one page or are laid out in a complex structure. The workflow can also take the output of optical character recognition (OCR) engines as input in order to index the text of the segmented articles.

With this system, we aim to create faithful, representative digital objects in standard formats (i.e. METS/ALTO) that contain a logical description of the content, making the documents easier for users to read and understand.
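
The abstract does not describe PlaIR's internal data model, so the following is only a minimal sketch under stated assumptions. It assumes ALTO pages whose TextBlock elements carry ID attributes, plus a reading-order list of (page, block) references of the kind a METS logical structure map would supply for one article. It then shows how the text of an article that straddles two pages could be reassembled into a single string before indexing. The class AltoArticleText, the BlockRef type and both methods are hypothetical; only the ALTO element and attribute names (TextBlock, TextLine, String, ID, CONTENT) come from the standard. The record requires Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: rebuilds one article's text from ALTO blocks spread over several pages. */
final class AltoArticleText {

    /** Hypothetical reference to one TextBlock on one page (roughly what a METS area points to). */
    record BlockRef(String pageId, String blockId) {}

    /** Parses one ALTO page and maps each TextBlock ID to its text, one output line per TextLine. */
    static Map<String, String> blockTexts(String altoXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace-aware parsing lets us match by local name, whatever the ALTO namespace version.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(altoXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> texts = new LinkedHashMap<>();
            NodeList blocks = doc.getElementsByTagNameNS("*", "TextBlock");
            for (int i = 0; i < blocks.getLength(); i++) {
                Element block = (Element) blocks.item(i);
                List<String> lines = new ArrayList<>();
                NodeList textLines = block.getElementsByTagNameNS("*", "TextLine");
                for (int j = 0; j < textLines.getLength(); j++) {
                    NodeList words = ((Element) textLines.item(j)).getElementsByTagNameNS("*", "String");
                    List<String> line = new ArrayList<>();
                    for (int k = 0; k < words.getLength(); k++) {
                        // Simplification: SP and HYP elements are ignored; words are joined by one space.
                        line.add(((Element) words.item(k)).getAttribute("CONTENT"));
                    }
                    lines.add(String.join(" ", line));
                }
                texts.put(block.getAttribute("ID"), String.join("\n", lines));
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse ALTO page", e);
        }
    }

    /** Concatenates the referenced blocks, in the given reading order, into one article text. */
    static String articleText(Map<String, Map<String, String>> pages, List<BlockRef> readingOrder) {
        List<String> parts = new ArrayList<>();
        for (BlockRef ref : readingOrder) {
            Map<String, String> page = pages.get(ref.pageId());
            if (page == null || !page.containsKey(ref.blockId())) {
                throw new IllegalArgumentException("Unknown block " + ref);
            }
            parts.add(page.get(ref.blockId()));
        }
        return String.join("\n", parts);
    }
}
```

In use, one would call blockTexts once per ALTO page, keep the results keyed by page ID, and pass the article's block references to articleText; the resulting string could then be indexed as one document per article, so that search hits can be returned as articles rather than pages. The sketch deliberately drops ALTO geometry (HPOS, VPOS, WIDTH, HEIGHT); a browsing platform would presumably keep those coordinates to highlight hits on the tiled page image.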

Cite this article

Thomas Palfray, Stéphane Nicolas, Thierry Paquet, Pierrick Tranouez, "“PlaIR”: A System to Provide Full Access to Digitized Newspaper Archives," in Proc. IS&T Archiving 2012, 2012, pp. 48-53, https://doi.org/10.2352/issn.2168-3204.2012.9.1.art00011

Copyright © Society for Imaging Science and Technology 2012
Archiving Conference
ISSN 2161-8798
Society for Imaging Science and Technology
7003 Kilworth Lane, Springfield, VA 22151, USA