PIVAJ is a platform for archived digitized newspapers that is centered on articles: it extracts them from digitized documents by automated page layout analysis, runs OCR on them, and indexes their text transcriptions so that users can search their content. Crowdsourcing is used to improve the quality of the indexing, both by correcting the transcriptions and by tagging articles with keywords. The platform has been used to give Web access to 550,000 articles generated from a digitized local newspaper. Current developments include further improvements to its OCR as well as graphical interfaces for managing the platform.
Pierrick Tranouez, Stéphane Nicolas, Julien Lerouge, Thierry Paquet, "PIVAJ: an article-centered platform for digitized newspapers," in Proc. IS&T Archiving 2015, 2015, pp. 40-43, https://doi.org/10.2352/issn.2168-3204.2015.12.1.art00010