Volume: 13 | Article ID: art00007
Scalable Processing and Search in Package-based Repositories
DOI: 10.2352/issn.2168-3204.2016.1.0.24 | Published Online: April 2016

This paper describes the architecture of the prototype implementation developed in the E-ARK project. The architecture is specifically designed to support scalable and efficient data transformation, information extraction from archival information packages, and full-text search in the repository. Continuing previous work on the use of Hadoop to process large data volumes, the paper presents a combined approach: a distributed task queue for parallel processing is used together with Hadoop and HBase, so that compute-intensive and long-running tasks can be applied during ingest and very large document collections can be full-text indexed.

  Cite this article 

Sven Schlarb, Rainer Schmidt, Mihai Bartha, Roman Karl, "Scalable Processing and Search in Package-based Repositories," in Proc. IS&T Archiving 2016, 2016.

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2016
Archiving Conference
Society for Imaging Science and Technology