<!DOCTYPE article PUBLIC '-//NLM//DTD Journal Publishing DTD v2.1 20050630//EN' 'http://uploads.ingentaconnect.com/docs/dtd/ingenta-journalpublishing.dtd'>
<article article-type="research-article">
  <front>
    <journal-meta>
      <journal-id journal-id-type="aggregator">72010361</journal-id>
      <journal-title>Archiving Conference</journal-title>
      <abbrev-journal-title>archiving</abbrev-journal-title>
      <issn pub-type="ppub">2161-8798</issn><issn pub-type="epub"></issn>
      <publisher>
        <publisher-name>Society for Imaging Science and Technology</publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.2352/issn.2168-3204.2016.1.0.24</article-id>
      <article-id pub-id-type="sici">2161-8798(20160419)2016:1L.24;1-</article-id>
      <article-id pub-id-type="publisher-id">s7.phd</article-id>
      <article-id pub-id-type="other">/ist/ac/2016/00002016/00000001/art00007</article-id>
      <article-categories>
        <subj-group>
          <subject>Articles</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Scalable Processing and Search in Package-based Repositories</article-title>
      </title-group>
      <contrib-group>
        <contrib>
          <name>
            <surname>Schlarb</surname>
            <given-names>Sven</given-names>
          </name>
        </contrib>
        <contrib>
          <name>
            <surname>Schmidt</surname>
            <given-names>Rainer</given-names>
          </name>
        </contrib>
        <contrib>
          <name>
            <surname>Bartha</surname>
            <given-names>Mihai</given-names>
          </name>
        </contrib>
        <contrib>
          <name>
            <surname>Karl</surname>
            <given-names>Roman</given-names>
          </name>
        </contrib>
      </contrib-group>
      <pub-date>
        <day>19</day>
        <month>04</month>
        <year>2016</year>
      </pub-date>
      <volume>2016</volume>
      <issue>1</issue>
      <fpage>24</fpage>
      <lpage>27</lpage>
      <permissions>
        <copyright-year>2016</copyright-year>
      </permissions>
      <abstract>
        <p>This paper describes the architecture of the prototype implementation developed in the E-ARK project. The prototype is specifically designed to support scalable and efficient data transformation, information extraction from archival information packages, and full-text search in the repository. Continuing previous work on the use of Hadoop to process large data volumes, the paper presents a combined approach that uses a distributed task queue for parallel processing together with Hadoop and HBase, so that compute-intensive and long-running tasks can be applied during ingest and very large document collections can be full-text indexed.</p>
      </abstract>
    </article-meta>
  </front>
</article>
