This paper introduces the Internet Memory Foundation (IMF) platform, built on a distributed infrastructure, together with the associated tools and workflows that support data management and preservation actions at large scale. Over the past years, IMF's main concern has been scalability in crawling, indexing, preserving and accessing content. To address these issues, the Foundation developed its own crawler and built a new infrastructure. We present this infrastructure and crawler, share the challenges met while building them, and describe the approach taken to solve the preservation issues inherent to scalable archives. We also highlight new horizons opening up for web archives in relation to analytics use cases.
Leïla Medjkoune, Stanislav Barton, Florent Carpentier, Julien Masanès, Radu Pop, "Building Scalable Web Archives," in Proc. IS&T Archiving 2014, 2014, pp. 138-143, https://doi.org/10.2352/issn.2168-3204.2014.11.1.art00030