The Web has become the main publication medium worldwide, covering almost every facet of human activity. In many cases, the Web is the only medium where information about these activities is recorded. However, the Web is ephemeral: its contents change constantly, and new information rapidly replaces old. It is therefore critical to establish web archives that capture, at least partially, the information deemed important in the long term. In this work, we address search and access strategies for web archives, and outline our approach to effective search and retrieval of archived web contents. In a typical web archive, the contents are highly unstructured and interlinked within a temporal context. Over time, such archived web contents present an unprecedented opportunity for information and knowledge discovery by linking and fusing relevant information spread over several contextual domains, including the temporal domain. We present a number of methods for searching web archives that contribute towards realizing this opportunity. We also address different strategies for presenting the contents of interest, and extend information retrieval techniques to incorporate temporal contexts seamlessly into the architecture.
Sangchul Song, Joseph JaJa, "Search and Access Strategies for Web Archives," in Proc. IS&T Archiving 2009, 2009, pp. 73-78, https://doi.org/10.2352/issn.2168-3204.2009.6.1.art00015