Determining the similarity of document images is an important first step for several document retrieval tasks, such as document classification, information extraction, and retrieval based on visual similarity. In this paper, we propose a method to describe and compare the content and layout of a document given only an image of the document. A tree structure is used to capture the hierarchical structure of the document. Two documents are then compared using a tree matching strategy.
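The abstract does not say which features the tree nodes carry or which tree matching algorithm is used, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes each node is a rectangular page region with a type label and a bounding box in page-normalized coordinates, and it compares two trees with a greedy one-to-one matching of child regions. The names `Region`, `iou`, and `similarity`, the region labels, and the equal 0.5 weights are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (x0, y0, x1, y1) in page-normalized coordinates; an assumed representation.
Box = Tuple[float, float, float, float]


@dataclass
class Region:
    """One node of a hypothetical document layout tree."""
    kind: str                      # assumed label, e.g. "page", "text", "image"
    box: Box
    children: List["Region"] = field(default_factory=list)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def similarity(a: Region, b: Region) -> float:
    """Score in [0, 1]; combines node-level and child-level agreement."""
    # Nodes of different kinds are treated as not matching at all (an assumption).
    local = iou(a.box, b.box) if a.kind == b.kind else 0.0
    if not a.children and not b.children:
        return local

    # Score every pair of children, then greedily keep the best
    # one-to-one pairs. This is a simple stand-in for a real tree
    # matching strategy, not an optimal assignment.
    pairs = sorted(
        ((similarity(ca, cb), i, j)
         for i, ca in enumerate(a.children)
         for j, cb in enumerate(b.children)),
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for score, i, j in pairs:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        total += score

    # Unmatched children lower the score through the larger child count.
    child_score = total / max(len(a.children), len(b.children))
    return 0.5 * local + 0.5 * child_score


# Two toy pages: same layout, but the lower block differs in kind.
doc_a = Region("page", (0.0, 0.0, 1.0, 1.0), [
    Region("text", (0.1, 0.1, 0.9, 0.3)),
    Region("image", (0.1, 0.4, 0.9, 0.9)),
])
doc_b = Region("page", (0.0, 0.0, 1.0, 1.0), [
    Region("text", (0.1, 0.1, 0.9, 0.3)),
    Region("text", (0.1, 0.4, 0.9, 0.9)),
])

if __name__ == "__main__":
    print(similarity(doc_a, doc_a))
    print(similarity(doc_a, doc_b))
```

In this sketch a document compared with itself scores highest, and the score drops when a region changes type or position. A production system would likely replace the greedy step with an optimal assignment or a tree edit distance, and would build the tree from a page segmentation of the image.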
Burak Bitlis, Xiaojun Feng, Jacob L. Harris, Ilya Pollak, Charles A. Bouman, Mary P. Harper, Jan P. Allebach, "A Hierarchical Document Description and Comparison Method," in Proc. IS&T Archiving 2004, 2004, pp. 195-198, https://doi.org/10.2352/issn.2168-3204.2004.1.1.art00042