This paper addresses the problem of registering multimodal images of scenes with depth variation. Existing methods typically match keypoints using descriptors and then apply a consensus or consistency check to rule out incorrect matches. However, the consistency check often fails
to work when there are a large number of wrong matches. Given a set of initial matches built with descriptors, we seek to identify the correctly matched keypoints. To this end, this work uses global information over the entire images to assess the quality of keypoint matches. Since
the image content has depth variation, projective transformations are needed to account for the misalignment, and hence quadruples of keypoint matches are considered. To find the correctly matched keypoints, an iterative process is used that considers all retained quadruples that pass
the spatial coherence constraint. Extensive experiments on various image data show that the proposed method outperforms state-of-the-art methods.
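
To illustrate why quadruples of matches are the natural unit here: a projective transformation (homography) has eight degrees of freedom, and each keypoint match contributes two equations, so four matches determine it. The following is a minimal sketch of the standard direct linear transform (DLT) with NumPy, not the paper's own procedure. The function names `homography_from_quadruple` and `apply_homography` are hypothetical, and the sketch assumes the four points are in general position (no three collinear) and that the bottom-right entry of the estimated matrix is nonzero.

```python
import numpy as np


def homography_from_quadruple(src, dst):
    """Estimate a 3x3 homography H such that dst ~ H @ src, from four matches.

    Standard DLT; illustrative only, not the method proposed in the paper.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != (4, 2) or dst.shape != (4, 2):
        raise ValueError("expected two arrays of shape (4, 2)")

    # Each match (x, y) -> (u, v) yields two linear equations in the 9 entries of H.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    A = np.array(rows)  # shape (8, 9)

    # The solution is the null vector of A: the last right singular vector.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)

    # Fix the arbitrary scale (assumes H[2, 2] is not zero).
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map an (N, 2) array of points through the homography H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Given a candidate quadruple, one could warp one image with the estimated matrix and score the alignment over the whole image, which is the kind of global assessment the abstract describes; the specific score is not stated here and is left open.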