PhotoDNA is a widely used method for generating robust image hashes and is used extensively today for the detection of child sexual abuse material (CSAM). In practice, this means that large numbers of image hashes must be compared. The comparison uses Euclidean distance, which is relatively expensive to compute. We present an approach that makes this comparison significantly more efficient, and we show that neither robustness nor resistance to false positives is compromised. Our approach converts the PhotoDNA hash from 144 bytes to 300 bits, which can then be compared using Hamming distance. A further advantage is that existing hashes can be converted directly, so hashes do not need to be recomputed from the reference images.
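To illustrate why the change of metric matters, the following minimal sketch contrasts the two comparison operations. It does not reproduce the conversion from 144 bytes to 300 bits, which is not specified here; the function names and the sample values are assumptions made purely for illustration.

```python
import math


def euclidean_distance(a: bytes, b: bytes) -> float:
    """Euclidean distance between two byte vectors (e.g. 144-byte hashes)."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    # One subtraction, one multiplication and one addition per byte,
    # followed by a square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a: int, b: int) -> int:
    """Hamming distance between two bit strings stored as Python integers."""
    # A single XOR followed by a population count.
    return bin(a ^ b).count("1")


# Hypothetical sample data, not real PhotoDNA hashes.
byte_hash_1 = bytes([10] * 144)
byte_hash_2 = bytes([13] * 144)

# Hypothetical 300-bit values that differ in exactly three bit positions.
bit_hash_1 = (1 << 300) - 1
bit_hash_2 = bit_hash_1 ^ 0b1011

print(euclidean_distance(byte_hash_1, byte_hash_2))  # 36.0
print(hamming_distance(bit_hash_1, bit_hash_2))      # 3
```

The Hamming comparison reduces to an XOR and a bit count, operations that map onto native machine instructions on common hardware, whereas the Euclidean comparison needs arithmetic on every one of the 144 bytes.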
Robust and cryptographic hash methods each have advantages and disadvantages. Ideally, the robustness of the former could be combined with the confidentiality of the latter. The difficulty is that the notion of similarity used for robust hashes does not carry over to cryptographic hashes, where a single changed input bit yields an unrelated digest. Methods are therefore needed that reliably absorb the degrees of freedom of a robust hash before it is passed to a cryptographic hash, without sacrificing robustness. To achieve this, we need to predict which bits of a hash are most likely to change, for example after JPEG compression. We show that machine learning yields much more reliable predictions than the approaches previously discussed in the literature.
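The role of the bit prediction can be sketched as follows. This is one possible reading under stated assumptions, not the method itself: the mask of unstable bits is hypothetical and supplied by hand rather than by a trained predictor, and simply dropping the flagged bits before applying SHA-256 is an assumption chosen for brevity.

```python
import hashlib


def stable_digest(bits, unstable_mask):
    """Hash only the bits that are predicted to be stable.

    bits:          list of 0/1 values forming a robust hash
    unstable_mask: list of booleans, True where a bit is predicted
                   to flip (hypothetical predictor output)
    """
    if len(bits) != len(unstable_mask):
        raise ValueError("bits and mask must have the same length")
    # Assumption: unstable positions are discarded before hashing.
    kept = "".join(str(b) for b, u in zip(bits, unstable_mask) if not u)
    return hashlib.sha256(kept.encode("ascii")).hexdigest()


# Hypothetical 8-bit robust hashes of an image before and after recompression;
# only the bit at index 2 differs.
original = [1, 0, 1, 1, 0, 0, 1, 0]
recompressed = [1, 0, 0, 1, 0, 0, 1, 0]

# Hypothetical prediction that flags index 2 as unstable.
predicted_mask = [False, False, True, False, False, False, False, False]
no_mask = [False] * 8

# With a correct prediction the digests match; without it they do not.
print(stable_digest(original, predicted_mask) == stable_digest(recompressed, predicted_mask))
print(stable_digest(original, no_mask) == stable_digest(recompressed, no_mask))
```

The sketch shows why prediction quality is decisive: a single unflagged bit flip changes the cryptographic digest completely, while every flagged bit reduces the information that remains in the hash.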