There are a number of ways to reproduce an image, for example gamut mapping, halftoning, and compression. To find the best reproduction among a number of variants of the same reproduction algorithm, a psychophysical experiment can be carried out. Image difference metrics have been introduced to eliminate the need for these experiments. To do so, the metrics must reflect the perceived image difference. One way to evaluate the overall performance of image difference metrics is to compute the correlation coefficient between perceived and predicted image difference. This does not always reflect the true performance of the metric. We therefore propose to use the ranking based on the predicted image difference for each scene as a data set for the rank order method. This results in a metric z-score comparable to the z-score of the overall perceived image difference. The correlation coefficient between the metric z-score and the perceived z-score then reflects the overall performance of the image difference metric.
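The proposed evaluation can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: each scene is treated like one observer, rankings are turned into pairwise preference proportions and then into z-scores with the inverse normal CDF (a Thurstone Case V style computation), unanimous proportions are clamped to keep the z-scores finite, and a higher z-score means a smaller difference from the original. The function names and all numbers are hypothetical and serve only as illustration.

```python
from statistics import NormalDist


def ranks_from_scores(scores):
    """Rank reproductions within one scene; rank 1 = smallest predicted difference."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    ranks = [0] * len(scores)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


def rank_order_z_scores(rankings, n_items):
    """Convert a set of rankings (one per scene) into one z-score per reproduction."""
    n = len(rankings)
    # wins[i][j] = number of rankings in which item i is ranked better than item j
    wins = [[0] * n_items for _ in range(n_items)]
    for ranks in rankings:
        for i in range(n_items):
            for j in range(n_items):
                if i != j and ranks[i] < ranks[j]:
                    wins[i][j] += 1
    inv_cdf = NormalDist().inv_cdf
    z = []
    for i in range(n_items):
        total = 0.0
        for j in range(n_items):
            if i == j:
                continue
            p = wins[i][j] / n
            # Assumption: clamp unanimous proportions so inv_cdf stays finite.
            p = min(max(p, 0.5 / n), 1 - 0.5 / n)
            total += inv_cdf(p)
        z.append(total / (n_items - 1))
    return z


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Hypothetical metric output: rows are scenes, columns are reproductions.
metric_values = [
    [0.10, 0.25, 0.40],
    [0.12, 0.30, 0.22],
    [0.05, 0.20, 0.35],
]
# Hypothetical perceived z-scores from a psychophysical experiment.
perceived_z = [0.9, -0.1, -0.8]

rankings = [ranks_from_scores(row) for row in metric_values]
metric_z = rank_order_z_scores(rankings, n_items=3)
print("metric z-scores:", metric_z)
print("correlation with perceived z-scores:", pearson(metric_z, perceived_z))
```

Because the correlation is computed on z-scores pooled over all scenes, scale differences in the raw metric values between scenes do not enter the result; only the per-scene ordering does.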