It is now well known that practical steganalysis using machine learning techniques can be strongly biased by the problem of Cover Source Mismatch. Such a phenomenon usually occurs in machine learning when the training and the testing sets are drawn from different sources, i.e. when they do not share the same statistical properties. In the field of steganalysis, however, because the signal targeted by steganalysis methods is so weak, this mismatch can drastically lower detection performance. This paper aims to define, through practical experiments, what a source is in steganalysis.
By assuming that two cover datasets coming from a common source should yield comparable steganalysis performance, it is shown that the definition of a source is related more to the processing pipeline applied to the RAW images than to the sensor or the acquisition setup of the pictures.
In order to measure the discrepancy between sources, this paper introduces the concept of
Quentin Giboulot, Rémi Cogranne, Patrick Bas, "Steganalysis into the Wild: How to Define a Source?" in Proc. IS&T Int’l. Symp. on Electronic Imaging: Media Watermarking, Security, and Forensics, 2018, pp. 318-1 – 318-12, https://doi.org/10.2352/ISSN.2470-1173.2018.07.MWSF-318
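To illustrate the evaluation idea behind the assumption above, the following Python sketch compares the error of a detector trained and tested on the same cover dataset with its error when tested on a second dataset. It is not the authors' actual pipeline: the feature matrices, the logistic-regression detector and all variable names (X_A, X_B, detection_error, ...) are placeholders introduced here purely for illustration, standing in for steganalysis features extracted from real cover/stego images.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder feature matrices standing in for steganalysis features extracted
# from two candidate cover datasets A and B. Labels: 0 = cover, 1 = stego.
# In a real experiment these would come from actual images, not random numbers.
rng = np.random.default_rng(0)
X_A, y_A = rng.normal(size=(2000, 200)), rng.integers(0, 2, 2000)
X_B, y_B = rng.normal(size=(2000, 200)), rng.integers(0, 2, 2000)

def detection_error(X_tr, y_tr, X_te, y_te):
    """Train a simple linear detector and return its error rate on the test set."""
    clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    return 1.0 - accuracy_score(y_te, clf.predict(X_te))

# Hold out half of each dataset for testing.
XA_tr, XA_te, yA_tr, yA_te = train_test_split(X_A, y_A, test_size=0.5, random_state=1)
XB_tr, XB_te, yB_tr, yB_te = train_test_split(X_B, y_B, test_size=0.5, random_state=1)

intrinsic_A = detection_error(XA_tr, yA_tr, XA_te, yA_te)   # train on A, test on A
intrinsic_B = detection_error(XB_tr, yB_tr, XB_te, yB_te)   # train on B, test on B
cross_AB    = detection_error(XA_tr, yA_tr, XB_te, yB_te)   # train on A, test on B
cross_BA    = detection_error(XB_tr, yB_tr, XA_te, yA_te)   # train on B, test on A

# If A and B really come from the same source, the cross-dataset errors should
# stay close to the intrinsic ones; a large gap is a symptom of cover source mismatch.
print(intrinsic_A, intrinsic_B, cross_AB, cross_BA)
```

With real features, comparing the intrinsic errors against the cross-dataset errors gives a simple way to check whether two datasets can be treated as one source or whether a mismatch is present.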