Convolutional neural networks (CNNs) have received significant attention due to their ability to adaptively learn classification features directly from data. While CNNs have driven dramatic advances in fields such as object and speech recognition, multimedia forensics is a fundamentally different problem from other deep learning applications. Little work exists to guide the design of CNN architectures for forensic tasks. Furthermore, it is still unclear which forensic tasks can be performed using CNNs. In this work, we investigate the design of CNNs for multiple multimedia forensic applications. We show that CNNs are capable of performing image manipulation detection as well as camera model identification. Through a series of experiments, we systematically examine the influence of several important CNN design choices for forensic applications, including the use of a constrained convolutional layer or fixed high-pass filter at the beginning of the CNN, the use of a nonlinearity after the first layer, and the choice of activation and pooling functions. We show that different CNN design choices should be made for different forensic applications, and we identify the design choices that maximize the performance of CNNs for manipulation detection and camera model identification.
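To make the idea of a constrained first layer concrete, the following is a minimal sketch in plain Python. It assumes the prediction-error form of the constraint described in the authors' related work on constrained convolutional layers: the center weight is fixed at -1 and the remaining weights are scaled to sum to 1. The abstract does not state the constraint, so this is an assumption, and the function names `project_constrained_kernel` and `filter_valid` are hypothetical helpers written for this illustration only.

```python
def project_constrained_kernel(kernel):
    """Project a square, odd-sized kernel onto the assumed constraint:
    center weight = -1, all other weights sum to 1."""
    size = len(kernel)
    center = size // 2
    # Sum of every weight except the center one.
    off_center_sum = sum(
        kernel[i][j]
        for i in range(size)
        for j in range(size)
        if (i, j) != (center, center)
    )
    if off_center_sum == 0:
        raise ValueError("off-center weights sum to zero; cannot normalize")
    # Scale all weights, then overwrite the center with -1.
    projected = [[w / off_center_sum for w in row] for row in kernel]
    projected[center][center] = -1.0
    return projected


def filter_valid(image, kernel):
    """Slide the kernel over the image without padding (cross-correlation)."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(k) for b in range(k))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Arbitrary starting weights, standing in for a kernel after a training step.
raw_kernel = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
constrained = project_constrained_kernel(raw_kernel)

# A flat image region carries no local variation, so the residual is ~0.
flat_image = [[10.0] * 4 for _ in range(4)]
residual = filter_valid(flat_image, constrained)
```

Because the projected weights sum to zero, the filter behaves like a high-pass filter: it suppresses smooth image content and passes only local deviations. In a training loop, a projection of this kind would typically be reapplied after each weight update, whereas a fixed high-pass filter would keep its weights unchanged throughout.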
Belhassen Bayar, Matthew C. Stamm, "Design Principles of Convolutional Neural Networks for Multimedia Forensics," in Proc. IS&T Int’l. Symp. on Electronic Imaging: Media Watermarking, Security, and Forensics, 2017, pp. 77-86, https://doi.org/10.2352/ISSN.2470-1173.2017.7.MWSF-328