The ease of capturing, manipulating, distributing, and consuming digital media (e.g., images, audio, video, graphics, and text) has enabled new applications and brought a number of important security challenges to the forefront. These challenges have prompted significant research and development in the areas of digital watermarking, steganography, data hiding, forensics, deepfakes, media identification, biometrics, and encryption to protect owners’ rights, establish the provenance and veracity of content, and preserve privacy. Research results in these areas have been translated into new paradigms and applications for monetizing media while maintaining ownership rights, into new biometric and forensic identification techniques, and into novel methods for ensuring privacy. The Media Watermarking, Security, and Forensics Conference is a premier destination for disseminating high-quality, cutting-edge research in these areas. The conference provides an excellent venue for researchers and practitioners to present their innovative work as well as to keep abreast of the latest developments in watermarking, security, and forensics. Early results and fresh ideas are particularly encouraged and supported by the conference review format: only a structured abstract describing the work in progress and preliminary results is initially required, and the full paper is requested just before the conference. A strong focus on how research results are applied by industry, in practice, also gives the conference its unique flavor.
The ability to synthesize convincing human speech has become easier due to the availability of speech generation tools. This necessitates the development of forensics methods that can authenticate and attribute speech signals. In this paper, we examine a speech attribution task, which identifies the origin of a speech signal. Our proposed method, known as the Synthetic Speech Attribution Transformer (SSAT), converts speech signals into mel spectrograms and uses a self-supervised pretrained transformer for attribution. This transformer is pretrained on two large publicly available audio datasets: AudioSet and LibriSpeech. We finetune the pretrained transformer on three speech attribution datasets: the DARPA SemaFor Audio Attribution dataset, the ASVspoof2019 dataset, and the 2022 IEEE SP Cup dataset. SSAT achieves high closed-set accuracy on all datasets (99.8% on the ASVspoof2019 dataset, 96.3% on the SP Cup dataset, and 93.4% on the DARPA SemaFor Audio Attribution dataset). We also investigate the method’s ability to generalize to unknown speech generation methods (open-set scenario). SSAT performs well in this setting, achieving an open-set accuracy of 90.2% on the ASVspoof2019 dataset and 88.45% on the DARPA SemaFor Audio Attribution dataset. Finally, we show that our approach is robust to typical compression rates used by YouTube for speech signals.
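To make the preprocessing step described above concrete, the following is a minimal sketch of converting a speech signal into a log-mel spectrogram before it is passed to a transformer. The parameter values (16 kHz audio, 128 mel bins, 25 ms windows) and the use of torchaudio are illustrative assumptions, not the settings used by SSAT.

# Sketch: speech signal -> log-mel spectrogram (input representation for a transformer).
# Parameters are illustrative, not the SSAT configuration.
import torch
import torchaudio

def speech_to_logmel(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)            # (channels, samples)
    waveform = waveform.mean(dim=0, keepdim=True)   # mix down to mono
    mel = torchaudio.transforms.MelSpectrogram(
        sample_rate=sr, n_fft=400, hop_length=160, n_mels=128
    )(waveform)
    return torch.log(mel + 1e-6)                    # (1, n_mels, frames)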
On the Internet, humans must repeatedly identify themselves to gain access to information or to use services. To check whether a request is sent by a human being and not by a computer, a task must be solved. These tasks are called CAPTCHAs and are designed to be easy for most people to solve and at the same time as unsolvable as possible for a computer. In the context of automated OSINT, which requires automatic solving of CAPTCHAs, we investigate the solving of audio CAPTCHAs. For this purpose, a program is written that integrates two common speech-to-text methods. The program achieves very good results and reaches an accuracy of about 81 percent. As CAPTCHAs are also an important tool for Internet access security, we also use the results of our attack to make suggestions for improving the security of these CAPTCHAs. We compare human listeners with computers and reveal weaknesses of audio CAPTCHAs.
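As an illustration of how two off-the-shelf speech-to-text methods could be integrated against an audio CAPTCHA, the following sketch runs the same audio file through two recognizers and returns both transcripts. The SpeechRecognition package and the particular engines chosen here are assumptions made for illustration; the paper does not name its two methods.

# Sketch: transcribe an audio CAPTCHA with two speech-to-text engines.
# Engines and library are illustrative assumptions.
import speech_recognition as sr

def transcribe_captcha(wav_path: str) -> dict:
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)            # read the whole file
    results = {}
    try:
        results["google"] = recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        results["google"] = None
    try:
        results["sphinx"] = recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        results["sphinx"] = None
    return results                                   # compare or fuse the transcripts downstream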
In this article, we study a recently proposed method for improving the empirical security of steganography in JPEG images in which the sender starts with an additive embedding scheme with symmetric costs of ±1 changes and then decreases the cost of one of these changes based on an image obtained by applying a deblocking (JPEG dequantization) algorithm to the cover JPEG. This approach provides rather significant gains in security at negligible embedding complexity overhead for a wide range of quality factors and across various embedding schemes. Challenging the original explanation of the inventors of this idea, which is based on interpreting the dequantized image as an estimate of the precover (uncompressed) image, we provide alternative arguments. The key observation, and the main reason why this approach works, is how the cost polarizations of individual DCT coefficients act together. By using a MiPOD model of content complexity of the uncompressed cover image, we show that the cost polarization technique decreases the chances of “bad” combinations of embedding changes that would likely be introduced by the original scheme with symmetric costs. This statement is quantified by computing the likelihood of the stego image w.r.t. the multivariate Gaussian precover distribution in the DCT domain. Furthermore, it is shown that the cost polarization decreases spatial discontinuities between blocks (blockiness) in the stego image and enforces desirable correlations of embedding changes across blocks. To further prove the point, it is shown that in a source that adheres to the precover model, a simple Wiener filter can serve equally well as a deep-learning-based deblocker.
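The following is a minimal sketch of the cost-polarization idea discussed above: starting from symmetric +1/-1 embedding costs, the cost of the change whose direction points toward the deblocked reference coefficient is reduced. The polarization factor alpha and the comparison rule are illustrative assumptions, not the exact rule analyzed in the paper.

# Sketch: polarize symmetric embedding costs using a deblocked reference image.
# alpha and the comparison rule are illustrative assumptions.
import numpy as np

def polarize_costs(rho, c_dct, ref_dct, alpha=0.5):
    """rho:     symmetric per-coefficient embedding costs, shape (n,)
       c_dct:   quantized cover DCT coefficients, shape (n,)
       ref_dct: coefficients of the deblocked (dequantized) image expressed
                in the same quantized DCT domain, shape (n,)"""
    rho_p1 = rho.copy()                 # cost of a +1 change
    rho_m1 = rho.copy()                 # cost of a -1 change
    toward_plus = ref_dct > c_dct       # reference lies above the cover value
    toward_minus = ref_dct < c_dct      # reference lies below the cover value
    rho_p1[toward_plus] *= alpha        # cheaper to move toward the reference
    rho_m1[toward_minus] *= alpha
    return rho_p1, rho_m1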
Both robust and cryptographic hash methods have advantages and disadvantages. It would be ideal if robustness and cryptographic confidentiality could be combined. The problem here is that the concept of similarity of robust hashes cannot be applied to cryptographic hashes. Therefore, methods must be developed that reliably eliminate the degrees of freedom of robust hashes before they are included in a cryptographic hash, but without losing their robustness. To achieve this, we need to predict the bits of a hash that are most likely to be modified, for example after JPEG compression. We show that machine learning can be used to make a much more reliable prediction than the approaches previously discussed in the literature.
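To illustrate the idea described above, the sketch below uses a learned model to predict which robust-hash bits are likely to flip (e.g., under JPEG compression), discards the unstable bits, and feeds only the stable ones into a cryptographic hash. The per-bit features and the random-forest model are assumptions for illustration, not the method evaluated in the paper.

# Sketch: predict unstable robust-hash bits with ML, hash only the stable bits.
# Feature design and model choice are illustrative assumptions.
import hashlib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_bit_stability_model(features, bit_flipped):
    """features:    per-bit feature vectors, shape (n_bits_total, n_features)
       bit_flipped: 1 if that bit flipped after compression, shape (n_bits_total,)"""
    model = RandomForestClassifier(n_estimators=100)
    model.fit(features, bit_flipped)
    return model

def stable_crypto_hash(robust_bits, features, model, p_max=0.1):
    flip_prob = model.predict_proba(features)[:, 1]          # estimated P(bit flips)
    stable = robust_bits[flip_prob < p_max]                   # keep reliable bits only
    return hashlib.sha256(np.packbits(stable.astype(np.uint8)).tobytes()).hexdigest()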
In this work, we present an efficient multi-bit deep image watermarking method that is cover-agnostic yet also robust to geometric distortions such as translation and scaling as well as other distortions such as JPEG compression and noise. Our design consists of a light-weight watermark encoder jointly trained with a deep neural network based decoder. Such a design allows us to retain the efficiency of the encoder while fully utilizing the power of a deep neural network. Moreover, the watermark encoder is independent of the image content, making the generated watermarks universally applicable to different cover images and allowing users to pre-generate them for further efficiency. To offer robustness towards geometric transformations, we introduce a learned model for predicting the scale and offset of the watermarked images. Experiments show that our method outperforms comparably efficient watermarking methods by a large margin.
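As a rough sketch of such a cover-agnostic design, the code below pairs a light-weight encoder that maps the message bits to an additive pattern (independent of the cover, so it can be pre-generated) with a CNN decoder that recovers the bits. The layer sizes, pattern resolution, and embedding strength are illustrative assumptions, not the architecture used in the paper.

# Sketch: cover-agnostic multi-bit encoder + CNN decoder.
# Architecture and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class WatermarkEncoder(nn.Module):
    """Maps an n_bits message to a fixed-size residual pattern (no cover input)."""
    def __init__(self, n_bits=64, size=128):
        super().__init__()
        self.fc = nn.Linear(n_bits, size * size)
        self.size = size

    def forward(self, bits):                        # bits: (B, n_bits) in {0, 1}
        pattern = torch.tanh(self.fc(bits * 2 - 1))
        return pattern.view(-1, 1, self.size, self.size)

class WatermarkDecoder(nn.Module):
    """CNN that predicts the embedded bits from a (possibly distorted) image."""
    def __init__(self, n_bits=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_bits),
        )

    def forward(self, image):
        return self.net(image)                      # one logit per message bit

# Embedding: add the pre-generated pattern to any cover with strength alpha, e.g.
# watermarked = cover + alpha * encoder(bits)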
During the pandemic, the usage of video platforms skyrocketed among office workers and students, and even today, when more and more events are held on-site again, the usage of video platforms is at an all-time high. However, the many advantages of these platforms cannot hide some problems. In the professional field, the publication of audio recordings without the author’s consent can get the author into trouble. In education, another problem is bullying. The distance from the victim lowers the inhibition threshold for bullying, which means that platforms need tools to combat it. In this work, we present a system that can not only identify the person leaking the footage but also all other persons present in it. This system can be used in both described scenarios.
DeepFakes are a recent trend in computer vision, posing a threat to the authenticity of digital media. Neural network based approaches are the most prominent means of detecting DeepFakes. Due to their black-box nature, those detectors often lack explanatory power as to why a given decision was made. Furthermore, taking the social, ethical, and legal perspective into account (e.g., the European Commission’s upcoming Artificial Intelligence Act), black-box decision methods should be avoided and Human Oversight should be guaranteed. In terms of explainability of AI systems, many approaches rely on post-hoc visualization methods (e.g., by back-propagation) or on the reduction of complexity. In our paper, a different approach is used, combining hand-crafted as well as neural network based components that analyze the same phenomenon in order to achieve explainability. The semantic phenomenon chosen as an example here is the eye blinking behavior in genuine and DeepFake videos. Furthermore, the impact of video duration on the classification result is evaluated empirically, so that a minimum duration threshold can be set to reasonably detect DeepFakes.
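One common hand-crafted way to quantify blinking behavior is the eye aspect ratio (EAR), computed per frame from six landmarks around each eye: the ratio drops sharply when the eye closes, so blinks can be counted by thresholding it over time. The sketch below assumes landmarks are already available from some face landmark detector; the threshold and landmark ordering follow the usual EAR convention and are not necessarily the components used in the paper.

# Sketch: eye-aspect-ratio based blink counting from per-frame eye landmarks.
# Threshold and landmark ordering are illustrative assumptions.
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) array of landmarks [p1..p6] around one eye."""
    v1 = np.linalg.norm(eye[1] - eye[5])    # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])     # horizontal distance
    return (v1 + v2) / (2.0 * h)

def count_blinks(ear_sequence, threshold=0.2, min_frames=2):
    blinks, run = 0, 0
    for ear in ear_sequence:
        if ear < threshold:
            run += 1                         # eye currently closed
        else:
            if run >= min_frames:
                blinks += 1                  # a closed run long enough to be a blink
            run = 0
    return blinks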
Human-in-control is a principle that has long been established in forensics as a strict requirement and is nowadays also receiving more and more attention in many other fields of application where artificial intelligence (AI) is used. This renewed interest is due to the fact that many regulations (among others the EU Artificial Intelligence Act (AIA)) emphasize it as a necessity for any critical AI application scenario. In this paper, human-in-control and quality assurance aspects for a benchmarking framework to be used in media forensics are discussed, and their usage is illustrated in the context of the media forensics sub-discipline of DeepFake detection.
In the past several years, generative adversarial networks have emerged that are capable of creating realistic synthetic images of human faces. Because these images can be used for malicious purposes, researchers have begun to develop techniques to detect synthetic images. Currently, the majority of existing techniques operate by searching for statistical traces introduced when an image is synthesized by a GAN. An alternative approach that has received comparatively less research involves using semantic inconsistencies to detect synthetic images. While GAN-generated synthetic images appear visually realistic at first glance, they often contain subtle semantic inconsistencies such as inconsistent eye highlights, misaligned teeth, unrealistic hair textures, etc. In this paper, we propose a new approach to detect GAN-generated images of human faces by searching for semantic inconsistencies in multiple different facial features such as the eyes, mouth, and hair. Synthetic image detection decisions are made by fusing the outputs of these facial-feature-level detectors. Through a series of experiments, we demonstrate that this approach can yield strong synthetic image detection performance. Furthermore, we experimentally demonstrate that our approach is less susceptible to performance degradations caused by post-processing than CNN-based detectors that utilize statistical traces.
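To make the fusion step described above concrete, the sketch below assumes each facial-feature detector (eyes, mouth, hair, and so on) outputs a score in [0, 1] and combines them with a simple learned fuser into a single real-versus-synthetic decision. The logistic-regression fuser is an illustrative assumption; the per-feature detectors themselves are not shown and are not the ones proposed in the paper.

# Sketch: fuse per-facial-feature detector scores into one decision.
# The logistic-regression fuser is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_fuser(feature_scores, labels):
    """feature_scores: (n_images, n_detectors) per-feature detector outputs in [0, 1]
       labels:         1 = GAN-generated, 0 = real"""
    fuser = LogisticRegression()
    fuser.fit(feature_scores, labels)
    return fuser

def detect_synthetic(fuser, scores, threshold=0.5):
    p_synthetic = fuser.predict_proba(scores.reshape(1, -1))[0, 1]
    return p_synthetic > threshold, p_synthetic     # decision and fused score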