Proceedings Paper
Volume: 37 | Article ID: MOBMU-310
Detecting Voice Cloning and Text to Speech Audio in Real Time on Mobile Devices
DOI: 10.2352/EI.2025.37.3.MOBMU-310 | Published Online: February 2025
Abstract

In this paper, we present a method that analyzes an audio stream in real time and indicates whether the voice is synthetic, that is, generated by a voice cloning or text-to-speech (TTS) model. Unlike state-of-the-art techniques that rely on self-supervised learning (SSL) or non-self-supervised learning, this method is deterministic and focuses on the analysis of tonal and non-tonal components within an audio stream. By leveraging principles from the MPEG-1 global masking threshold, the algorithm systematically evaluates tonal and noise components within a defined frequency range. The underlying hypothesis is that synthesized audio exhibits tonal and non-tonal characteristics that differ from those of original human speech, and that these differences can be quantified for classification. This interpretable, deterministic framework addresses key limitations of existing SSL-based approaches, including high computational cost and limited transparency. Beyond detecting synthesized speech, the method provides insights into the likely model used for generation. Experimental evaluations demonstrate the algorithm's effectiveness, revealing distinct and consistent patterns across various TTS and voice conversion (VC) models, thereby offering a reliable and computationally efficient solution for audio authenticity verification. The proposed algorithm was developed and tested on a small dataset, on which it shows excellent separation between different solution providers and genuine voices.
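
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of what tonal component detection in the style of MPEG-1 psychoacoustic model 1 can look like, not the authors' implementation. It labels a spectral bin as tonal when it is a local maximum that exceeds nearby bins by 7 dB. Assumptions made here: a 512-sample frame, a fixed neighbour offset of ±2 bins (the standard varies the offsets with frequency), a naive DFT to stay within the standard library, and a hypothetical feature `tonal_energy_ratio` that is one conceivable way to quantify the tonal share of a frequency range.

```python
import cmath
import math

FRAME_LEN = 512          # assumed frame length (MPEG-1 Layer I uses a 512-point FFT)
TONAL_MARGIN_DB = 7.0    # margin used by MPEG-1 psychoacoustic model 1
POWER_FLOOR = 1e-12      # avoids log(0) and suppresses numerical noise


def power_spectrum_db(frame):
    """Hann-windowed power spectrum in dB for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        # Reduce k*i modulo n so the phase argument stays small and accurate.
        acc = sum(x * cmath.exp(-2j * math.pi * ((k * i) % n) / n)
                  for i, x in enumerate(windowed))
        spectrum.append(10 * math.log10(max(abs(acc) ** 2, POWER_FLOOR)))
    return spectrum


def tonal_bins(spec_db, margin_db=TONAL_MARGIN_DB, offset=2):
    """Bins that are local maxima and exceed the bins at +/-offset by margin_db."""
    peaks = []
    for k in range(offset, len(spec_db) - offset):
        is_local_max = spec_db[k] > spec_db[k - 1] and spec_db[k] >= spec_db[k + 1]
        stands_out = (spec_db[k] - spec_db[k - offset] >= margin_db
                      and spec_db[k] - spec_db[k + offset] >= margin_db)
        if is_local_max and stands_out:
            peaks.append(k)
    return peaks


def tonal_energy_ratio(spec_db, lo_bin, hi_bin):
    """Hypothetical feature: share of band energy carried by tonal components."""
    power = [10 ** (v / 10) for v in spec_db]
    tonal = set()
    for k in tonal_bins(spec_db):
        tonal.update((k - 1, k, k + 1))  # a peak plus its two direct neighbours
    band = range(lo_bin, hi_bin + 1)
    total = sum(power[k] for k in band)
    return sum(power[k] for k in band if k in tonal) / total


# Example: a pure sine centred on bin 40 of a 512-sample frame.
sine = [math.sin(2 * math.pi * 40 * i / FRAME_LEN) for i in range(FRAME_LEN)]
spec = power_spectrum_db(sine)
peaks = tonal_bins(spec)
ratio = tonal_energy_ratio(spec, 2, 250)
```

In a streaming setting, such a per-frame feature would be computed on successive frames and aggregated over time. How the paper derives its decision from tonal and non-tonal components is not stated in the abstract, so any threshold or aggregation rule would be a further assumption.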

  Cite this article 

Waldemar Berchtold, Julian Heeger, Simon Bugert, Martin Steinebach, "Detecting Voice Cloning and Text to Speech Audio in Real Time on Mobile Devices," in Electronic Imaging, 2025, pp. 310-1 - 310-6, https://doi.org/10.2352/EI.2025.37.3.MOBMU-310

  Copyright statement 
Copyright © 2025, Society for Imaging Science and Technology
Electronic Imaging
2470-1173
Society for Imaging Science and Technology
IS&T 7003 Kilworth Lane, Springfield, VA 22151 USA