In this paper, we present a method that analyzes an audio stream in real time and indicates whether the voice is synthetic, i.e., generated by a voice-cloning or text-to-speech (TTS) model. Unlike state-of-the-art techniques that rely on self-supervised learning (SSL) or other learned representations, this method is deterministic and focuses on the analysis of tonal and non-tonal components within an audio stream. By leveraging principles from the MPEG-1 global masking threshold, the algorithm systematically evaluates tonal and noise components within a defined frequency range. The underlying hypothesis is that synthesized audio exhibits tonal and non-tonal characteristics distinct from those of genuine human speech, which can be quantified for classification. This interpretable, deterministic framework addresses key limitations of existing SSL-based approaches, including high computational cost and limited transparency. Beyond detecting synthesized speech, the method provides insight into which model was likely used for generation. Experimental evaluations demonstrate the algorithm's effectiveness, revealing distinct and consistent patterns across various TTS and voice conversion (VC) models, thereby offering a reliable and computationally efficient solution for audio authenticity verification. The proposed algorithm is developed and tested on a small dataset and shows excellent separation between different solution providers and genuine voices.
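To make the tonal/non-tonal distinction the abstract refers to concrete, the sketch below illustrates the tonal-component detection step of the MPEG-1 psychoacoustic model 1 (ISO/IEC 11172-3), on which the global masking threshold is built: a spectral bin is flagged as tonal when it is a local maximum that exceeds its neighbors, at frequency-dependent offsets, by at least 7 dB. This is a minimal illustration of the standard's criterion, not the authors' full algorithm; the function name, frame length, and normalization are assumptions made for the example.

```python
import numpy as np

def tonal_bins(frame: np.ndarray) -> list:
    """Flag spectral bins as tonal per the MPEG-1 psychoacoustic model 1
    criterion: a local maximum of the power spectrum that exceeds its
    neighbors (at frequency-dependent offsets) by at least 7 dB.
    Remaining spectral energy would be treated as non-tonal (noise)."""
    n = len(frame)
    # Hann-windowed power spectrum in dB (normalization is illustrative)
    win = np.hanning(n)
    spec = np.abs(np.fft.rfft(frame * win)) ** 2
    p = 10.0 * np.log10(spec + 1e-12)

    tonal = []
    for k in range(3, len(p) - 7):
        # Must be a local maximum
        if not (p[k] > p[k - 1] and p[k] >= p[k + 1]):
            continue
        # Neighbor offsets widen with frequency, as in the standard
        if k < 63:
            offsets = [-2, 2]
        elif k < 127:
            offsets = [-3, -2, 2, 3]
        else:
            offsets = [-6, -5, -4, -3, -2, 2, 3, 4, 5, 6]
        # Tonal if it dominates all these neighbors by >= 7 dB
        if all(p[k] - p[k + j] >= 7.0 for j in offsets):
            tonal.append(k)
    return tonal
```

A pure sinusoid with an integer number of cycles per frame, e.g. `np.sin(2 * np.pi * 100 * np.arange(1024) / 1024)`, is detected as a single tonal component at bin 100; broadband noise, by contrast, yields few or no tonal bins. The proposed method quantifies such tonal-versus-noise statistics rather than learned embeddings, which is what makes it deterministic and interpretable.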