
Generative AI (GenAI) models enable scalable multimedia content creation but can introduce artifacts that reduce perceived realism. We conducted a perceptual study to assess how audio-visual cues affect people’s ability to discriminate real user-generated content (UGC) from synthetic AI-generated content. Observers (N=36) performed a two-interval forced-choice task across conditions that manipulated audio-visual consistency. They reliably identified synthetic content, achieving the highest accuracy when visual cues were available and the lowest when they had to rely solely on audio content and quality issues. Our eye-tracking analysis indicated that biological motion inconsistencies were salient, while lower-level, texture-related distortions received less attention. Our proposed taxonomy of audio content and quality issues did not significantly predict task performance. Taken together, these findings highlight the dominant role of visual artifacts in the decision-making process and the relative robustness of GenAI audio. Our work provides guidance for improving the perceptual quality of future, edge-deployed GenAI models.
Devi Klein, Anustup Choudhury, Evan Gitterman, Jaclyn Pytlarz, Scott Daly, "Real or Synthetic? An Evaluation of AI-generated Audio-visual Content" in Electronic Imaging, 2026, pp 232-1 - 232-8, https://doi.org/10.2352/EI.2026.38.10.HVEI-232