Special issue: ACDM 2025
Volume: 70 | Article ID: 010404
SynPoseVAE: A Multimodal Fusion Framework for Audio-Driven Full-Body Animation Synthesis
DOI: 10.2352/J.ImagingSci.Technol.2026.70.1.010404 | Published Online: January 2026
Abstract

Most existing research on audio-driven facial animation focuses on head animation, and few methods can generate full-body videos. The few approaches that do produce full-body videos usually animate only the face, which leads to the prevalent problem of head–body separation. This disjointedness undermines visual coherence and the naturalness of human–computer interaction. To overcome these limitations, the authors introduce SynPoseVAE, an enhanced version of the PoseVAE model that incorporates body-related information. SynPoseVAE acquires detailed human pose data by using a bottom-up human pose estimation method to detect human key points, and it incorporates this data into pose prediction to address head–body separation. Additionally, the authors design a new loss function that takes both head and body postures into account, enhancing the coordination between head and body movements. Optimizing with this loss function reduces head–body separation, so the generated animations are more natural and coherent. Experimental results show that SynPoseVAE outperforms traditional methods and generates highly coordinated full-body animations, improving the quality of human–computer interaction in voice-driven facial animation synthesis.
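
The abstract does not give the form of the loss function, so the following is only an illustrative sketch of what a loss covering both head and body postures could look like, not the authors' formulation. It assumes a simple weighted sum of two mean-squared-error terms; the function names, the flat-vector pose representation, and the weights `w_head` and `w_body` are all hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length, non-empty vectors."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def head_body_loss(
    head_pred: Vector,
    head_true: Vector,
    body_pred: Vector,
    body_true: Vector,
    w_head: float = 1.0,  # hypothetical weight on the head-pose term
    w_body: float = 1.0,  # hypothetical weight on the body-keypoint term
) -> float:
    """Assumed form: weighted sum of head-pose error and body-keypoint error.

    Penalizing both terms in a single objective means a prediction cannot
    score well by matching the head while ignoring the body, or vice versa.
    """
    return w_head * mse(head_pred, head_true) + w_body * mse(body_pred, body_true)
```

In this sketch, raising `w_body` relative to `w_head` shifts the optimization toward body-pose accuracy; the balance actually used in the paper is not stated in the abstract.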

  Cite this article 

Junyi Gao, Xiuting Tao, Yigang Wang, "SynPoseVAE: A Multimodal Fusion Framework for Audio-Driven Full-Body Animation Synthesis," in Journal of Imaging Science and Technology, 2026, pp. 1-8, https://doi.org/10.2352/J.ImagingSci.Technol.2026.70.1.010404

  Copyright statement 
Copyright © Society for Imaging Science and Technology 2026
  Article timeline 
  • received May 2025
  • accepted October 2025
  • published January 2026
