EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation

2 Citations · #1491 of 3827 papers in ICLR 2025 · 6 Authors · 4 Data Points

Abstract

Speech-driven 3D facial animation has attracted significant attention due to its wide range of applications in animation production and virtual reality. Recent research has explored speech-emotion disentanglement to enhance facial expressions rather than manually assigning emotions. However, this approach faces issues such as feature confusion, emotion weakening, and the mean-face problem. To address these issues, we present EcoFace, a framework that (1) proposes a novel collaboration objective to provide an explicit signal for emotion representation learning from the speaker's expressive movements and produced sounds, constructing a joint, coordinated audio-visual emotion space that is independent of speech content, and (2) constructs a universal facial motion distribution space determined by speech features and implements speaker-specific generation. Extensive experiments show that our method achieves more generalized and emotionally realistic talking face generation than previous methods.
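The abstract does not spell out the form of the collaboration objective. The sketch below is only an illustration of how a joint audio-visual emotion space can be learned in general, assuming a symmetric contrastive (InfoNCE-style) loss over paired speech and facial-motion clips; all module names, feature dimensions, and the loss form are assumptions for illustration, not EcoFace's actual implementation.

```python
# Minimal sketch (not the paper's method): two encoders project audio and
# facial-motion features into a shared emotion embedding space, and a
# symmetric contrastive loss pulls paired (speech, expression) clips
# together while pushing mismatched pairs apart.
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmotionProjector(nn.Module):
    """Projects per-clip features into a shared emotion embedding space."""

    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-norm embeddings so dot products act as cosine similarities.
        return F.normalize(self.net(x), dim=-1)


def audio_visual_contrastive_loss(audio_emb, visual_emb, temperature=0.07):
    """Symmetric InfoNCE: paired audio/visual clips in a batch are positives."""
    logits = audio_emb @ visual_emb.t() / temperature    # (B, B) similarities
    targets = torch.arange(audio_emb.size(0), device=audio_emb.device)
    loss_a2v = F.cross_entropy(logits, targets)           # audio -> visual
    loss_v2a = F.cross_entropy(logits.t(), targets)       # visual -> audio
    return 0.5 * (loss_a2v + loss_v2a)


if __name__ == "__main__":
    batch, audio_dim, visual_dim = 8, 768, 256   # assumed feature sizes
    audio_proj = EmotionProjector(audio_dim)
    visual_proj = EmotionProjector(visual_dim)

    audio_feat = torch.randn(batch, audio_dim)    # e.g. pooled speech features
    visual_feat = torch.randn(batch, visual_dim)  # e.g. pooled motion features

    loss = audio_visual_contrastive_loss(audio_proj(audio_feat),
                                          visual_proj(visual_feat))
    print(f"contrastive emotion-alignment loss: {loss.item():.4f}")
```

In such a setup, content-independence of the emotion space would typically require additional constraints (e.g., disentanglement from phonetic features), which this toy example does not model.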

Citation History

Jan 25, 2026: 0
Jan 27, 2026: 0
Jan 31, 2026: 2 (+2)