SignGen: End-to-End Sign Language Video Generation with Latent Diffusion

0 citations

#1582 of 2387 papers in ECCV 2024

Authors

Fan Qi, Yu Duan, Changsheng Xu, Huaiwen Zhang

Abstract

The seamless transformation of textual input into natural and expressive sign language holds profound societal significance. Sign language is not solely about hand gestures; it also encompasses facial expressions and mouth movements that are essential for nuanced communication. Achieving both semantic precision and emotional resonance in text-to-sign language translation is therefore of paramount importance. Our work pioneers direct end-to-end translation of text into sign videos, encompassing a realistic representation of the entire body and facial expressions. We go beyond traditional diffusion models by tailoring the multi-modal conditions for sign video. Additionally, our modified motion-aware sign generation framework enhances alignment between text and visual cues in sign language, further improving the quality of the generated sign video. Extensive experiments show that our approach significantly outperforms state-of-the-art approaches in terms of semantic consistency, naturalness, and expressiveness, presenting benchmark quantitative results on the RWTH-2014, RWTH-2014-T, WLASL, CSL-Daily, and AUTSL datasets.
