Xie Chen

Papers: 9

Total Citations: 173

Papers (9)

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

NeurIPS 2025, arXiv

Language Model Can Listen While Speaking

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration

Bitrate-Controlled Diffusion for Disentangling Motion and Content in Video

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis

NeurIPS 2025, arXiv

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding