Ziyang Ma

Papers: 8

Total Citations: 183

Papers (8)

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

NeurIPS 2025 (arXiv)

Language Model Can Listen While Speaking

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis

NeurIPS 2025 (arXiv)

BAT: Learning to Reason about Spatial Sounds with Large Language Models