Ziyang Ma
8 Papers
183 Total Citations
Papers (8)
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
AAAI 2025
64
citations
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
NeurIPS 2025 (arXiv)
52
citations
Language Model Can Listen While Speaking
AAAI 2025
47
citations
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025
10
citations
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
10
citations
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
AAAI 2025
0
citations
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
NeurIPS 2025 (arXiv)
0
citations
BAT: Learning to Reason about Spatial Sounds with Large Language Models
ICML 2024
0
citations