Ziyang Ma
10
Papers
183
Total Citations
Papers (10)
ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering
AAAI 2025
64
citations
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
NeurIPS 2025, arXiv
52
citations
Language Model Can Listen While Speaking
AAAI 2025
47
citations
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
AAAI 2025
10
citations
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
ICLR 2025
10
citations
VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization
AAAI 2025
0
citations
Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis
NeurIPS 2025, arXiv
0
citations
BAT: Learning to Reason about Spatial Sounds with Large Language Models
ICML 2024
0
citations
Handling Motion Blur in Multi-Frame Super-Resolution
CVPR 2015
0
citations
Video Super-Resolution via Deep Draft-Ensemble Learning
ICCV 2015
0
citations