Ziyang Ma

Papers: 10

Total Citations: 183

Papers (10)

ELLA-V: Stable Neural Codec Language Modeling with Alignment-Guided Sequence Reordering

MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

NeurIPS 2025, arXiv

Language Model Can Listen While Speaking

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

VQTalker: Towards Multilingual Talking Avatars Through Facial Motion Tokenization

Word-Level Emotional Expression Control in Zero-Shot Text-to-Speech Synthesis

NeurIPS 2025, arXiv

BAT: Learning to Reason about Spatial Sounds with Large Language Models

Handling Motion Blur in Multi-Frame Super-Resolution

Video Super-Resolution via Deep Draft-Ensemble Learning