Xu Tan

Papers: 8

Total Citations: 231

Affiliations: 1

Affiliations

Microsoft Research Asia

Papers (8)

PromptTTS 2: Describing and Generating Voices with Text Prompt

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

GAIA: Zero-shot Talking Avatar Generation

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

UniAudio: Towards Universal Audio Generation with Large Language Models

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models