Xu Tan

Papers: 24

Total Citations: 231

Affiliations: 1

Affiliations

Microsoft Research Asia

Papers (24)

PromptTTS 2: Describing and Generating Voices with Text Prompt

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

GAIA: Zero-shot Talking Avatar Generation

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details

UniAudio: Towards Universal Audio Generation with Large Language Models

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

Model-Level Dual Learning

Almost Unsupervised Text to Speech and Automatic Speech Recognition

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

FRAGE: Frequency-Agnostic Word Representation

FastSpeech: Fast, Robust and Controllable Text to Speech

Semi-Supervised Neural Architecture Search

MPNet: Masked and Permuted Pre-training for Language Understanding

Speech-T: Transducer for Text to Speech and Beyond

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation