Xu Tan

24 Papers
231 Total Citations
1 Affiliation

Affiliations

Microsoft Research Asia

Papers (24)

PromptTTS 2: Describing and Generating Voices with Text Prompt

ICLR 2024
70 citations

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

AAAI 2025
65 citations

GAIA: Zero-shot Talking Avatar Generation

ICLR 2024
46 citations

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

CVPR 2025
31 citations

InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation

AAAI 2025
19 citations

The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

ICCV 2025
0 citations

HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details

ICCV 2023
0 citations

UniAudio: Towards Universal Audio Generation with Large Language Models

ICML 2024
0 citations

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

ICML 2024
0 citations

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

NeurIPS 2022
0 citations

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

NeurIPS 2022
0 citations

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

NeurIPS 2023
0 citations

AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models

NeurIPS 2023
0 citations

Model-Level Dual Learning

ICML 2018
0 citations

Almost Unsupervised Text to Speech and Automatic Speech Recognition

ICML 2019
0 citations

MASS: Masked Sequence to Sequence Pre-training for Language Generation

ICML 2019
0 citations

Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

NeurIPS 2018
0 citations

FRAGE: Frequency-Agnostic Word Representation

NeurIPS 2018
0 citations

FastSpeech: Fast, Robust and Controllable Text to Speech

NeurIPS 2019
0 citations

Semi-Supervised Neural Architecture Search

NeurIPS 2020
0 citations

MPNet: Masked and Permuted Pre-training for Language Understanding

NeurIPS 2020
0 citations

Speech-T: Transducer for Text to Speech and Beyond

NeurIPS 2021
0 citations

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

NeurIPS 2021
0 citations

Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation

NeurIPS 2022
0 citations