Xu Tan
24 Papers
231 Total Citations
1 Affiliation

Affiliations
Microsoft Research Asia
Papers (24)
PromptTTS 2: Describing and Generating Voices with Text Prompt
ICLR 2024
70 citations
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
AAAI 2025
65 citations
GAIA: Zero-shot Talking Avatar Generation
ICLR 2024
46 citations
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
CVPR 2025
31 citations
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
AAAI 2025
19 citations
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation
ICCV 2025
0 citations
HiFace: High-Fidelity 3D Face Reconstruction by Learning Static and Dynamic Details
ICCV 2023
0 citations
UniAudio: Towards Universal Audio Generation with Large Language Models
ICML 2024
0 citations
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
ICML 2024
0 citations
Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling
NeurIPS 2022
0 citations
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
NeurIPS 2022
0 citations
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
NeurIPS 2023
0 citations
AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models
NeurIPS 2023
0 citations
Model-Level Dual Learning
ICML 2018
0 citations
Almost Unsupervised Text to Speech and Automatic Speech Recognition
ICML 2019
0 citations
MASS: Masked Sequence to Sequence Pre-training for Language Generation
ICML 2019
0 citations
Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation
NeurIPS 2018
0 citations
FRAGE: Frequency-Agnostic Word Representation
NeurIPS 2018
0 citations
FastSpeech: Fast, Robust and Controllable Text to Speech
NeurIPS 2019
0 citations
Semi-Supervised Neural Architecture Search
NeurIPS 2020
0 citations
MPNet: Masked and Permuted Pre-training for Language Understanding
NeurIPS 2020
0 citations
Speech-T: Transducer for Text to Speech and Beyond
NeurIPS 2021
0 citations
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
NeurIPS 2021
0 citations
Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation
NeurIPS 2022
0 citations