Xintao Wang

25 Papers

2,824 Total Citations

1 Affiliation

Affiliations

The Chinese University of Hong Kong

Papers (25)

T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion

Follow Your Pose: Pose-Guided Text-to-Video Generation Using Pose-Free Videos

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

Improving Video Generation with Human Feedback

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

GameFactory: Creating New Games with Generative Interactive Videos

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

Image Conductor: Precision Control for Interactive Video Synthesis

SketchVideo: Sketch-based Video Generation and Editing

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution

Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models

FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Unifying Image Processing as Visual Prompting Question Answering

CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities