Di ZHANG

18 Papers

174 Total Citations

Papers (18)

Learning Multi-Dimensional Human Preference for Text-to-Image Generation

GameFactory: Creating New Games with Generative Interactive Videos

Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation

SketchVideo: Sketch-based Video Generation and Editing

GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections

GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution

Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model

MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

ReCamMaster: Camera-Controlled Generative Rendering from A Single Video

How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach

Scene Graph Guided Generation: Enable Accurate Relations Generation in Text-to-Image Models via Textural Rectification

FullDiT: Video Generative Foundation Models with Multimodal Control via Full Attention

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

StyleMaster: Stylize Your Video with Artistic Generation and Translation

Towards Precise Scaling Laws for Video Diffusion Transformers

Imbalance in Balance: Online Concept Balancing in Generation Models

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content