Papers: 53
Total Citations: 3,087
h-index: 10

Papers (53)

VBench: Comprehensive Benchmark Suite for Video Generative Models

CVPR 2024
996
citations

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

ECCV 2024
616
citations

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

ICLR 2024
408
citations

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

CVPR 2024
214
citations

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

ICLR 2024
209
citations

VideoBooth: Diffusion-based Video Generation with Image Prompts

CVPR 2024
118
citations

InstructVideo: Instructing Video Diffusion Models with Human Feedback

CVPR 2024
80
citations

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

CVPR 2024
49
citations

Digital Life Project: Autonomous 3D Characters with Social Intelligence

CVPR 2024
46
citations

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

ICLR 2024
45
citations

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

CVPR 2024
39
citations

Generative Gaussian Splatting for Unbounded 3D City Generation

CVPR 2025
32
citations

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

CVPR 2024
30
citations

Multi-Space Alignments Towards Universal LiDAR Segmentation

CVPR 2024
30
citations

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

ICCV 2025
27
citations

Material Anything: Generating Materials for Any 3D Object via Diffusion

CVPR 2025
22
citations

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

CVPR 2025
19
citations

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

ICLR 2025
18
citations

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

ICCV 2025
17
citations

Move Anything with Layered Scene Diffusion

CVPR 2024
13
citations

EgoLM: Multi-Modal Language Model of Egocentric Motions

CVPR 2025
12
citations

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

CVPR 2025
9
citations

Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

CVPR 2025
7
citations

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

ICCV 2025
7
citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

NeurIPS 2025
7
citations

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

ICCV 2025
5
citations

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

ICCV 2025
5
citations

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

NeurIPS 2025
3
citations

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

CVPR 2025
3
citations

WildAvatar: Learning In-the-wild 3D Avatars from the Web

CVPR 2025
1
citation

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

CVPR 2025
0
citations

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

CVPR 2025
0
citations

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

CVPR 2024
0
citations

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

CVPR 2025
0
citations

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

CVPR 2025
0
citations

GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination

ICCV 2025
0
citations

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

ICCV 2025
0
citations

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

ICCV 2025
0
citations

Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

ICCV 2025
0
citations

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

ICCV 2025
0
citations

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

ICCV 2025
0
citations

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

ICCV 2025
0
citations

SIGMA: Selective Gated Mamba for Sequential Recommendation

AAAI 2025
0
citations

EgoLife: Towards Egocentric Life Assistant

CVPR 2025
0
citations

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation

CVPR 2025
0
citations

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

CVPR 2024
0
citations

URHand: Universal Relightable Hands

CVPR 2024
0
citations

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

CVPR 2024
0
citations

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

CVPR 2024
0
citations

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

CVPR 2024
0
citations

Vlogger: Make Your Dream A Vlog

CVPR 2024
0
citations

FreeU: Free Lunch in Diffusion U-Net

CVPR 2024
0
citations

Link-Context Learning for Multimodal LLMs

CVPR 2024
0
citations