Papers: 154
Total Citations: 3,483
h-index: 10

Papers (154)

VBench: Comprehensive Benchmark Suite for Video Generative Models

CVPR 2024
996
citations

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

ECCV 2024
616
citations

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

ICLR 2024
408
citations

Knowledge Distillation Meets Self-Supervision

ECCV 2020
319
citations

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

CVPR 2024
214
citations

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

ICLR 2024
209
citations

VideoBooth: Diffusion-based Video Generation with Image Prompts

CVPR 2024
118
citations

Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

ECCV 2020
95
citations

InstructVideo: Instructing Video Diffusion Models with Human Feedback

CVPR 2024
80
citations

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

CVPR 2024
49
citations

Digital Life Project: Autonomous 3D Characters with Social Intelligence

CVPR 2024
46
citations

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

ICLR 2024
45
citations

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

CVPR 2024
39
citations

Generative Gaussian Splatting for Unbounded 3D City Generation

CVPR 2025
32
citations

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

CVPR 2024
30
citations

Multi-Space Alignments Towards Universal LiDAR Segmentation

CVPR 2024
30
citations

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

ICCV 2025
27
citations

Material Anything: Generating Materials for Any 3D Object via Diffusion

CVPR 2025
22
citations

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

CVPR 2025
19
citations

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

ICLR 2025
18
citations

Move Anything with Layered Scene Diffusion

CVPR 2024
13
citations

EgoLM: Multi-Modal Language Model of Egocentric Motions

CVPR 2025
12
citations

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

CVPR 2025
9
citations

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

ICCV 2025
7
citations

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

NeurIPS 2025
7
citations

Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

CVPR 2025
7
citations

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

ICCV 2025
5
citations

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

ICCV 2025
5
citations

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

NeurIPS 2025
3
citations

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

CVPR 2025
3
citations

Self-Supervised Scene De-Occlusion

CVPR 2020
0
citations

When NAS Meets Robustness: In Search of Robust Architectures Against Adversarial Attacks

CVPR 2020
0
citations

Online Deep Clustering for Unsupervised Representation Learning

CVPR 2020
0
citations

Rotate-and-Render: Unsupervised Photorealistic Face Rotation From Single-View Images

CVPR 2020
0
citations

MaskGAN: Towards Diverse and Interactive Facial Image Manipulation

CVPR 2020
0
citations

Open Compound Domain Adaptation

CVPR 2020
0
citations

Visually Informed Binaural Audio Generation without Binaural Audios

CVPR 2021
0
citations

Adversarial Robustness Under Long-Tailed Distribution

CVPR 2021
0
citations

Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination

CVPR 2021
0
citations

LiDAR-Based Panoptic Segmentation via Dynamic Shifting Network

CVPR 2021
0
citations

Seesaw Loss for Long-Tailed Instance Segmentation

CVPR 2021
0
citations

Variational Relational Point Completion Network

CVPR 2021
0
citations

ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis

CVPR 2021
0
citations

Deep Animation Video Interpolation in the Wild

CVPR 2021
0
citations

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

CVPR 2021
0
citations

Robust Reference-Based Super-Resolution via C2-Matching

CVPR 2021
0
citations

Delving Deep Into the Generalization of Vision Transformers Under Distribution Shifts

CVPR 2022
0
citations

Versatile Multi-Modal Pre-Training for Human-Centric Perception

CVPR 2022
0
citations

Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

CVPR 2022
0
citations

TCTrack: Temporal Contexts for Aerial Tracking

CVPR 2022
0
citations

Balanced MSE for Imbalanced Visual Regression

CVPR 2022
0
citations

Bailando: 3D Dance Generation by Actor-Critic GPT With Choreographic Memory

CVPR 2022
0
citations

Conditional Prompt Learning for Vision-Language Models

CVPR 2022
0
citations

Full-Range Virtual Try-On With Recurrent Tri-Level Transform

CVPR 2022
0
citations

Unsupervised Image-to-Image Translation With Generative Prior

CVPR 2022
0
citations

F2-NeRF: Fast Neural Radiance Field Training With Free Camera Trajectories

CVPR 2023
0
citations

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator

CVPR 2023
0
citations

LaserMix for Semi-Supervised LiDAR Semantic Segmentation

CVPR 2023
0
citations

Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

CVPR 2023
0
citations

OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation

CVPR 2023
0
citations

Panoptic Video Scene Graph Generation

CVPR 2023
0
citations

Detecting and Grounding Multi-Modal Media Manipulation

CVPR 2023
0
citations

Collaborative Diffusion for Multi-Modal Face Generation and Editing

CVPR 2023
0
citations

Semantic Image Segmentation via Deep Parsing Network

ICCV 2015
0
citations

Deep Learning Face Attributes in the Wild

ICCV 2015
0
citations

Video Frame Synthesis Using Deep Voxel Flow

ICCV 2017
0
citations

Vision-Infused Deep Audio Inpainting

ICCV 2019
0
citations

CARAFE: Content-Aware ReAssembly of FEatures

ICCV 2019
0
citations

Delving Deep Into Hybrid Annotations for 3D Human Recovery in the Wild

ICCV 2019
0
citations

Unsupervised Domain Adaptive 3D Detection With Multi-Level Consistency

ICCV 2021
0
citations

Differentiable Dynamic Wirings for Neural Networks

ICCV 2021
0
citations

Talk-To-Edit: Fine-Grained Facial Editing via Dialog

ICCV 2021
0
citations

Incorporating Convolution Designs Into Visual Transformers

ICCV 2021
0
citations

Semantically Coherent Out-of-Distribution Detection

ICCV 2021
0
citations

BlockPlanner: City Block Generation With Vectorized Graph Representation

ICCV 2021
0
citations

Energy-Based Open-World Uncertainty Modeling for Confidence Calibration

ICCV 2021
0
citations

Deep Geometrized Cartoon Line Inbetweening

ICCV 2023
0
citations

Cloth2Body: Generating 3D Human Body Mesh from 2D Clothing

ICCV 2023
0
citations

SynBody: Synthetic Dataset with Layered Human Models for 3D Human Perception and Modeling

ICCV 2023
0
citations

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

ICCV 2023
0
citations

DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-Centric Rendering

ICCV 2023
0
citations

SparseNeRF: Distilling Depth Ranking for Few-shot Novel View Synthesis

ICCV 2023
0
citations

DeformToon3D: Deformable Neural Radiance Fields for 3D Toonification

ICCV 2023
0
citations

UnitedHuman: Harnessing Multi-Source Data for High-Resolution Human Generation

ICCV 2023
0
citations

StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces

ICCV 2023
0
citations

ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

ICCV 2023
0
citations

Rethinking Range View Representation for LiDAR Segmentation

ICCV 2023
0
citations

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation

CVPR 2025
0
citations

SHERF: Generalizable Human NeRF from a Single Image

ICCV 2023
0
citations

Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets

ECCV 2020
0
citations

CelebA-Spoof: Large-Scale Face Anti-Spoofing Dataset with Rich Annotations

ECCV 2020
0
citations

Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement

ECCV 2020
0
citations

Placepedia: Comprehensive Place Understanding with Multi-Faceted Annotations

ECCV 2020
0
citations

UNIF: United Neural Implicit Functions for Clothed Human Reconstruction and Animation

ECCV 2022
0
citations

HuMMan: Multi-modal 4D Human Dataset for Versatile Sensing and Modeling

ECCV 2022
0
citations

Benchmarking Omni-Vision Representation through the Lens of Visual Realms

ECCV 2022
0
citations

CelebV-HQ: A Large-Scale Video Facial Attributes Dataset

ECCV 2022
0
citations

Detecting and Recovering Sequential DeepFake Manipulation

ECCV 2022
0
citations

Relighting4D: Neural Relightable Human from Videos

ECCV 2022
0
citations

StyleSwap: Style-Based Generator Empowers Robust Face Swapping

ECCV 2022
0
citations

Fast-Vid2Vid: Spatial-Temporal Compression for Video-to-Video Synthesis

ECCV 2022
0
citations

StyleLight: HDR Panorama Generation for Lighting Estimation and Editing

ECCV 2022
0
citations

StyleGAN-Human: A Data-Centric Odyssey of Human Generation

ECCV 2022
0
citations

X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation

ECCV 2022
0
citations

Panoptic Scene Graph Generation

ECCV 2022
0
citations

Mind the Gap in Distilling StyleGANs

ECCV 2022
0
citations

Text2Performer: Text-Driven Human Video Generation

ICCV 2023
0
citations

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

CVPR 2025
0
citations

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

CVPR 2025
0
citations

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

CVPR 2025
0
citations

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

CVPR 2025
0
citations

EgoLife: Towards Egocentric Life Assistant

CVPR 2025
0
citations

WildAvatar: Learning In-the-wild 3D Avatars from the Web

CVPR 2025
0
citations

GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination

ICCV 2025
0
citations

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

ICCV 2025
0
citations

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

ICCV 2025
0
citations

Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

ICCV 2025
0
citations

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

ICCV 2025
0
citations

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

ICCV 2025
0
citations

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

ICCV 2025
0
citations

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

ICCV 2025
0
citations

SIGMA: Selective Gated Mamba for Sequential Recommendation

AAAI 2025
0
citations

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

CVPR 2024
0
citations

URHand: Universal Relightable Hands

CVPR 2024
0
citations

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

CVPR 2024
0
citations

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

CVPR 2024
0
citations

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

CVPR 2024
0
citations

Vlogger: Make Your Dream A Vlog

CVPR 2024
0
citations

FreeU: Free Lunch in Diffusion U-Net

CVPR 2024
0
citations

Link-Context Learning for Multimodal LLMs

CVPR 2024
0
citations

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

CVPR 2024
0
citations

DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations

CVPR 2016
0
citations

Not All Pixels Are Equal: Difficulty-Aware Semantic Segmentation via Deep Layer Cascade

CVPR 2017
0
citations

Self-Supervised Learning via Conditional Motion Propagation

CVPR 2019
0
citations

Large-Scale Long-Tailed Recognition in an Open World

CVPR 2019
0
citations

Hybrid Task Cascade for Instance Segmentation

CVPR 2019
0
citations

Few-Shot Object Detection via Association and DIscrimination

NeurIPS 2021
0
citations

Garment4D: Garment Reconstruction from Point Cloud Sequences

NeurIPS 2021
0
citations

Unsupervised Object-Level Representation Learning from Scene Images

NeurIPS 2021
0
citations

Balanced Chamfer Distance as a Comprehensive Metric for Point Cloud Completion

NeurIPS 2021
0
citations

AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies

NeurIPS 2022
0
citations

Audio-Driven Co-Speech Gesture Video Generation

NeurIPS 2022
0
citations

Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms

NeurIPS 2022
0
citations

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection

NeurIPS 2022
0
citations

RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars

NeurIPS 2023
0
citations

SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

NeurIPS 2023
0
citations

PrimDiffusion: Volumetric Primitives Diffusion for 3D Human Generation

NeurIPS 2023
0
citations

FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

NeurIPS 2023
0
citations

Towards Robust and Expressive Whole-body Human Pose and Shape Estimation

NeurIPS 2023
0
citations

What Makes Good Examples for Visual In-Context Learning?

NeurIPS 2023
0
citations

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

NeurIPS 2023
0
citations

InsActor: Instruction-driven Physics-based Characters

NeurIPS 2023
0
citations

4D Panoptic Scene Graph Generation

NeurIPS 2023
0
citations

Large Language Models are Visual Reasoning Coordinators

NeurIPS 2023
0
citations