Ziwei Liu

Papers: 53
Total Citations: 3,087
h-index: 10

Papers (53)

VBench: Comprehensive Benchmark Suite for Video Generative Models

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

VideoBooth: Diffusion-based Video Generation with Image Prompts

InstructVideo: Instructing Video Diffusion Models with Human Feedback

FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation

Digital Life Project: Autonomous 3D Characters with Social Intelligence

Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation

Generative Gaussian Splatting for Unbounded 3D City Generation

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

Multi-Space Alignments Towards Universal LiDAR Segmentation

VistaDream: Sampling multiview consistent images for single-view scene reconstruction

Material Anything: Generating Materials for Any 3D Object via Diffusion

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

Move Anything with Layered Scene Diffusion

EgoLM: Multi-Modal Language Model of Egocentric Motions

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters

Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion

GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data

AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers

WildAvatar: Learning In-the-wild 3D Avatars from the Web

Disco4D: Disentangled 4D Human Generation and Animation from a Single Image

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion

LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes

GauUpdate: New Object Insertion in 3D Gaussian Fields with Consistent Global Illumination

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data and Metric Perspectives

Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

DPoser-X: Diffusion Model as Robust 3D Whole-body Human Pose Prior

SIGMA: Selective Gated Mamba for Sequential Recommendation

EgoLife: Towards Egocentric Life Assistant

HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation

GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

URHand: Universal Relightable Hands

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

CityDreamer: Compositional Generative Model of Unbounded 3D Cities

Vlogger: Make Your Dream A Vlog

FreeU: Free Lunch in Diffusion U-Net

Link-Context Learning for Multimodal LLMs