Hanwang Zhang

28 Papers
254 Total Citations

Papers (28)

Towards Semantic Equivalence of Tokenization in Multimodal LLM

ICLR 2025
57 citations

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

CVPR 2024
51 citations

Doubly Abductive Counterfactual Inference for Text-based Image Editing

CVPR 2024
25 citations

Diffusion Time-step Curriculum for One Image to 3D Generation

CVPR 2024
24 citations

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

CVPR 2025
24 citations

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

CVPR 2025
18 citations

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

ICML 2025
18 citations

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

ICCV 2025 (arXiv)
10 citations

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

ICCV 2025 (arXiv)
7 citations

Dual-Perspective Knowledge Enrichment for Semi-supervised 3D Object Detection

AAAI 2024 (arXiv)
5 citations

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

CVPR 2025
4 citations

Dynamic Multimodal Prototype Learning in Vision-Language Models

ICCV 2025
4 citations

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models

ICCV 2025
4 citations

Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness

CVPR 2025
2 citations

SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning

AAAI 2025
1 citation

Discriminative Probing and Tuning for Text-to-Image Generation

CVPR 2024
0 citations

Distributionally Generative Augmentation for Fair Facial Attribute Classification

CVPR 2024
0 citations

DisCo: Disentangled Control for Realistic Human Dance Generation

CVPR 2024
0 citations

Few-shot Learner Parameterization by Diffusion Time-steps

CVPR 2024
0 citations

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

CVPR 2024
0 citations

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

CVPR 2025
0 citations

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

CVPR 2024
0 citations

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

CVPR 2025
0 citations

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

ICML 2024
0 citations

Non-confusing Generation of Customized Concepts in Diffusion Models

ICML 2024
0 citations

MGNet: Learning Correspondences via Multiple Graphs

AAAI 2024 (arXiv)
0 citations

Auto-Encoding Morph-Tokens for Multimodal LLM

ICML 2024
0 citations

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

ICCV 2025
0 citations