Hanwang Zhang
28
Papers
254
Total Citations
Papers (28)
Towards Semantic Equivalence of Tokenization in Multimodal LLM
ICLR 2025
57
citations
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
CVPR 2024
51
citations
Doubly Abductive Counterfactual Inference for Text-based Image Editing
CVPR 2024
25
citations
Diffusion Time-step Curriculum for One Image to 3D Generation
CVPR 2024
24
citations
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
CVPR 2025
24
citations
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
CVPR 2025
18
citations
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
ICML 2025
18
citations
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
ICCV 2025 (arXiv)
10
citations
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
ICCV 2025 (arXiv)
7
citations
Dual-Perspective Knowledge Enrichment for Semi-supervised 3D Object Detection
AAAI 2024 (arXiv)
5
citations
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
4
citations
Dynamic Multimodal Prototype Learning in Vision-Language Models
ICCV 2025
4
citations
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
ICCV 2025
4
citations
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
CVPR 2025
2
citations
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
AAAI 2025
1
citations
Discriminative Probing and Tuning for Text-to-Image Generation
CVPR 2024
0
citations
Distributionally Generative Augmentation for Fair Facial Attribute Classification
CVPR 2024
0
citations
DisCo: Disentangled Control for Realistic Human Dance Generation
CVPR 2024
0
citations
Few-shot Learner Parameterization by Diffusion Time-steps
CVPR 2024
0
citations
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
CVPR 2024
0
citations
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
CVPR 2025
0
citations
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
CVPR 2024
0
citations
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
CVPR 2025
0
citations
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
ICML 2024
0
citations
Non-confusing Generation of Customized Concepts in Diffusion Models
ICML 2024
0
citations
MGNet: Learning Correspondences via Multiple Graphs
AAAI 2024 (arXiv)
0
citations
Auto-Encoding Morph-Tokens for Multimodal LLM
ICML 2024
0
citations
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
ICCV 2025
0
citations