Hanwang Zhang
88
Papers
254
Total Citations
Papers (88)
Towards Semantic Equivalence of Tokenization in Multimodal LLM
ICLR 2025
57
citations
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
CVPR 2024
51
citations
Doubly Abductive Counterfactual Inference for Text-based Image Editing
CVPR 2024
25
citations
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
CVPR 2025
24
citations
Diffusion Time-step Curriculum for One Image to 3D Generation
CVPR 2024
24
citations
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
CVPR 2025
18
citations
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
ICML 2025
18
citations
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
ICCV 2025
10
citations
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
ICCV 2025
7
citations
Dual-Perspective Knowledge Enrichment for Semi-supervised 3D Object Detection
AAAI 2024
5
citations
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
CVPR 2025
4
citations
Dynamic Multimodal Prototype Learning in Vision-Language Models
ICCV 2025
4
citations
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
ICCV 2025
4
citations
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
CVPR 2025
2
citations
SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning
AAAI 2025
1
citations
Visual Translation Embedding Network for Visual Relation Detection
CVPR 2017
0
citations
SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning
CVPR 2017
0
citations
Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks
CVPR 2018
0
citations
Grounding Referring Expressions in Images by Variational Context
CVPR 2018
0
citations
Learning to Compose Dynamic Tree Structures for Visual Contexts
CVPR 2019
0
citations
Recursive Visual Attention in Visual Dialog
CVPR 2019
0
citations
Explainable and Explicit Visual Reasoning Over Scene Graphs
CVPR 2019
0
citations
Auto-Encoding Scene Graphs for Image Captioning
CVPR 2019
0
citations
Unbiased Scene Graph Generation From Biased Training
CVPR 2020
0
citations
More Grounded Image Captioning by Distilling Image-Text Matching Model
CVPR 2020
0
citations
Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration
CVPR 2020
0
citations
Counterfactual Samples Synthesizing for Robust Visual Question Answering
CVPR 2020
0
citations
Visual Commonsense R-CNN
CVPR 2020
0
citations
Learning to Segment the Tail
CVPR 2020
0
citations
Two Causal Principles for Improving Visual Dialog
CVPR 2020
0
citations
Iterative Context-Aware Graph Inference for Visual Dialog
CVPR 2020
0
citations
Counterfactual Zero-Shot and Open-Set Visual Recognition
CVPR 2021
0
citations
Distilling Causal Effect of Data in Class-Incremental Learning
CVPR 2021
0
citations
Counterfactual VQA: A Cause-Effect Look at Language Bias
CVPR 2021
0
citations
The Blessings of Unlabeled Background in Untrimmed Videos
CVPR 2021
0
citations
Causal Attention for Vision-Language Tasks
CVPR 2021
0
citations
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
CVPR 2022
0
citations
Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery
CVPR 2023
0
citations
Semantic Scene Completion With Cleaner Self
CVPR 2023
0
citations
Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
CVPR 2023
0
citations
Learning Image and User Features for Recommendation in Social Networks
ICCV 2015
0
citations
Making History Matter: History-Advantage Sequence Training for Visual Dialog
ICCV 2019
0
citations
Learning to Collocate Neural Modules for Image Captioning
ICCV 2019
0
citations
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
ICCV 2019
0
citations
Learning to Assemble Neural Module Tree Networks for Visual Grounding
ICCV 2019
0
citations
Transporting Causal Mechanisms for Unsupervised Domain Adaptation
ICCV 2021
0
citations
Self-Regulation for Semantic Segmentation
ICCV 2021
0
citations
Causal Attention for Unbiased Visual Recognition
ICCV 2021
0
citations
Auto-Parsing Network for Image Captioning and Visual Question Answering
ICCV 2021
0
citations
Equivariant Similarity for Vision-Language Foundation Models
ICCV 2023
0
citations
Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition
ICCV 2023
0
citations
Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
ICCV 2023
0
citations
Invariant Feature Regularization for Fair Face Recognition
ICCV 2023
0
citations
Learning Trajectory-Word Alignments for Video-Language Tasks
ICCV 2023
0
citations
Random Boxes Are Open-world Object Detectors
ICCV 2023
0
citations
Prompt-aligned Gradient for Prompt Tuning
ICCV 2023
0
citations
Feature Pyramid Transformer
ECCV 2020
0
citations
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data
ECCV 2022
0
citations
Invariant Feature Learning for Generalized Long-Tailed Classification
ECCV 2022
0
citations
Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-of-Distribution Generalization
ECCV 2022
0
citations
Identifying Hard Noise in Long-Tailed Sample Distribution
ECCV 2022
0
citations
PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN
ICCV 2017
0
citations
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
CVPR 2025
0
citations
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
CVPR 2025
0
citations
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
ICCV 2025
0
citations
MGNet: Learning Correspondences via Multiple Graphs
AAAI 2024
0
citations
Discriminative Probing and Tuning for Text-to-Image Generation
CVPR 2024
0
citations
Distributionally Generative Augmentation for Fair Facial Attribute Classification
CVPR 2024
0
citations
DisCo: Disentangled Control for Realistic Human Dance Generation
CVPR 2024
0
citations
Few-shot Learner Parameterization by Diffusion Time-steps
CVPR 2024
0
citations
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
CVPR 2024
0
citations
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
CVPR 2024
0
citations
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
ICML 2024
0
citations
Non-confusing Generation of Customized Concepts in Diffusion Models
ICML 2024
0
citations
Auto-Encoding Morph-Tokens for Multimodal LLM
ICML 2024
0
citations
Online Collaborative Learning for Open-Vocabulary Visual Classifiers
CVPR 2016
0
citations
Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks
NeurIPS 2018
0
citations
Causal Intervention for Weakly-Supervised Semantic Segmentation
NeurIPS 2020
0
citations
Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
NeurIPS 2020
0
citations
Interventional Few-Shot Learning
NeurIPS 2020
0
citations
How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?
NeurIPS 2021
0
citations
Introspective Distillation for Robust Question Answering
NeurIPS 2021
0
citations
Self-Supervised Learning Disentangled Group Representation as Feature
NeurIPS 2021
0
citations
Respecting Transfer Gap in Knowledge Distillation
NeurIPS 2022
0
citations
Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation
NeurIPS 2023
0
citations
Tuning Multi-mode Token-level Prompt Alignment across Modalities
NeurIPS 2023
0
citations
Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models
NeurIPS 2023
0
citations
Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion
NeurIPS 2023
0
citations