Hanwang Zhang

88
Papers
254
Total Citations

Papers (88)

Towards Semantic Equivalence of Tokenization in Multimodal LLM

ICLR 2025
57
citations

Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior

CVPR 2024
51
citations

Doubly Abductive Counterfactual Inference for Text-based Image Editing

CVPR 2024
25
citations

A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training

CVPR 2025
24
citations

Diffusion Time-step Curriculum for One Image to 3D Generation

CVPR 2024
24
citations

Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens

CVPR 2025
18
citations

Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing

ICML 2025
18
citations

Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation

ICCV 2025 arXiv
10
citations

Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning

ICCV 2025
7
citations

Dual-Perspective Knowledge Enrichment for Semi-supervised 3D Object Detection

AAAI 2024 arXiv
5
citations

Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene

CVPR 2025
4
citations

Dynamic Multimodal Prototype Learning in Vision-Language Models

ICCV 2025
4
citations

Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models

ICCV 2025
4
citations

Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness

CVPR 2025
2
citations

SGDiff: Scene Graph Guided Diffusion Model for Image Collaborative SegCaptioning

AAAI 2025
1
citations

Visual Translation Embedding Network for Visual Relation Detection

CVPR 2017 arXiv
0
citations

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning

CVPR 2017
0
citations

Zero-Shot Visual Recognition Using Semantics-Preserving Adversarial Embedding Networks

CVPR 2018 arXiv
0
citations

Grounding Referring Expressions in Images by Variational Context

CVPR 2018 arXiv
0
citations

Learning to Compose Dynamic Tree Structures for Visual Contexts

CVPR 2019
0
citations

Recursive Visual Attention in Visual Dialog

CVPR 2019
0
citations

Explainable and Explicit Visual Reasoning Over Scene Graphs

CVPR 2019
0
citations

Auto-Encoding Scene Graphs for Image Captioning

CVPR 2019
0
citations

Unbiased Scene Graph Generation From Biased Training

CVPR 2020 arXiv
0
citations

More Grounded Image Captioning by Distilling Image-Text Matching Model

CVPR 2020 arXiv
0
citations

Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration

CVPR 2020
0
citations

Counterfactual Samples Synthesizing for Robust Visual Question Answering

CVPR 2020 arXiv
0
citations

Visual Commonsense R-CNN

CVPR 2020
0
citations

Learning to Segment the Tail

CVPR 2020 arXiv
0
citations

Two Causal Principles for Improving Visual Dialog

CVPR 2020 arXiv
0
citations

Iterative Context-Aware Graph Inference for Visual Dialog

CVPR 2020 arXiv
0
citations

Counterfactual Zero-Shot and Open-Set Visual Recognition

CVPR 2021 arXiv
0
citations

Distilling Causal Effect of Data in Class-Incremental Learning

CVPR 2021 arXiv
0
citations

Counterfactual VQA: A Cause-Effect Look at Language Bias

CVPR 2021 arXiv
0
citations

The Blessings of Unlabeled Background in Untrimmed Videos

CVPR 2021 arXiv
0
citations

Causal Attention for Vision-Language Tasks

CVPR 2021 arXiv
0
citations

Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

CVPR 2022 arXiv
0
citations

Bootstrap Your Own Prior: Towards Distribution-Agnostic Novel Class Discovery

CVPR 2023
0
citations

Semantic Scene Completion With Cleaner Self

CVPR 2023 arXiv
0
citations

Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

CVPR 2023 arXiv
0
citations

Learning Image and User Features for Recommendation in Social Networks

ICCV 2015
0
citations

Making History Matter: History-Advantage Sequence Training for Visual Dialog

ICCV 2019
0
citations

Learning to Collocate Neural Modules for Image Captioning

ICCV 2019
0
citations

Counterfactual Critic Multi-Agent Training for Scene Graph Generation

ICCV 2019
0
citations

Learning to Assemble Neural Module Tree Networks for Visual Grounding

ICCV 2019
0
citations

Transporting Causal Mechanisms for Unsupervised Domain Adaptation

ICCV 2021 arXiv
0
citations

Self-Regulation for Semantic Segmentation

ICCV 2021 arXiv
0
citations

Causal Attention for Unbiased Visual Recognition

ICCV 2021 arXiv
0
citations

Auto-Parsing Network for Image Captioning and Visual Question Answering

ICCV 2021 arXiv
0
citations

Equivariant Similarity for Vision-Language Foundation Models

ICCV 2023 arXiv
0
citations

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition

ICCV 2023 arXiv
0
citations

Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground

ICCV 2023 arXiv
0
citations

Invariant Feature Regularization for Fair Face Recognition

ICCV 2023
0
citations

Learning Trajectory-Word Alignments for Video-Language Tasks

ICCV 2023 arXiv
0
citations

Random Boxes Are Open-world Object Detectors

ICCV 2023 arXiv
0
citations

Prompt-aligned Gradient for Prompt Tuning

ICCV 2023 arXiv
0
citations

Feature Pyramid Transformer

ECCV 2020
0
citations

Equivariance and Invariance Inductive Bias for Learning from Insufficient Data

ECCV 2022
0
citations

Invariant Feature Learning for Generalized Long-Tailed Classification

ECCV 2022
0
citations

Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-of-Distribution Generalization

ECCV 2022
0
citations

Identifying Hard Noise in Long-Tailed Sample Distribution

ECCV 2022
0
citations

PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN

ICCV 2017
0
citations

CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction

CVPR 2025
0
citations

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea

CVPR 2025
0
citations

Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization

ICCV 2025
0
citations

MGNet: Learning Correspondences via Multiple Graphs

AAAI 2024 arXiv
0
citations

Discriminative Probing and Tuning for Text-to-Image Generation

CVPR 2024
0
citations

Distributionally Generative Augmentation for Fair Facial Attribute Classification

CVPR 2024
0
citations

DisCo: Disentangled Control for Realistic Human Dance Generation

CVPR 2024
0
citations

Few-shot Learner Parameterization by Diffusion Time-steps

CVPR 2024
0
citations

Classes Are Not Equal: An Empirical Study on Image Recognition Fairness

CVPR 2024
0
citations

Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs

CVPR 2024
0
citations

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

ICML 2024
0
citations

Non-confusing Generation of Customized Concepts in Diffusion Models

ICML 2024
0
citations

Auto-Encoding Morph-Tokens for Multimodal LLM

ICML 2024
0
citations

Online Collaborative Learning for Open-Vocabulary Visual Classifiers

CVPR 2016
0
citations

Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks

NeurIPS 2018
0
citations

Causal Intervention for Weakly-Supervised Semantic Segmentation

NeurIPS 2020
0
citations

Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect

NeurIPS 2020
0
citations

Interventional Few-Shot Learning

NeurIPS 2020
0
citations

How Should Pre-Trained Language Models Be Fine-Tuned Towards Adversarial Robustness?

NeurIPS 2021
0
citations

Introspective Distillation for Robust Question Answering

NeurIPS 2021
0
citations

Self-Supervised Learning Disentangled Group Representation as Feature

NeurIPS 2021
0
citations

Respecting Transfer Gap in Knowledge Distillation

NeurIPS 2022
0
citations

Make the U in UDA Matter: Invariant Consistency Learning for Unsupervised Domain Adaptation

NeurIPS 2023
0
citations

Tuning Multi-mode Token-level Prompt Alignment across Modalities

NeurIPS 2023
0
citations

Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models

NeurIPS 2023
0
citations

Imagine That! Abstract-to-Intricate Text-to-Image Synthesis with Scene Graph Hallucination Diffusion

NeurIPS 2023
0
citations