Gao Huang

74
Papers
228
Total Citations

Papers (74)

GSVA: Generalized Segmentation via Multimodal Large Language Models

CVPR 2024
127
citations

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

CVPR 2024
28
citations

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

ECCV 2024
21
citations

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

CVPR 2025
20
citations

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

ECCV 2024
15
citations

Video Perception Models for 3D Scene Synthesis

NeurIPS 2025
5
citations

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

CVPR 2025
5
citations

GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling

ICLR 2025
4
citations

DTOS: Dynamic Time Object Sensing with Large Multimodal Model

CVPR 2025
2
citations

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

ICCV 2025 (arXiv)
1
citations

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

CVPR 2024
0
citations

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

CVPR 2024
0
citations

SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

ICML 2024
0
citations

Densely Connected Convolutional Networks

CVPR 2017 (arXiv)
0
citations

CondenseNet: An Efficient DenseNet Using Learned Group Convolutions

CVPR 2018 (arXiv)
0
citations

Resource Aware Person Re-Identification Across Multiple Resolutions

CVPR 2018 (arXiv)
0
citations

Resolution Adaptive Networks for Efficient Inference

CVPR 2020 (arXiv)
0
citations

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

CVPR 2021 (arXiv)
0
citations

Cross-Iteration Batch Normalization

CVPR 2021 (arXiv)
0
citations

3D Object Detection With Pointformer

CVPR 2021 (arXiv)
0
citations

Vision Transformer With Deformable Attention

CVPR 2022 (arXiv)
0
citations

DiSparse: Disentangled Sparsification for Multitask Model Compression

CVPR 2022
0
citations

On the Integration of Self-Attention and Convolution

CVPR 2022 (arXiv)
0
citations

Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

CVPR 2022
0
citations

AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks

CVPR 2022
0
citations

Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework

CVPR 2022 (arXiv)
0
citations

AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition

CVPR 2022 (arXiv)
0
citations

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information

CVPR 2023 (arXiv)
0
citations

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

CVPR 2023
0
citations

Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning

CVPR 2023 (arXiv)
0
citations

Siamese Image Modeling for Self-Supervised Vision Representation Learning

CVPR 2023 (arXiv)
0
citations

Slide-Transformer: Hierarchical Vision Transformer With Local Self-Attention

CVPR 2023
0
citations

Learning Efficient Convolutional Networks Through Network Slimming

ICCV 2017 (arXiv)
0
citations

Improved Techniques for Training Adaptive Deep Networks

ICCV 2019
0
citations

Adaptive Focus for Efficient Video Recognition

ICCV 2021 (arXiv)
0
citations

Towards Learning Spatially Discriminative Feature Representations

ICCV 2021 (arXiv)
0
citations

Frequency Domain Image Translation: More Photo-Realistic, Better Identity-Preserving

ICCV 2021 (arXiv)
0
citations

FLatten Transformer: Vision Transformer using Focused Linear Attention

ICCV 2023 (arXiv)
0
citations

Dynamic Perceiver for Efficient Visual Recognition

ICCV 2023 (arXiv)
0
citations

Adaptive Rotated Convolution for Rotated Object Detection

ICCV 2023 (arXiv)
0
citations

EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones

ICCV 2023 (arXiv)
0
citations

Deep Incubation: Training Large Models by Divide-and-Conquering

ICCV 2023 (arXiv)
0
citations

Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm

ICCV 2023
0
citations

Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation

ECCV 2020
0
citations

AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition

ECCV 2022
0
citations

Learning to Weight Samples for Dynamic Early-Exiting Networks

ECCV 2022
0
citations

ActiveNeRF: Learning Where to See with Uncertainty Estimation

ECCV 2022
0
citations

Supervised Word Mover's Distance

NeurIPS 2016
0
citations

CODA: Repurposing Continuous VAEs for Discrete Tokenization

ICCV 2025
0
citations

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

CVPR 2025 (arXiv)
0
citations

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

CVPR 2025
0
citations

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

CVPR 2025
0
citations

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

CVPR 2025
0
citations

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding

CVPR 2025
0
citations

DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints

AAAI 2025
0
citations

ExpeL: LLM Agents Are Experiential Learners

AAAI 2024
0
citations

Exploring Temporal Feature Correlation for Efficient and Stable Video Semantic Segmentation

AAAI 2024
0
citations

Mask Grounding for Referring Image Segmentation

CVPR 2024
0
citations

Asymmetric Valleys: Beyond Sharp and Flat Local Minima

NeurIPS 2019
0
citations

Implicit Semantic Data Augmentation for Deep Networks

NeurIPS 2019
0
citations

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

NeurIPS 2019
0
citations

Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

NeurIPS 2020
0
citations

Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

NeurIPS 2021
0
citations

Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition

NeurIPS 2021
0
citations

Searching Parameterized AP Loss for Object Detection

NeurIPS 2021
0
citations

Efficient Knowledge Distillation from Model Checkpoints

NeurIPS 2022
0
citations

Provable General Function Class Representation Learning in Multitask Bandits and MDP

NeurIPS 2022
0
citations

Contrastive Language-Image Pre-Training with Knowledge Graphs

NeurIPS 2022
0
citations

A Mixture Of Surprises for Unsupervised Reinforcement Learning

NeurIPS 2022
0
citations

Latency-aware Spatial-wise Dynamic Networks

NeurIPS 2022
0
citations

Rank-DETR for High Quality Object Detection

NeurIPS 2023
0
citations

STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning

NeurIPS 2023
0
citations

Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

NeurIPS 2023
0
citations

Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL

NeurIPS 2023
0
citations