Gao Huang
74
Papers
228
Total Citations
Papers (74)
GSVA: Generalized Segmentation via Multimodal Large Language Models
CVPR 2024
127
citations
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
CVPR 2024
28
citations
DyFADet: Dynamic Feature Aggregation for Temporal Action Detection
ECCV 2024
21
citations
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
CVPR 2025
20
citations
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
ECCV 2024
15
citations
Video Perception Models for 3D Scene Synthesis
NeurIPS 2025
5
citations
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
CVPR 2025
5
citations
GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling
ICLR 2025
4
citations
DTOS: Dynamic Time Object Sensing with Large Multimodal Model
CVPR 2025
2
citations
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
ICCV 2025 (arXiv)
1
citation
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
CVPR 2024
0
citations
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
CVPR 2024
0
citations
SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
ICML 2024
0
citations
Densely Connected Convolutional Networks
CVPR 2017 (arXiv)
0
citations
CondenseNet: An Efficient DenseNet Using Learned Group Convolutions
CVPR 2018 (arXiv)
0
citations
Resource Aware Person Re-Identification Across Multiple Resolutions
CVPR 2018 (arXiv)
0
citations
Resolution Adaptive Networks for Efficient Inference
CVPR 2020 (arXiv)
0
citations
CondenseNet V2: Sparse Feature Reactivation for Deep Networks
CVPR 2021 (arXiv)
0
citations
Cross-Iteration Batch Normalization
CVPR 2021 (arXiv)
0
citations
3D Object Detection With Pointformer
CVPR 2021 (arXiv)
0
citations
Vision Transformer With Deformable Attention
CVPR 2022 (arXiv)
0
citations
DiSparse: Disentangled Sparsification for Multitask Model Compression
CVPR 2022
0
citations
On the Integration of Self-Attention and Convolution
CVPR 2022 (arXiv)
0
citations
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
CVPR 2022
0
citations
AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks
CVPR 2022
0
citations
Exploring the Equivalence of Siamese Self-Supervised Learning via a Unified Gradient Framework
CVPR 2022 (arXiv)
0
citations
AdaFocus V2: End-to-End Training of Spatial Dynamic Networks for Video Recognition
CVPR 2022 (arXiv)
0
citations
Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
CVPR 2023 (arXiv)
0
citations
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
CVPR 2023
0
citations
Zero-Shot Generative Model Adaptation via Image-Specific Prompt Learning
CVPR 2023 (arXiv)
0
citations
Siamese Image Modeling for Self-Supervised Vision Representation Learning
CVPR 2023 (arXiv)
0
citations
Slide-Transformer: Hierarchical Vision Transformer With Local Self-Attention
CVPR 2023
0
citations
Learning Efficient Convolutional Networks Through Network Slimming
ICCV 2017 (arXiv)
0
citations
Improved Techniques for Training Adaptive Deep Networks
ICCV 2019
0
citations
Adaptive Focus for Efficient Video Recognition
ICCV 2021 (arXiv)
0
citations
Towards Learning Spatially Discriminative Feature Representations
ICCV 2021 (arXiv)
0
citations
Frequency Domain Image Translation: More Photo-Realistic, Better Identity-Preserving
ICCV 2021 (arXiv)
0
citations
FLatten Transformer: Vision Transformer using Focused Linear Attention
ICCV 2023 (arXiv)
0
citations
Dynamic Perceiver for Efficient Visual Recognition
ICCV 2023 (arXiv)
0
citations
Adaptive Rotated Convolution for Rotated Object Detection
ICCV 2023 (arXiv)
0
citations
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
ICCV 2023 (arXiv)
0
citations
Deep Incubation: Training Large Models by Divide-and-Conquering
ICCV 2023 (arXiv)
0
citations
Borrowing Knowledge From Pre-trained Language Model: A New Data-efficient Visual Learning Paradigm
ICCV 2023
0
citations
Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation
ECCV 2020
0
citations
AdaFocusV3: On Unified Spatial-Temporal Dynamic Video Recognition
ECCV 2022
0
citations
Learning to Weight Samples for Dynamic Early-Exiting Networks
ECCV 2022
0
citations
ActiveNeRF: Learning Where to See with Uncertainty Estimation
ECCV 2022
0
citations
Supervised Word Mover's Distance
NeurIPS 2016
0
citations
CODA: Repurposing Continuous VAEs for Discrete Tokenization
ICCV 2025
0
citations
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
CVPR 2025 (arXiv)
0
citations
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
CVPR 2025
0
citations
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
CVPR 2025
0
citations
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
CVPR 2025
0
citations
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
CVPR 2025
0
citations
DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints
AAAI 2025
0
citations
ExpeL: LLM Agents Are Experiential Learners
AAAI 2024
0
citations
Exploring Temporal Feature Correlation for Efficient and Stable Video Semantic Segmentation
AAAI 2024
0
citations
Mask Grounding for Referring Image Segmentation
CVPR 2024
0
citations
Asymmetric Valleys: Beyond Sharp and Flat Local Minima
NeurIPS 2019
0
citations
Implicit Semantic Data Augmentation for Deep Networks
NeurIPS 2019
0
citations
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning
NeurIPS 2019
0
citations
Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
NeurIPS 2020
0
citations
Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning
NeurIPS 2021
0
citations
Not All Images are Worth 16x16 Words: Dynamic Transformers for Efficient Image Recognition
NeurIPS 2021
0
citations
Searching Parameterized AP Loss for Object Detection
NeurIPS 2021
0
citations
Efficient Knowledge Distillation from Model Checkpoints
NeurIPS 2022
0
citations
Provable General Function Class Representation Learning in Multitask Bandits and MDP
NeurIPS 2022
0
citations
Contrastive Language-Image Pre-Training with Knowledge Graphs
NeurIPS 2022
0
citations
A Mixture Of Surprises for Unsupervised Reinforcement Learning
NeurIPS 2022
0
citations
Latency-aware Spatial-wise Dynamic Networks
NeurIPS 2022
0
citations
Rank-DETR for High Quality Object Detection
NeurIPS 2023
0
citations
STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning
NeurIPS 2023
0
citations
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
NeurIPS 2023
0
citations
Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL
NeurIPS 2023
0
citations