Gao Huang

23 Papers
228 Total Citations

Papers (23)

GSVA: Generalized Segmentation via Multimodal Large Language Models

CVPR 2024
127
citations

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

CVPR 2024
28
citations

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

ECCV 2024
21
citations

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

CVPR 2025
20
citations

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

ECCV 2024
15
citations

Video Perception Models for 3D Scene Synthesis

NeurIPS 2025
5
citations

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

CVPR 2025
5
citations

GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling

ICLR 2025
4
citations

DTOS: Dynamic Time Object Sensing with Large Multimodal Model

CVPR 2025
2
citations

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

ICCV 2025
1
citations

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

CVPR 2024
0
citations

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

CVPR 2024
0
citations

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding

CVPR 2025
0
citations

SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

ICML 2024
0
citations

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

CVPR 2025
0
citations

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

CVPR 2025
0
citations

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

CVPR 2025
0
citations

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

CVPR 2025
0
citations

CODA: Repurposing Continuous VAEs for Discrete Tokenization

ICCV 2025
0
citations

DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints

AAAI 2025
0
citations

ExpeL: LLM Agents Are Experiential Learners

AAAI 2024
0
citations

Exploring Temporal Feature Correlation for Efficient and Stable Video Semantic Segmentation

AAAI 2024
0
citations

Mask Grounding for Referring Image Segmentation

CVPR 2024
0
citations