Gao Huang

23 Papers

228 Total Citations

Papers (23)

GSVA: Generalized Segmentation via Multimodal Large Language Models

Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

Video Perception Models for 3D Scene Synthesis

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

GridMix: Exploring Spatial Modulation for Neural Fields in PDE Modeling

DTOS: Dynamic Time Object Sensing with Large Multimodal Model

IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance

Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models

ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding

SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning

CODA: Repurposing Continuous VAEs for Discrete Tokenization

DiveR-CT: Diversity-enhanced Red Teaming Large Language Model Assistants with Relaxing Constraints

ExpeL: LLM Agents Are Experiential Learners

Exploring Temporal Feature Correlation for Efficient and Stable Video Semantic Segmentation

Mask Grounding for Referring Image Segmentation