Hao Li

48
Papers
446
Total Citations

Papers (48)

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

ICLR 2024
118
citations

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

NeurIPS 2025
60
citations

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

CVPR 2025
34
citations

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

AAAI 2024 (arXiv)
33
citations

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

CVPR 2024
32
citations

VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment

CVPR 2024
29
citations

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

CVPR 2024
28
citations

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

ICCV 2025
24
citations

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

NeurIPS 2025
17
citations

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

CVPR 2025 (arXiv)
15
citations

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

ECCV 2024
13
citations

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

AAAI 2025
10
citations

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

CVPR 2025
5
citations

VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence

AAAI 2025
4
citations

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

ICCV 2025 (arXiv)
4
citations

Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics

AAAI 2025
4
citations

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

CVPR 2025
4
citations

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding

CVPR 2025
3
citations

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

ICCV 2025 (arXiv)
3
citations

Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation

CVPR 2025 (arXiv)
2
citations

Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

AAAI 2025
2
citations

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

NeurIPS 2025 (arXiv)
1
citation

TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction

ICML 2025
1
citation

RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

ICML 2024
0
citations

Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World

CVPR 2025
0
citations

PointMC: Multi-instance Point Cloud Registration based on Maximal Cliques

ICML 2024
0
citations

DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis

CVPR 2025
0
citations

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval

CVPR 2025
0
citations

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

CVPR 2025
0
citations

FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling

ICCV 2025
0
citations

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

ICCV 2025
0
citations

LangBridge: Interpreting Image as a Combination of Language Embeddings

ICCV 2025
0
citations

CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction

ICCV 2025
0
citations

Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID

ICCV 2025
0
citations

QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation

ICCV 2025
0
citations

AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation

ICCV 2025
0
citations

Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation

AAAI 2025
0
citations

MUCD: Unsupervised Point Cloud Change Detection via Masked Consistency

AAAI 2025
0
citations

HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models

AAAI 2025
0
citations

Partial Point Cloud Registration with Multi-view 2D Image Learning

AAAI 2025
0
citations

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors

AAAI 2025
0
citations

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

AAAI 2024
0
citations

Robustly Train Normalizing Flows via KL Divergence Regularization

AAAI 2024
0
citations

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

CVPR 2024
0
citations

On the Scalability of Diffusion-based Text-to-Image Generation

CVPR 2024
0
citations

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

CVPR 2024
0
citations

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

CVPR 2024
0
citations

Diffusion-based Blind Text Image Super-Resolution

CVPR 2024
0
citations