Hao Li
48 Papers
443 Total Citations
Papers (48)
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
ICLR 2024
118
citations
GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing
NeurIPS 2025
60
citations
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
CVPR 2025
34
citations
Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
AAAI 2024
33
citations
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
CVPR 2024
32
citations
VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment
CVPR 2024
29
citations
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
CVPR 2024
28
citations
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
ICCV 2025
24
citations
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
NeurIPS 2025
17
citations
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
CVPR 2025
15
citations
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
ECCV 2024
13
citations
GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
AAAI 2025
10
citations
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
CVPR 2025
5
citations
VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence
AAAI 2025
4
citations
GIFStream: 4D Gaussian-based Immersive Video with Feature Stream
CVPR 2025
4
citations
Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics
AAAI 2025
4
citations
DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation
ICCV 2025
4
citations
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
CVPR 2025
3
citations
Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models
AAAI 2025
2
citations
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
CVPR 2025
2
citations
STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization
NeurIPS 2025
1
citation
TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction
ICML 2025
1
citation
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
CVPR 2024
0
citations
Diffusion-based Blind Text Image Super-Resolution
CVPR 2024
0
citations
RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
ICML 2024
0
citations
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
CVPR 2025
0
citations
PointMC: Multi-instance Point Cloud Registration based on Maximal Cliques
ICML 2024
0
citations
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
CVPR 2025
0
citations
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
CVPR 2025
0
citations
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
CVPR 2025
0
citations
FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling
ICCV 2025
0
citations
FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
ICCV 2025
0
citations
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
ICCV 2025
0
citations
LangBridge: Interpreting Image as a Combination of Language Embeddings
ICCV 2025
0
citations
CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
ICCV 2025
0
citations
Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID
ICCV 2025
0
citations
QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation
ICCV 2025
0
citations
AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation
ICCV 2025
0
citations
Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation
AAAI 2025
0
citations
MUCD: Unsupervised Point Cloud Change Detection via Masked Consistency
AAAI 2025
0
citations
HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models
AAAI 2025
0
citations
Partial Point Cloud Registration with Multi-view 2D Image Learning
AAAI 2025
0
citations
AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors
AAAI 2025
0
citations
Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing
AAAI 2024
0
citations
Robustly Train Normalizing Flows via KL Divergence Regularization
AAAI 2024
0
citations
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
CVPR 2024
0
citations
On the Scalability of Diffusion-based Text-to-Image Generation
CVPR 2024
0
citations
NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation
CVPR 2024
0
citations