Hao Li

125 Papers
654 Total Citations

Papers (125)

Training Quantized Nets: A Deeper Understanding

NeurIPS 2017 (arXiv)
222
citations

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

ICLR 2024
118
citations

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

NeurIPS 2025
60
citations

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

CVPR 2025
34
citations

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

AAAI 2024 (arXiv)
33
citations

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

CVPR 2024
32
citations

VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment

CVPR 2024
29
citations

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

CVPR 2024
28
citations

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

ICCV 2025
17
citations

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

NeurIPS 2025
17
citations

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

CVPR 2025 (arXiv)
15
citations

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

ECCV 2024
13
citations

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

AAAI 2025
10
citations

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

CVPR 2025
5
citations

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

CVPR 2025
4
citations

VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence

AAAI 2025
4
citations

Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics

AAAI 2025
4
citations

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding

CVPR 2025
3
citations

Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

AAAI 2025
2
citations

Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation

CVPR 2025
2
citations

TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction

ICML 2025
1
citation

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

NeurIPS 2025
1
citation

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

CVPR 2024
0
citations

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

CVPR 2024
0
citations

Diffusion-based Blind Text Image Super-Resolution

CVPR 2024
0
citations

RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

ICML 2024
0
citations

PointMC: Multi-instance Point Cloud Registration based on Maximal Cliques

ICML 2024
0
citations

Unconstrained Realtime Facial Performance Capture

CVPR 2015
0
citations

Dense Human Body Correspondences Using Convolutional Networks

CVPR 2016
0
citations

Photorealistic Facial Texture Inference Using Deep Neural Networks

CVPR 2017 (arXiv)
0
citations

High-Resolution Image Inpainting Using Multi-Scale Neural Patch Synthesis

CVPR 2017 (arXiv)
0
citations

DoubleFusion: Real-Time Capture of Human Performances With Inner Body Shapes From a Single Depth Sensor

CVPR 2018 (arXiv)
0
citations

Mesoscopic Facial Geometry Inference Using Deep Neural Networks

CVPR 2018
0
citations

Large-Scale Distance Metric Learning With Uncertainty

CVPR 2018 (arXiv)
0
citations

SiCloPe: Silhouette-Based Clothed People

CVPR 2019
0
citations

On the Continuity of Rotation Representations in Neural Networks

CVPR 2019
0
citations

ARCH: Animatable Reconstruction of Clothed Humans

CVPR 2020 (arXiv)
0
citations

Learning Formation of Physically-Based Face Attributes

CVPR 2020 (arXiv)
0
citations

Hierarchically Robust Representation Learning

CVPR 2020 (arXiv)
0
citations

Intuitive, Interactive Beard and Hair Synthesis With Generative Models

CVPR 2020 (arXiv)
0
citations

DR Loss: Improving Object Detection by Distributional Ranking

CVPR 2020 (arXiv)
0
citations

Robust Representation Learning With Feedback for Single Image Deraining

CVPR 2021 (arXiv)
0
citations

Equivariant Point Network for 3D Point Cloud Analysis

CVPR 2021 (arXiv)
0
citations

Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement

CVPR 2021 (arXiv)
0
citations

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

CVPR 2021
0
citations

SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature

CVPR 2021
0
citations

Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks

CVPR 2022
0
citations

Unsupervised Visual Representation Learning by Online Constrained K-Means

CVPR 2022
0
citations

EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation

CVPR 2022
0
citations

Task Adaptive Parameter Sharing for Multi-Task Learning

CVPR 2022 (arXiv)
0
citations

MogFace: Towards a Deeper Appreciation on Face Detection

CVPR 2022 (arXiv)
0
citations

AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks

CVPR 2022
0
citations

Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion

CVPR 2022 (arXiv)
0
citations

Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition

CVPR 2022
0
citations

An Efficient Training Approach for Very Large Scale Face Recognition

CVPR 2022 (arXiv)
0
citations

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

CVPR 2023 (arXiv)
0
citations

StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis

CVPR 2023
0
citations

Learning a Sparse Transformer Network for Effective Image Deraining

CVPR 2023 (arXiv)
0
citations

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory

CVPR 2023 (arXiv)
0
citations

The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects

CVPR 2023
0
citations

Guided Recommendation for Model Fine-Tuning

CVPR 2023
0
citations

Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt

CVPR 2023 (arXiv)
0
citations

Learning Dense Facial Correspondences in Unconstrained Images

ICCV 2017 (arXiv)
0
citations

Realistic Dynamic Facial Textures From a Single Image Using GANs

ICCV 2017
0
citations

Improved Techniques for Training Adaptive Deep Networks

ICCV 2019
0
citations

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

ICCV 2019
0
citations

SoftTriple Loss: Deep Metric Learning Without Triplet Sampling

ICCV 2019
0
citations

Transformable Bottleneck Networks

ICCV 2019
0
citations

Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning

ICCV 2019
0
citations

Learning Perspective Undistortion of Portraits

ICCV 2019
0
citations

Learning to Rank Proposals for Object Detection

ICCV 2019
0
citations

PlenOctrees for Real-Time Rendering of Neural Radiance Fields

ICCV 2021 (arXiv)
0
citations

TransReID: Transformer-Based Object Re-Identification

ICCV 2021 (arXiv)
0
citations

Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World

CVPR 2025
0
citations

Digging Into Uncertainty in Self-Supervised Multi-View Stereo

ICCV 2021 (arXiv)
0
citations

Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition

ICCV 2021
0
citations

DisUnknown: Distilling Unknown Factors for Disentanglement Learning

ICCV 2021 (arXiv)
0
citations

A Simple Baseline for Semi-Supervised Semantic Segmentation With Strong Data Augmentation

ICCV 2021 (arXiv)
0
citations

Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

ICCV 2021 (arXiv)
0
citations

MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection

ICCV 2023 (arXiv)
0
citations

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

ICCV 2023
0
citations

XMem++: Production-level Video Segmentation From Few Annotated Frames

ICCV 2023
0
citations

Video Action Recognition with Attentive Semantic Units

ICCV 2023 (arXiv)
0
citations

Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds

ICCV 2023
0
citations

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

ICCV 2023 (arXiv)
0
citations

Monocular Real-Time Volumetric Performance Capture

ECCV 2020
0
citations

Enhancing Multi-modal Features Using Local Self-Attention for 3D Object Detection

ECCV 2022
0
citations

Unstructured Feature Decoupling for Vehicle Re-identification

ECCV 2022
0
citations

DLME: Deep Local-Flatness Manifold Embedding

ECCV 2022
0
citations

KVT: k-NN Attention for Boosting Vision Transformers

ECCV 2022
0
citations

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

ECCV 2022
0
citations

Weakly Supervised Representation Learning With Coarse Labels

ICCV 2021 (arXiv)
0
citations

DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis

CVPR 2025
0
citations

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval

CVPR 2025
0
citations

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

CVPR 2025
0
citations

FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling

ICCV 2025
0
citations

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

ICCV 2025
0
citations

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

ICCV 2025
0
citations

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

ICCV 2025
0
citations

LangBridge: Interpreting Image as a Combination of Language Embeddings

ICCV 2025
0
citations

CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction

ICCV 2025
0
citations

Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID

ICCV 2025
0
citations

QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation

ICCV 2025
0
citations

AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation

ICCV 2025
0
citations

Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation

AAAI 2025
0
citations

MUCD: Unsupervised Point Cloud Change Detection via Masked Consistency

AAAI 2025
0
citations

HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models

AAAI 2025
0
citations

Partial Point Cloud Registration with Multi-view 2D Image Learning

AAAI 2025
0
citations

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors

AAAI 2025
0
citations

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

AAAI 2024
0
citations

Robustly Train Normalizing Flows via KL Divergence Regularization

AAAI 2024
0
citations

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

CVPR 2024
0
citations

On the Scalability of Diffusion-based Text-to-Image Generation

CVPR 2024
0
citations

MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models

NeurIPS 2018
0
citations

Visualizing the Loss Landscape of Neural Nets

NeurIPS 2018
0
citations

Learning to Infer Implicit Surfaces without 3D Supervision

NeurIPS 2019
0
citations

Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels

NeurIPS 2020
0
citations

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

NeurIPS 2021
0
citations

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

NeurIPS 2022
0
citations

VTC-LFC: Vision Transformer Compression with Low-Frequency Components

NeurIPS 2022
0
citations

Entropy-Driven Mixed-Precision Quantization for Deep Network Design

NeurIPS 2022
0
citations

Improved Fine-Tuning by Better Leveraging Pre-Training Data

NeurIPS 2022
0
citations

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

NeurIPS 2023
0
citations

JourneyDB: A Benchmark for Generative Image Understanding

NeurIPS 2023
0
citations

Adaptive Consensus ADMM for Distributed Optimization

ICML 2017
0
citations