Hao Li

125 Papers

654 Total Citations

Papers (125)

Training Quantized Nets: A Deeper Understanding

NeurIPS 2017 (arXiv)

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment

GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors

Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation

GIFStream: 4D Gaussian-based Immersive Video with Feature Stream

VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence

Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics

Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding

Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation

TMetaNet: Topological Meta-Learning Framework for Dynamic Link Prediction

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

Diffusion-based Blind Text Image Super-Resolution

RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

PointMC: Multi-instance Point Cloud Registration based on Maximal Cliques

Unconstrained Realtime Facial Performance Capture

Dense Human Body Correspondences Using Convolutional Networks

Photorealistic Facial Texture Inference Using Deep Neural Networks

High-Resolution Image Inpainting Using Multi-Scale Neural Patch Synthesis

DoubleFusion: Real-Time Capture of Human Performances With Inner Body Shapes From a Single Depth Sensor

Mesoscopic Facial Geometry Inference Using Deep Neural Networks

Large-Scale Distance Metric Learning With Uncertainty

SiCloPe: Silhouette-Based Clothed People

On the Continuity of Rotation Representations in Neural Networks

ARCH: Animatable Reconstruction of Clothed Humans

Learning Formation of Physically-Based Face Attributes

Hierarchically Robust Representation Learning

Intuitive, Interactive Beard and Hair Synthesis With Generative Models

DR Loss: Improving Object Detection by Distributional Ranking

Robust Representation Learning With Feedback for Single Image Deraining

Equivariant Point Network for 3D Point Cloud Analysis

Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

SKFAC: Training Neural Networks With Faster Kronecker-Factored Approximate Curvature

Uni-Perceiver: Pre-Training Unified Architecture for Generic Perception for Zero-Shot and Few-Shot Tasks

Unsupervised Visual Representation Learning by Online Constrained K-Means

EPro-PnP: Generalized End-to-End Probabilistic Perspective-N-Points for Monocular Object Pose Estimation

Task Adaptive Parameter Sharing for Multi-Task Learning

MogFace: Towards a Deeper Appreciation on Face Detection

AutoLoss-Zero: Searching Loss Functions From Scratch for Generic Tasks

Learning To Listen: Modeling Non-Deterministic Dyadic Facial Motion

Decoupling and Recoupling Spatiotemporal Representation for RGB-D-Based Motion Recognition

An Efficient Training Approach for Very Large Scale Face Recognition

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

StyleGene: Crossover and Mutation of Region-Level Facial Genes for Kinship Face Synthesis

Learning a Sparse Transformer Network for Effective Image Deraining

SteerNeRF: Accelerating NeRF Rendering via Smooth Viewpoint Trajectory

The ObjectFolder Benchmark: Multisensory Learning With Neural and Real Objects

Guided Recommendation for Model Fine-Tuning

Boosting Low-Data Instance Segmentation by Unsupervised Pre-Training With Saliency Prompt

Learning Dense Facial Correspondences in Unconstrained Images

Realistic Dynamic Facial Textures From a Single Image Using GANs

Improved Techniques for Training Adaptive Deep Networks

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

SoftTriple Loss: Deep Metric Learning Without Triplet Sampling

Transformable Bottleneck Networks

Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning

Learning Perspective Undistortion of Portraits

Learning to Rank Proposals for Object Detection

PlenOctrees for Real-Time Rendering of Neural Radiance Fields

TransReID: Transformer-Based Object Re-Identification

Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World

Digging Into Uncertainty in Self-Supervised Multi-View Stereo

Zen-NAS: A Zero-Shot NAS for High-Performance Image Recognition

DisUnknown: Distilling Unknown Factors for Disentanglement Learning

A Simple Baseline for Semi-Supervised Semantic Segmentation With Strong Data Augmentation

Topologically Consistent Multi-View Face Inference Using Volumetric Sampling

MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection

NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space

XMem++: Production-level Video Segmentation From Few Annotated Frames

Video Action Recognition with Attentive Semantic Units

Clusterformer: Cluster-based Transformer for 3D Object Detection in Point Clouds

DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Monocular Real-Time Volumetric Performance Capture

Enhancing Multi-modal Features Using Local Self-Attention for 3D Object Detection

Unstructured Feature Decoupling for Vehicle Re-identification

DLME: Deep Local-Flatness Manifold Embedding

KVT: k-NN Attention for Boosting Vision Transformers

TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation

Weakly Supervised Representation Learning With Coarse Labels

DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis

CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

FuXi-RTM: A Physics-Guided Prediction Framework with Radiative Transfer Modeling

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

LangBridge: Interpreting Image as a Combination of Language Embeddings

CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction

Cross-Category Subjectivity Generalization for Style-Adaptive Sketch Re-ID

QR-LoRA: Efficient and Disentangled Fine-tuning via QR Decomposition for Customized Generation

AU-Blendshape for Fine-grained Stylized 3D Facial Expression Manipulation

Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation

MUCD: Unsupervised Point Cloud Change Detection via Masked Consistency

HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models

Partial Point Cloud Registration with Multi-view 2D Image Learning

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

Robustly Train Normalizing Flows via KL Divergence Regularization

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft

On the Scalability of Diffusion-based Text-to-Image Generation

MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models

Visualizing the Loss Landscape of Neural Nets

Learning to Infer Implicit Surfaces without 3D Supervision

Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

VTC-LFC: Vision Transformer Compression with Low-Frequency Components

Entropy-Driven Mixed-Precision Quantization for Deep Network Design

Improved Fine-Tuning by Better Leveraging Pre-Training Data

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

JourneyDB: A Benchmark for Generative Image Understanding

Adaptive Consensus ADMM for Distributed Optimization