Kai Han

27
Papers
101
Total Citations

Papers (27)

PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery

ECCV 2024
18
citations

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

ICLR 2025
18
citations

Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts

AAAI 2025
13
citations

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

CVPR 2025
12
citations

Data-efficient Large Vision Models through Sequential Autoregression

ICML 2024
12
citations

Hyperbolic Category Discovery

CVPR 2025
7
citations

Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models

ECCV 2024
6
citations

SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs

ICML 2025
4
citations

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

ICCV 2025
3
citations

Parallel Sequence Modeling via Generalized Spatial Propagation Network

CVPR 2025
3
citations

v-CLR: View-Consistent Learning for Open-World Instance Segmentation

CVPR 2025
2
citations

VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models

NeurIPS 2025
1
citation

SEAL: Semantic-Aware Hierarchical Learning for Generalized Category Discovery

NeurIPS 2025
1
citation

LLM Data Selection and Utilization via Dynamic Bi-level Optimization

ICML 2025
1
citation

Rethinking Optimization and Architecture for Tiny Language Models

ICML 2024
0
citations

Detecting Open World Objects via Partial Attribute Assignment

CVPR 2025
0
citations

Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping

ICCV 2025
0
citations

L-Man: A Large Multi-modal Model Unifying Human-centric Tasks

AAAI 2025
0
citations

Deletion-Robust Submodular Maximization with Knapsack Constraints

AAAI 2024
0
citations

SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching

CVPR 2024
0
citations

IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM

CVPR 2024
0
citations

An Empirical Study of Scaling Law for Scene Text Recognition

CVPR 2024
0
citations

DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models

CVPR 2024
0
citations

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks

CVPR 2024
0
citations

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

ICML 2024
0
citations

Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

ICML 2024
0
citations

ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models

CVPR 2025
0
citations