Kai Chen

75
Papers
584
Total Citations

Papers (75)

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

ECCV 2024
152
citations

OMG-Seg: Is One Model Good Enough For All Segmentation?

CVPR 2024
106
citations

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

CVPR 2024
53
citations

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

ICCV 2025
44
citations

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

ICLR 2024
44
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

CVPR 2025
44
citations

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

CVPR 2024
39
citations

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

CVPR 2024
30
citations

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

CVPR 2024
21
citations

Implicit Concept Removal of Diffusion Models

ECCV 2024
18
citations

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs

ICCV 2025
12
citations

Rethinking Verification for LLM Code Generation: From Generation to Testing

NeurIPS 2025
7
citations

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

AAAI 2025
7
citations

RepeatLeakage: Leak Prompts from Repeating as Large Language Model Is a Good Repeater

AAAI 2025
2
citations

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

NeurIPS 2025
2
citations

PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution

ICCV 2025
1
citation

Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation

NeurIPS 2025
1
citation

SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction

CVPR 2025
1
citation

Differentiable Model Scaling using Differentiable Topk

ICML 2024
0
citations

Can AI Assistants Know What They Don't Know?

ICML 2024
0
citations

Discover and Learn New Objects From Documentaries

CVPR 2017
0
citations

Optimizing Video Object Detection via a Scale-Time Lattice

CVPR 2018
0
citations

Libra R-CNN: Towards Balanced Learning for Object Detection

CVPR 2019
0
citations

Region Proposal by Guided Anchoring

CVPR 2019
0
citations

Hybrid Task Cascade for Instance Segmentation

CVPR 2019
0
citations

Prime Sample Attention in Object Detection

CVPR 2020
0
citations

Positional Encoding As Spatial Inductive Bias in GANs

CVPR 2021
0
citations

Seesaw Loss for Long-Tailed Instance Segmentation

CVPR 2021
0
citations

Learning To Identify Correct 2D-2D Line Correspondences on Sphere

CVPR 2021
0
citations

TransRank: Self-Supervised Video Representation Learning via Ranking-Based Transformation Recognition

CVPR 2022
0
citations

OCSampler: Compressing Videos to One Clip With Single-Step Sampling

CVPR 2022
0
citations

Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

CVPR 2022
0
citations

Revisiting Skeleton-Based Action Recognition

CVPR 2022
0
citations

GCFSR: A Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors

CVPR 2022
0
citations

Group R-CNN for Weakly Semi-Supervised Object Detection With Points

CVPR 2022
0
citations

LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

CVPR 2022
0
citations

Mixed Autoencoder for Self-Supervised Visual Representation Learning

CVPR 2023
0
citations

RIFormer: Keep Your Vision Backbone Effective but Removing Token Mixer

CVPR 2023
0
citations

Dense Distinct Query for End-to-End Object Detection

CVPR 2023
0
citations

CARAFE: Content-Aware ReAssembly of FEatures

ICCV 2019
0
citations

SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation

ICCV 2021
0
citations

MultiSiam: Self-Supervised Multi-Instance Siamese Representation Learning for Autonomous Driving

ICCV 2021
0
citations

Learning Icosahedral Spherical Probability Map Based on Bingham Mixture Model for Vanishing Point Estimation

ICCV 2021
0
citations

Learning Shape Primitives via Implicit Convexity Regularization

ICCV 2023
0
citations

Robo3D: Towards Robust and Reliable 3D Perception against Corruptions

ICCV 2023
0
citations

Improving Pixel-based MIM by Reducing Wasted Modeling Capability

ICCV 2023
0
citations

UMC: A Unified Bandwidth-efficient and Multi-resolution based Collaborative Perception Framework

ICCV 2023
0
citations

Deep Fusion Transformer Network with Weighted Vector-Wise Keypoints Voting for Robust 6D Object Pose Estimation

ICCV 2023
0
citations

Side-Aware Boundary Localization for More Precise Object Detection

ECCV 2020
0
citations

Dense Siamese Network for Dense Unsupervised Learning

ECCV 2022
0
citations

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving

ECCV 2022
0
citations

Sim-to-Real 6D Object Pose Estimation via Iterative Self-Training for Robotic Bin Picking

ECCV 2022
0
citations

Consistent-Teacher: Towards Reducing Inconsistent Pseudo-Targets in Semi-Supervised Object Detection

CVPR 2023
0
citations

Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation

CVPR 2025
0
citations

TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models

CVPR 2025
0
citations

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

CVPR 2025
0
citations

Information Density Principle for MLLM Benchmarks

ICCV 2025
0
citations

MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation

ICCV 2025
0
citations

Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

NeurIPS 2025
0
citations

DocVision: a Seamless, Cross-Device Immersive Active Reading Framework for Digital Academic Literature

ISMAR 2025
0
citations

Social Recommendation via Graph-Level Counterfactual Augmentation

AAAI 2025
0
citations

Semantic-guided Masked Mutual Learning for Multi-modal Brain Tumor Segmentation with Arbitrary Missing Modalities

AAAI 2025
0
citations

LLM-DR: A Novel LLM-Aided Diffusion Model for Rule Generation on Temporal Knowledge Graphs

AAAI 2025
0
citations

Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning

AAAI 2025
0
citations

Parallel Beam Search Algorithms for Domain-Independent Dynamic Programming

AAAI 2024
0
citations

Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis

AAAI 2024
0
citations

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

CVPR 2024
0
citations

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

CVPR 2024
0
citations

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text

CVPR 2024
0
citations

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

CVPR 2024
0
citations

K-Net: Towards Unified Image Segmentation

NeurIPS 2021
0
citations

Few-Shot Object Detection via Association and DIscrimination

NeurIPS 2021
0
citations

Deliberated Domain Bridging for Domain Adaptive Semantic Segmentation

NeurIPS 2022
0
citations

Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

NeurIPS 2023
0
citations

GlyphControl: Glyph Conditional Control for Visual Text Generation

NeurIPS 2023
0
citations