Kai Chen

Papers: 37

Total Citations: 586

Papers (37)

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

OMG-Seg: Is One Model Good Enough For All Segmentation?

RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation

MagicDrive-V2: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement

Implicit Concept Removal of Diffusion Models

Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

Rethinking Verification for LLM Code Generation: From Generation to Testing

RepeatLeakage: Leak Prompts from Repeating as Large Language Model Is a Good Repeater

Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models

NeurIPS 2025, arXiv

MotionShot: Adaptive Motion Transfer across Arbitrary Objects for Text-to-Video Generation

PatchScaler: An Efficient Patch-Independent Diffusion Model for Image Super-Resolution

Contact Map Transfer with Conditional Diffusion Model for Generalizable Dexterous Grasp Generation

SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction

Parallel Beam Search Algorithms for Domain-Independent Dynamic Programming

Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text

Information Density Principle for MLLM Benchmarks

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models

Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation

Differentiable Model Scaling using Differentiable Topk

Mixing Expert Knowledge: Bring Human Thoughts Back To the Game of Go

Can AI Assistants Know What They Don't Know?

DocVision: a Seamless, Cross-Device Immersive Active Reading Framework for Digital Academic Literature

Social Recommendation via Graph-Level Counterfactual Augmentation

Semantic-guided Masked Mutual Learning for Multi-modal Brain Tumor Segmentation with Arbitrary Missing Modalities

LLM-DR: A Novel LLM-Aided Diffusion Model for Rule Generation on Temporal Knowledge Graphs

Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning