Chen Chen

56 Papers

835 Total Citations

Papers (56)

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

NeurIPS 2025 · arXiv

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges

BAMM: Bidirectional Autoregressive Motion Model

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement

GCNext: Towards the Unity of Graph Convolutions for Human Motion Prediction

STIV: Scalable Text and Image Conditioned Video Generation

FedMef: Towards Memory-efficient Federated Dynamic Pruning

A Simple Background Augmentation Method for Object Detection with Diffusion Model

MaskControl: Spatio-Temporal Control for Masked Motion Synthesis

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective

DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World

SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing

Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues

BrainMAP: Learning Multiple Activation Pathways in Brain Networks

Out-of-Distribution Generalization on Graphs via Progressive Inference

TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE

NeurIPS 2025 · arXiv

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification

Argus: A Compact and Versatile Foundation Model for Vision

Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

Dive into Aerial Remote Sensing Underwater Depth Estimation with Hyperspectral Imagery

GenHMR: Generative Human Mesh Recovery

From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization

Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection

Towards Improved Proxy-Based Deep Metric Learning via Data-Augmented Domain Adaptation

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

COALA: A Practical and Vision-Centric Federated Learning Platform

MMM: Generative Masked Motion Model

OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Towards Memorization-Free Diffusion Models

Certified Causal Defense with Generalizable Robustness

Virtual Nodes Can Help: Tackling Distribution Shifts in Federated Graph Learning

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

ST-FiT: Inductive Spatial-Temporal Forecasting with Limited Training Data

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

How to Trace Latent Generative Model Generated Images without Artificial Watermark?

Decouple Content and Motion for Conditional Image-to-Video Generation