Chen Chen

56 Papers
835 Total Citations

Papers (56)

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

ECCV 2024, arXiv
146
citations

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

AAAI 2024, arXiv
92
citations

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

ICLR 2024, arXiv
83
citations

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

NeurIPS 2025, arXiv
81
citations

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

ICLR 2024, arXiv
45
citations

Towards Surveillance Video-and-Language Understanding: New Dataset Baselines and Challenges

CVPR 2024
44
citations

BAMM: Bidirectional Autoregressive Motion Model

ECCV 2024, arXiv
41
citations

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

ICLR 2024, arXiv
36
citations

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

ICLR 2024, arXiv
32
citations

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

ICLR 2025, arXiv
27
citations

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

CVPR 2024, arXiv
26
citations

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

AAAI 2025, arXiv
22
citations

Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement

AAAI 2024, arXiv
21
citations

GCNext: Towards the Unity of Graph Convolutions for Human Motion Prediction

AAAI 2024, arXiv
21
citations

STIV: Scalable Text and Image Conditioned Video Generation

ICCV 2025
20
citations

FedMef: Towards Memory-efficient Federated Dynamic Pruning

CVPR 2024, arXiv
18
citations

A Simple Background Augmentation Method for Object Detection with Diffusion Model

ECCV 2024, arXiv
15
citations

MaskControl: Spatio-Temporal Control for Masked Motion Synthesis

ICCV 2025
12
citations

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

ICCV 2025, arXiv
12
citations

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

CVPR 2025, arXiv
6
citations

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

CVPR 2025, arXiv
6
citations

Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective

AAAI 2025, arXiv
5
citations

DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers

CVPR 2025, arXiv
5
citations

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality

ICCV 2025, arXiv
4
citations

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

ICCV 2025, arXiv
3
citations

Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning

AAAI 2025, arXiv
3
citations

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World

ICCV 2025, arXiv
3
citations

SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing

AAAI 2025, arXiv
3
citations

Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues

ICCV 2025, arXiv
1
citation

BrainMAP: Learning Multiple Activation Pathways in Brain Networks

AAAI 2025, arXiv
1
citation

Out-of-Distribution Generalization on Graphs via Progressive Inference

AAAI 2025, arXiv
1
citation

TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE

NeurIPS 2025, arXiv
0
citations

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

CVPR 2025
0
citations

UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification

CVPR 2025
0
citations

Argus: A Compact and Versatile Foundation Model for Vision

CVPR 2025
0
citations

Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition

ICCV 2025
0
citations

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

ICCV 2025, arXiv
0
citations

MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge

ICCV 2025
0
citations

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

ICCV 2025, arXiv
0
citations

Dive into Aerial Remote Sensing Underwater Depth Estimation with Hyperspectral Imagery

AAAI 2025
0
citations

GenHMR: Generative Human Mesh Recovery

AAAI 2025, arXiv
0
citations

From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization

AAAI 2025
0
citations

Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection

CVPR 2024
0
citations

Towards Improved Proxy-Based Deep Metric Learning via Data-Augmented Domain Adaptation

AAAI 2024, arXiv
0
citations

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

CVPR 2024, arXiv
0
citations

COALA: A Practical and Vision-Centric Federated Learning Platform

ICML 2024
0
citations

MMM: Generative Masked Motion Model

CVPR 2024
0
citations

OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

CVPR 2024
0
citations

Towards Memorization-Free Diffusion Models

CVPR 2024
0
citations

Certified Causal Defense with Generalizable Robustness

AAAI 2025
0
citations

Virtual Nodes Can Help: Tackling Distribution Shifts in Federated Graph Learning

AAAI 2025
0
citations

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

CVPR 2024
0
citations

ST-FiT: Inductive Spatial-Temporal Forecasting with Limited Training Data

AAAI 2025
0
citations

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

AAAI 2025
0
citations

How to Trace Latent Generative Model Generated Images without Artificial Watermark?

ICML 2024
0
citations

Decouple Content and Motion for Conditional Image-to-Video Generation

AAAI 2024
0
citations