Chen Chen
56 Papers · 835 Total Citations
Papers (56)
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
ECCV 2024 · arXiv
146 citations
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
AAAI 2024 · arXiv
92 citations
Detecting, Explaining, and Mitigating Memorization in Diffusion Models
ICLR 2024 · arXiv
83 citations
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
NeurIPS 2025 · arXiv
81 citations
SEPT: Towards Efficient Scene Representation Learning for Motion Prediction
ICLR 2024 · arXiv
45 citations
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
CVPR 2024
44 citations
BAMM: Bidirectional Autoregressive Motion Model
ECCV 2024 · arXiv
41 citations
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
ICLR 2024 · arXiv
36 citations
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
ICLR 2024 · arXiv
32 citations
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
ICLR 2025 · arXiv
27 citations
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
CVPR 2024 · arXiv
26 citations
GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models
AAAI 2025 · arXiv
22 citations
Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement
AAAI 2024 · arXiv
21 citations
GCNext: Towards the Unity of Graph Convolutions for Human Motion Prediction
AAAI 2024 · arXiv
21 citations
STIV: Scalable Text and Image Conditioned Video Generation
ICCV 2025
20 citations
FedMef: Towards Memory-efficient Federated Dynamic Pruning
CVPR 2024 · arXiv
18 citations
A Simple Background Augmentation Method for Object Detection with Diffusion Model
ECCV 2024 · arXiv
15 citations
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
ICCV 2025
12 citations
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
ICCV 2025 · arXiv
12 citations
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
CVPR 2025 · arXiv
6 citations
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
CVPR 2025 · arXiv
6 citations
Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective
AAAI 2025 · arXiv
5 citations
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
CVPR 2025 · arXiv
5 citations
SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
ICCV 2025 · arXiv
4 citations
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
ICCV 2025 · arXiv
3 citations
Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning
AAAI 2025 · arXiv
3 citations
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
ICCV 2025 · arXiv
3 citations
SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing
AAAI 2025 · arXiv
3 citations
Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues
ICCV 2025 · arXiv
1 citation
BrainMAP: Learning Multiple Activation Pathways in Brain Networks
AAAI 2025 · arXiv
1 citation
Out-of-Distribution Generalization on Graphs via Progressive Inference
AAAI 2025 · arXiv
1 citation
TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE
NeurIPS 2025 · arXiv
0 citations
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
CVPR 2025
0 citations
UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification
CVPR 2025
0 citations
Argus: A Compact and Versatile Foundation Model for Vision
CVPR 2025
0 citations
Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition
ICCV 2025
0 citations
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
ICCV 2025 · arXiv
0 citations
MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge
ICCV 2025
0 citations
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
ICCV 2025 · arXiv
0 citations
Dive into Aerial Remote Sensing Underwater Depth Estimation with Hyperspectral Imagery
AAAI 2025
0 citations
GenHMR: Generative Human Mesh Recovery
AAAI 2025 · arXiv
0 citations
From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization
AAAI 2025
0 citations
Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection
CVPR 2024
0 citations
Towards Improved Proxy-Based Deep Metric Learning via Data-Augmented Domain Adaptation
AAAI 2024 · arXiv
0 citations
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
CVPR 2024 · arXiv
0 citations
COALA: A Practical and Vision-Centric Federated Learning Platform
ICML 2024
0 citations
MMM: Generative Masked Motion Model
CVPR 2024
0 citations
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
CVPR 2024
0 citations
Towards Memorization-Free Diffusion Models
CVPR 2024
0 citations
Certified Causal Defense with Generalizable Robustness
AAAI 2025
0 citations
Virtual Nodes Can Help: Tackling Distribution Shifts in Federated Graph Learning
AAAI 2025
0 citations
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
CVPR 2024
0 citations
ST-FiT: Inductive Spatial-Temporal Forecasting with Limited Training Data
AAAI 2025
0 citations
SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation
AAAI 2025
0 citations
How to Trace Latent Generative Model Generated Images without Artificial Watermark?
ICML 2024
0 citations
Decouple Content and Motion for Conditional Image-to-Video Generation
AAAI 2024
0 citations