Chen Chen

113
Papers
835
Total Citations

Papers (113)

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

ECCV 2024
146
citations

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

AAAI 2024 · arXiv
92
citations

Detecting, Explaining, and Mitigating Memorization in Diffusion Models

ICLR 2024
83
citations

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

NeurIPS 2025
81
citations

SEPT: Towards Efficient Scene Representation Learning for Motion Prediction

ICLR 2024
45
citations

Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges

CVPR 2024
44
citations

BAMM: Bidirectional Autoregressive Motion Model

ECCV 2024
41
citations

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

ICLR 2024
36
citations

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

ICLR 2024
32
citations

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling

ICLR 2025
27
citations

Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

CVPR 2024
26
citations

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

AAAI 2025
22
citations

GCNext: Towards the Unity of Graph Convolutions for Human Motion Prediction

AAAI 2024 · arXiv
21
citations

Adaptive FSS: A Novel Few-Shot Segmentation Framework via Prototype Enhancement

AAAI 2024 · arXiv
21
citations

STIV: Scalable Text and Image Conditioned Video Generation

ICCV 2025
20
citations

FedMef: Towards Memory-efficient Federated Dynamic Pruning

CVPR 2024
18
citations

A Simple Background Augmentation Method for Object Detection with Diffusion Model

ECCV 2024
15
citations

MaskControl: Spatio-Temporal Control for Masked Motion Synthesis

ICCV 2025
12
citations

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

ICCV 2025
12
citations

Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models

CVPR 2025
6
citations

Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding

CVPR 2025
6
citations

DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers

CVPR 2025
5
citations

Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective

AAAI 2025
5
citations

SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality

ICCV 2025
4
citations

Exploit Gradient Skewness to Circumvent Byzantine Defenses for Federated Learning

AAAI 2025
3
citations

SemStereo: Semantic-Constrained Stereo Matching Network for Remote Sensing

AAAI 2025
3
citations

TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation

ICCV 2025
3
citations

SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World

ICCV 2025
3
citations

BrainMAP: Learning Multiple Activation Pathways in Brain Networks

AAAI 2025
1
citation

Fusion Meets Diverse Conditions: A High-diversity Benchmark and Baseline for UAV-based Multimodal Object Detection with Condition Cues

ICCV 2025
1
citation

Out-of-Distribution Generalization on Graphs via Progressive Inference

AAAI 2025
1
citation

Real-World Anomaly Detection in Surveillance Videos

CVPR 2018 · arXiv
0
citations

Boosting Local Shape Matching for Dense 3D Face Correspondence

CVPR 2019
0
citations

Attention Mechanism Exploits Temporal Contexts: Real-Time 3D Human Pose Reconstruction

CVPR 2020
0
citations

Multi-Scale Progressive Fusion Network for Single Image Deraining

CVPR 2020 · arXiv
0
citations

Learning Normal Dynamics in Videos With Meta Prototype Network

CVPR 2021 · arXiv
0
citations

VIGOR: Cross-View Image Geo-Localization Beyond One-to-One Retrieval

CVPR 2021 · arXiv
0
citations

TransGeo: Transformer Is All You Need for Cross-View Image Geo-Localization

CVPR 2022 · arXiv
0
citations

SPAct: Self-Supervised Privacy Preservation for Action Recognition

CVPR 2022 · arXiv
0
citations

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning

CVPR 2022 · arXiv
0
citations

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation

CVPR 2023 · arXiv
0
citations

FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER

CVPR 2023 · arXiv
0
citations

TopNet: Transformer-Based Object Placement Network for Image Compositing

CVPR 2023 · arXiv
0
citations

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

CVPR 2023 · arXiv
0
citations

Dynamic Graph Learning With Content-Guided Spatial-Frequency Relation Reasoning for Deepfake Detection

CVPR 2023
0
citations

TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

CVPR 2023 · arXiv
0
citations

Private Image Generation With Dual-Purpose Auxiliary Classifier

CVPR 2023
0
citations

R2Former: Unified Retrieval and Reranking Transformer for Place Recognition

CVPR 2023
0
citations

POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

CVPR 2023 · arXiv
0
citations

Robust Image Segmentation Using Contour-Guided Color Palettes

ICCV 2015
0
citations

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

ICCV 2017
0
citations

Seeing Motion in the Dark

ICCV 2019
0
citations

3D Human Pose Estimation With Spatial and Temporal Transformers

ICCV 2021 · arXiv
0
citations

Pseudo-label Alignment for Semi-supervised Instance Segmentation

ICCV 2023 · arXiv
0
citations

FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning

ICCV 2023 · arXiv
0
citations

PGFed: Personalize Each Client's Global Objective for Federated Learning

ICCV 2023
0
citations

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection

ICCV 2023 · arXiv
0
citations

A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition

ICCV 2023 · arXiv
0
citations

When Do Curricula Work in Federated Learning?

ICCV 2023 · arXiv
0
citations

RenderIH: A Large-Scale Synthetic Dataset for 3D Interacting Hand Pose Estimation

ICCV 2023 · arXiv
0
citations

Source-free Domain Adaptive Human Pose Estimation

ICCV 2023 · arXiv
0
citations

Towards Geospatial Foundation Models via Continual Pretraining

ICCV 2023 · arXiv
0
citations

Reconciling Object-Level and Global-Level Objectives for Long-Tail Detection

ICCV 2023
0
citations

Multi-view Self-supervised Disentanglement for General Image Denoising

ICCV 2023 · arXiv
0
citations

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution

ECCV 2020
0
citations

Self-supervision with Superpixels: Training Few-shot Medical Image Segmentation without Annotation

ECCV 2020
0
citations

Unstructured Feature Decoupling for Vehicle Re-identification

ECCV 2022
0
citations

Med-DANet: Dynamic Architecture Network for Efficient Medical Volumetric Segmentation

ECCV 2022
0
citations

GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing

ECCV 2022
0
citations

GAMa: Cross-view Video Geo-localization

ECCV 2022
0
citations

TARGET: Federated Class-Continual Learning via Exemplar-Free Distillation

ICCV 2023 · arXiv
0
citations

Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level

CVPR 2025
0
citations

UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification

CVPR 2025
0
citations

Argus: A Compact and Versatile Foundation Model for Vision

CVPR 2025
0
citations

Frequency-Semantic Enhanced Variational Autoencoder for Zero-Shot Skeleton-based Action Recognition

ICCV 2025
0
citations

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

ICCV 2025
0
citations

MixA: A Mixed Attention approach with Stable Lightweight Linear Attention to enhance Efficiency of Vision Transformers at the Edge

ICCV 2025
0
citations

UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing

ICCV 2025
0
citations

TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE

NeurIPS 2025
0
citations

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

AAAI 2025
0
citations

Dive into Aerial Remote Sensing Underwater Depth Estimation with Hyperspectral Imagery

AAAI 2025
0
citations

GenHMR: Generative Human Mesh Recovery

AAAI 2025
0
citations

From Coarse to Fine: A Matching and Alignment Framework for Unsupervised Cross-View Geo-Localization

AAAI 2025
0
citations

ST-FiT: Inductive Spatial-Temporal Forecasting with Limited Training Data

AAAI 2025
0
citations

Virtual Nodes Can Help: Tackling Distribution Shifts in Federated Graph Learning

AAAI 2025
0
citations

Certified Causal Defense with Generalizable Robustness

AAAI 2025
0
citations

Towards Improved Proxy-Based Deep Metric Learning via Data-Augmented Domain Adaptation

AAAI 2024 · arXiv
0
citations

Decouple Content and Motion for Conditional Image-to-Video Generation

AAAI 2024
0
citations

Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection

CVPR 2024
0
citations

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

CVPR 2024
0
citations

MMM: Generative Masked Motion Model

CVPR 2024
0
citations

OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

CVPR 2024
0
citations

Towards Memorization-Free Diffusion Models

CVPR 2024
0
citations

A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation

CVPR 2024
0
citations

How to Trace Latent Generative Model Generated Images without Artificial Watermark?

ICML 2024
0
citations

COALA: A Practical and Vision-Centric Federated Learning Platform

ICML 2024
0
citations

Deep Sparse Representation for Robust Image Registration

CVPR 2015
0
citations

Binary Coding for Partial Action Analysis With Limited Observation Ratios

CVPR 2017
0
citations

Cross-View Image Matching for Geo-Localization in Urban Environments

CVPR 2017 · arXiv
0
citations

Semantic Image Inpainting With Deep Generative Models

CVPR 2017 · arXiv
0
citations

Learning to See in the Dark

CVPR 2018 · arXiv
0
citations

GradAug: A New Regularization Method for Deep Neural Networks

NeurIPS 2020
0
citations

CalFAT: Calibrated Federated Adversarial Training with Label Skewness

NeurIPS 2022
0
citations

Nonnegative Tensor Completion via Integer Optimization

NeurIPS 2022
0
citations

Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning

NeurIPS 2022
0
citations

DENSE: Data-Free One-Shot Federated Learning

NeurIPS 2022
0
citations

Graph Few-shot Learning with Task-specific Structures

NeurIPS 2022
0
citations

Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

NeurIPS 2023
0
citations

Is Heterogeneity Notorious? Taming Heterogeneity to Handle Test-Time Shift in Federated Learning

NeurIPS 2023
0
citations

A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation

NeurIPS 2023
0
citations

Supported Value Regularization for Offline Reinforcement Learning

NeurIPS 2023
0
citations

Where Did I Come From? Origin Attribution of AI-Generated Images

NeurIPS 2023
0
citations

Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

NeurIPS 2023
0
citations