Dongdong Chen

70
Papers
92
Total Citations

Papers (70)

OmniViD: A Generative Framework for Universal Video Understanding

CVPR 2024
29
citations

Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation

ECCV 2024 (arXiv)
17
citations

VLM4D: Towards Spatiotemporal Awareness in Vision Language Models

ICCV 2025
15
citations

SmartEraser: Remove Anything from Images using Masked-Region Guidance

CVPR 2025
12
citations

FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing

ICCV 2025
12
citations

UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery

CVPR 2025
3
citations

Olympus: A Universal Task Router for Computer Vision Tasks

CVPR 2025
3
citations

Exploring Invariance in Images through One-way Wave Equations

ICML 2025
1
citation

Bringing Old Photos Back to Life

CVPR 2020 (arXiv)
0
citations

Robust Superpixel-Guided Attentional Adversarial Attack

CVPR 2020
0
citations

Dynamic Convolution: Attention Over Convolution Kernels

CVPR 2020 (arXiv)
0
citations

Self-Robust 3D Point Recognition via Gather-Vector Guidance

CVPR 2020
0
citations

Density-Aware Graph for Deep Semi-Supervised Visual Recognition

CVPR 2020 (arXiv)
0
citations

Unsupervised Pre-Training for Person Re-Identification

CVPR 2021 (arXiv)
0
citations

Diverse Semantic Image Synthesis via Probability Distribution Modeling

CVPR 2021 (arXiv)
0
citations

Dynamic Head: Unifying Object Detection Heads With Attentions

CVPR 2021 (arXiv)
0
citations

Improved Image Matting via Real-Time User Clicks and Uncertainty Estimation

CVPR 2021 (arXiv)
0
citations

Multi-Attentional Deepfake Detection

CVPR 2021 (arXiv)
0
citations

Mobile-Former: Bridging MobileNet and Transformer

CVPR 2022
0
citations

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

CVPR 2022
0
citations

CSWin Transformer: A General Vision Transformer Backbone With Cross-Shaped Windows

CVPR 2022 (arXiv)
0
citations

Reduce Information Loss in Transformers for Pluralistic Image Inpainting

CVPR 2022 (arXiv)
0
citations

Large-Scale Pre-Training for Person Re-Identification With Noisy Labels

CVPR 2022 (arXiv)
0
citations

BEVT: BERT Pretraining of Video Transformers

CVPR 2022 (arXiv)
0
citations

Shape-Invariant 3D Adversarial Point Clouds

CVPR 2022 (arXiv)
0
citations

HairCLIP: Design Your Hair by Text and Reference Image

CVPR 2022 (arXiv)
0
citations

Bringing Old Films Back to Life

CVPR 2022 (arXiv)
0
citations

Robust Equivariant Imaging: A Fully Unsupervised Framework for Learning To Image From Noisy and Partial Measurements

CVPR 2022 (arXiv)
0
citations

General Facial Representation Learning in a Visual-Linguistic Manner

CVPR 2022 (arXiv)
0
citations

Vector Quantized Diffusion Model for Text-to-Image Synthesis

CVPR 2022 (arXiv)
0
citations

Protecting Celebrities From DeepFake With Identity Consistency Transformer

CVPR 2022 (arXiv)
0
citations

Diversity-Aware Meta Visual Prompting

CVPR 2023 (arXiv)
0
citations

Look Before You Match: Instance Understanding Matters in Video Object Segmentation

CVPR 2023 (arXiv)
0
citations

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning

CVPR 2023 (arXiv)
0
citations

MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

CVPR 2023 (arXiv)
0
citations

Streaming Video Model

CVPR 2023 (arXiv)
0
citations

Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

CVPR 2023 (arXiv)
0
citations

Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding

CVPR 2023 (arXiv)
0
citations

Coherent Online Video Style Transfer

ICCV 2017 (arXiv)
0
citations

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once

ICCV 2019
0
citations

Learning With Noisy Labels for Robust Point Cloud Segmentation

ICCV 2021 (arXiv)
0
citations

High-Fidelity Pluralistic Image Completion With Transformers

ICCV 2021 (arXiv)
0
citations

Equivariant Imaging: Learning Beyond the Range Space

ICCV 2021 (arXiv)
0
citations

MicroNet: Improving Image Recognition With Extremely Low FLOPs

ICCV 2021 (arXiv)
0
citations

Improve Unsupervised Pretraining for Few-Label Transfer

ICCV 2021 (arXiv)
0
citations

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

ICCV 2023 (arXiv)
0
citations

HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending

ICCV 2023
0
citations

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

ICCV 2023 (arXiv)
0
citations

Dynamic ReLU

ECCV 2020
0
citations

DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search

ECCV 2020
0
citations

Deep Decomposition Learning for Inverse Imaging Problems

ECCV 2020
0
citations

Should All Proposals Be Treated Equally in Object Detection?

ECCV 2022
0
citations

Bootstrapped Masked Autoencoders for Vision BERT Pretraining

ECCV 2022
0
citations

LG-GAN: Label Guided Adversarial Network for Flexible Targeted Attack of Point Cloud Based Deep Networks

CVPR 2020
0
citations

Show and Segment: Universal Medical Image Segmentation via In-Context Learning

CVPR 2025
0
citations

I2V3D: Controllable Image-to-video Generation with 3D Guidance

ICCV 2025
0
citations

Equivariant Multi-Modality Image Fusion

CVPR 2024
0
citations

Towards More Unified In-context Visual Understanding

CVPR 2024
0
citations

Image Fusion via Vision-Language Model

ICML 2024
0
citations

StyleBank: An Explicit Representation for Neural Image Style Transfer

CVPR 2017 (arXiv)
0
citations

Stereoscopic Neural Style Transfer

CVPR 2018 (arXiv)
0
citations

Transductive Zero-Shot Learning with Visual Structure Constraint

NeurIPS 2019
0
citations

GreedyFool: Distortion-Aware Sparse Adversarial Attack

NeurIPS 2020
0
citations

Passport-aware Normalization for Deep Model Protection

NeurIPS 2020
0
citations

Stronger NAS with Weaker Predictors

NeurIPS 2021
0
citations

Unsupervised Learning From Incomplete Measurements for Inverse Problems

NeurIPS 2022
0
citations

OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

NeurIPS 2022
0
citations

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

NeurIPS 2022
0
citations

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

NeurIPS 2023
0
citations

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

NeurIPS 2023
0
citations