Yongdong Zhang

55
Papers
62
Total Citations

Papers (55)

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

CVPR 2024 (arXiv)
36
citations

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

CVPR 2025 (arXiv)
12
citations

Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

CVPR 2025
8
citations

AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation

ECCV 2024 (arXiv)
5
citations

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

AAAI 2025 (arXiv)
1
citations

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

ICCV 2025
0
citations

IGD: Instructional Graphic Design with Multimodal Layer Generation

ICCV 2025
0
citations

Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces with Mixture of Experts

ICCV 2025
0
citations

Diffusion-based Source-biased Model for Single Domain Generalized Object Detection

ICCV 2025
0
citations

Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation

ECCV 2024 (arXiv)
0
citations

Task-Adaptive Prompted Transformer for Cross-Domain Few-Shot Learning

AAAI 2024
0
citations

Bootstrapping Large Language Models for Radiology Report Generation

AAAI 2024
0
citations

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

CVPR 2024 (arXiv)
0
citations

OTE: Exploring Accurate Scene Text Recognition Using One Token

CVPR 2024
0
citations

AnyScene: Customized Image Synthesis with Composited Foreground

CVPR 2024
0
citations

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

CVPR 2024 (arXiv)
0
citations

Reinforcement Learning within Tree Search for Fast Macro Placement

ICML 2024
0
citations

Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

ICML 2024 (arXiv)
0
citations

A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design

ICML 2024 (arXiv)
0
citations

A Hierarchical Adaptive Multi-Task Reinforcement Learning Framework for Multiplier Circuit Design

ICML 2024
0
citations

Graph Structured Network for Image-Text Matching

CVPR 2020 (arXiv)
0
citations

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

CVPR 2020 (arXiv)
0
citations

Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning

CVPR 2020 (arXiv)
0
citations

Multi-Modality Cross Attention Network for Image and Sentence Matching

CVPR 2020
0
citations

Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning

CVPR 2020
0
citations

Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

CVPR 2021 (arXiv)
0
citations

Lesion-Aware Transformers for Diabetic Retinopathy Grading

CVPR 2021
0
citations

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

CVPR 2021 (arXiv)
0
citations

Frequency-Aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection

CVPR 2021 (arXiv)
0
citations

Action Unit Memory Network for Weakly Supervised Temporal Action Localization

CVPR 2021 (arXiv)
0
citations

Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection

CVPR 2021
0
citations

Partial Class Activation Attention for Semantic Segmentation

CVPR 2022
0
citations

Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition

CVPR 2022
0
citations

Negative-Aware Attention Framework for Image-Text Matching

CVPR 2022
0
citations

Towards Accurate Image Coding: Improved Autoregressive Image Generation With Dynamic Vector Quantization

CVPR 2023 (arXiv)
0
citations

Learning Semantic Relationship Among Instances for Image-Text Matching

CVPR 2023
0
citations

Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation

CVPR 2023 (arXiv)
0
citations

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

CVPR 2023
0
citations

Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization

CVPR 2023
0
citations

Dynamic Generative Targeted Attacks With Pattern Injection

CVPR 2023
0
citations

Crossing the Gap: Domain Generalization for Image Captioning

CVPR 2023
0
citations

Foreground Activation Maps for Weakly Supervised Object Localization

ICCV 2021
0
citations

Explainable Person Re-Identification With Attribute-Guided Metric Distillation

ICCV 2021 (arXiv)
0
citations

Meta-Attack: Class-Agnostic and Model-Agnostic Physical Adversarial Attack

ICCV 2021
0
citations

From Two to One: A New Scene Text Recognizer With Visual Language Modeling Network

ICCV 2021 (arXiv)
0
citations

Task-Aware Part Mining Network for Few-Shot Learning

ICCV 2021
0
citations

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval

ICCV 2023
0
citations

Adaptive Template Transformer for Mitochondria Segmentation in Electron Microscopy Images

ICCV 2023
0
citations

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

ECCV 2022
0
citations

Cross-Modality Transformer for Visible-Infrared Person Re-identification

ECCV 2022
0
citations

Detecting Tampered Scene Text in the Wild

ECCV 2022
0
citations

Hierarchical Granularity Transfer Learning

NeurIPS 2020
0
citations

Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets

NeurIPS 2022 (arXiv)
0
citations

A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability

NeurIPS 2023 (arXiv)
0
citations

MomentDiff: Generative Video Moment Retrieval from Random to Real

NeurIPS 2023 (arXiv)
0
citations