Yongdong Zhang

Papers: 55

Total Citations: 62

Papers (55)

DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Mask²DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation

AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation

ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA

CLIP-Adapted Region-to-Text Learning for Generative Open-Vocabulary Semantic Segmentation

IGD: Instructional Graphic Design with Multimodal Layer Generation

Forensic-MoE: Exploring Comprehensive Synthetic Image Detection Traces with Mixture of Experts

Diffusion-based Source-biased Model for Single Domain Generalized Object Detection

Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation

Task-Adaptive Prompted Transformer for Cross-Domain Few-Shot Learning

Bootstrapping Large Language Models for Radiology Report Generation

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

OTE: Exploring Accurate Scene Text Recognition Using One Token

AnyScene: Customized Image Synthesis with Composited Foreground

DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Reinforcement Learning within Tree Search for Fast Macro Placement

Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design

A Hierarchical Adaptive Multi-Task Reinforcement Learning Framework for Multiplier Circuit Design

Graph Structured Network for Image-Text Matching

ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Domain-Aware Visual Bias Eliminating for Generalized Zero-Shot Learning

Multi-Modality Cross Attention Network for Image and Sentence Matching

Self-Supervised Domain-Aware Generative Network for Generalized Zero-Shot Learning

Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

Lesion-Aware Transformers for Diabetic Retinopathy Grading

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Frequency-Aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection

Action Unit Memory Network for Weakly Supervised Temporal Action Localization

Uncertainty Guided Collaborative Training for Weakly Supervised Temporal Action Detection

Partial Class Activation Attention for Semantic Segmentation

Motion-Modulated Temporal Fragment Alignment Network for Few-Shot Action Recognition

Negative-Aware Attention Framework for Image-Text Matching

Towards Accurate Image Coding: Improved Autoregressive Image Generation With Dynamic Vector Quantization

Learning Semantic Relationship Among Instances for Image-Text Matching

Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization

Dynamic Generative Targeted Attacks With Pattern Injection

Crossing the Gap: Domain Generalization for Image Captioning

Foreground Activation Maps for Weakly Supervised Object Localization

Explainable Person Re-Identification With Attribute-Guided Metric Distillation

Meta-Attack: Class-Agnostic and Model-Agnostic Physical Adversarial Attack

From Two to One: A New Scene Text Recognizer With Visual Language Modeling Network

Task-Aware Part Mining Network for Few-Shot Learning

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval

Adaptive Template Transformer for Mitochondria Segmentation in Electron Microscopy Images

Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval

Cross-Modality Transformer for Visible-Infrared Person Re-identification

Detecting Tampered Scene Text in the Wild

Hierarchical Granularity Transfer Learning

Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets (NeurIPS 2022)

A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability (NeurIPS 2023)

MomentDiff: Generative Video Moment Retrieval from Random to Real (NeurIPS 2023)