Yu-Gang Jiang

28
Papers
659
Total Citations

Papers (28)

NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving

AAAI 2024, arXiv
266
citations

SimDA: Simple Diffusion Adapter for Efficient Video Generation

CVPR 2024
106
citations

Adversarial Prompt Tuning for Vision-Language Models

ECCV 2024
33
citations

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

ICCV 2025, arXiv
33
citations

OmniViD: A Generative Framework for Universal Video Understanding

CVPR 2024
29
citations

Doubly Abductive Counterfactual Inference for Text-based Image Editing

CVPR 2024
25
citations

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

ICCV 2025
24
citations

MotionFollower: Editing Video Motion via Score-Guided Diffusion

ICCV 2025
22
citations

PromptFusion: Decoupling Stability and Plasticity for Continual Learning

ECCV 2024
21
citations

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

AAAI 2025
19
citations

LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation

AAAI 2024, arXiv
17
citations

BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

ICLR 2025
16
citations

Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image

ECCV 2024
12
citations

DuMo: Dual Encoder Modulation Network for Precise Concept Erasure

AAAI 2025
7
citations

Out of Length Text Recognition with Sub-String Matching

AAAI 2025
7
citations

Learning to Rank Patches for Unbiased Image Redundancy Reduction

CVPR 2024
6
citations

REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents

ICCV 2025
5
citations

Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation

ICCV 2025, arXiv
5
citations

AIM: Additional Image Guided Generation of Transferable Adversarial Attacks

AAAI 2025
3
citations

FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network

AAAI 2025
1
citation

From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning

ICCV 2025
1
citation

Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning

ICCV 2025
1
citation

MotionEditor: Editing Video Motion via Content-Aware Diffusion

CVPR 2024
0
citations

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks

ICCV 2025
0
citations

SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition

ICCV 2025
0
citations

IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves

ICCV 2025
0
citations

Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection

AAAI 2025
0
citations

Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning

AAAI 2024, arXiv
0
citations