Yu-Gang Jiang
69 papers · 654 total citations
Papers (69)
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving
AAAI 2024 · arXiv · 266 citations
SimDA: Simple Diffusion Adapter for Efficient Video Generation
CVPR 2024 · 106 citations
Adversarial Prompt Tuning for Vision-Language Models
ECCV 2024 · 33 citations
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
ICCV 2025 · arXiv · 33 citations
OmniViD: A Generative Framework for Universal Video Understanding
CVPR 2024 · 29 citations
Doubly Abductive Counterfactual Inference for Text-based Image Editing
CVPR 2024 · 25 citations
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
ICCV 2025 · 24 citations
MotionFollower: Editing Video Motion via Score-Guided Diffusion
ICCV 2025 · 22 citations
PromptFusion: Decoupling Stability and Plasticity for Continual Learning
ECCV 2024 · 21 citations
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
AAAI 2025 · 19 citations
LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation
AAAI 2024 · arXiv · 17 citations
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
ICLR 2025 · 16 citations
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
ECCV 2024 · 12 citations
Out of Length Text Recognition with Sub-String Matching
AAAI 2025 · 7 citations
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
AAAI 2025 · 7 citations
Learning to Rank Patches for Unbiased Image Redundancy Reduction
CVPR 2024 · 6 citations
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
ICCV 2025 · 5 citations
AIM: Additional Image Guided Generation of Transferable Adversarial Attacks
AAAI 2025 · 3 citations
FaceA-Net: Facial Attribute-Driven ID Preserving Image Generation Network
AAAI 2025 · 1 citation
Achieving More with Less: Additive Prompt Tuning for Rehearsal-Free Class-Incremental Learning
ICCV 2025 · 1 citation
From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning
ICCV 2025 · 1 citation
Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
CVPR 2023 · arXiv · 0 citations
ResFormer: Scaling ViTs With Multi-Resolution Training
CVPR 2023 · arXiv · 0 citations
SVFormer: Semi-Supervised Video Transformer for Action Recognition
CVPR 2023 · arXiv · 0 citations
Look Before You Match: Instance Understanding Matters in Video Object Segmentation
CVPR 2023 · arXiv · 0 citations
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-Supervised Video Representation Learning
CVPR 2023 · arXiv · 0 citations
Bi-Directional Feature Fusion Generative Adversarial Network for Ultra-High Resolution Pathological Image Virtual Re-Staining
CVPR 2023 · 0 citations
Enhancing the Self-Universality for Transferable Targeted Attacks
CVPR 2023 · arXiv · 0 citations
Prototypical Residual Networks for Anomaly Detection and Localization
CVPR 2023 · arXiv · 0 citations
MSMDFusion: Fusing LiDAR and Camera at Multiple Scales With Multi-Depth Seeds for 3D Object Detection
CVPR 2023 · arXiv · 0 citations
StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot Learning
CVPR 2023 · arXiv · 0 citations
Detection Hub: Unifying Object Detection Datasets via Query Adaptation on Language Embedding
CVPR 2023 · arXiv · 0 citations
Multi-Scale Deep Learning Architectures for Person Re-Identification
ICCV 2017 · arXiv · 0 citations
Revisiting Adversarial Robustness Distillation: Robust Soft Labels Make Student Better
ICCV 2021 · arXiv · 0 citations
Motion Guided Region Message Passing for Video Captioning
ICCV 2021 · 0 citations
VideoLT: Large-Scale Long-Tailed Video Recognition
ICCV 2021 · arXiv · 0 citations
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
ICCV 2023 · arXiv · 0 citations
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
ICCV 2023 · arXiv · 0 citations
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos
ECCV 2020 · 0 citations
Hierarchical Visual-Textual Graph for Temporal Activity Localization via Language
ECCV 2020 · 0 citations
Semi-Supervised Single-View 3D Reconstruction via Prototype Shape Priors
ECCV 2022 · 0 citations
Semi-Supervised Vision Transformers
ECCV 2022 · 0 citations
Efficient Video Transformers with Spatial-Temporal Token Selection
ECCV 2022 · 0 citations
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
ECCV 2022 · 0 citations
DSOD: Learning Deeply Supervised Object Detectors From Scratch
ICCV 2017 · arXiv · 0 citations
VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks
ICCV 2025 · 0 citations
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
ICCV 2025 · 0 citations
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
ICCV 2025 · 0 citations
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
ICCV 2025 · 0 citations
Comprehensive Multi-Modal Prototypes Are Simple and Effective Classifiers for Vast-Vocabulary Object Detection
AAAI 2025 · 0 citations
Instance-Aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning
AAAI 2024 · arXiv · 0 citations
MotionEditor: Editing Video Motion via Content-Aware Diffusion
CVPR 2024 · 0 citations
Harnessing Object and Scene Semantics for Large-Scale Video Understanding
CVPR 2016 · 0 citations
Weakly Supervised Dense Video Captioning
CVPR 2017 · arXiv · 0 citations
Dual Skipping Networks
CVPR 2018 · arXiv · 0 citations
Hyperbolic Visual Embedding Learning for Zero-Shot Recognition
CVPR 2020 · 0 citations
Sketch-BERT: Learning Sketch Bidirectional Encoder Representation From Transformers by Self-Supervised Learning of Sketch Gestalt
CVPR 2020 · 0 citations
FM2u-Net: Face Morphological Multi-Branch Network for Makeup-Invariant Face Verification
CVPR 2020 · 0 citations
Clean-Label Backdoor Attacks on Video Recognition Models
CVPR 2020 · arXiv · 0 citations
Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning
CVPR 2021 · 0 citations
Balanced Contrastive Learning for Long-Tailed Visual Recognition
CVPR 2022 · 0 citations
Cross-Modal Transferable Adversarial Attacks From Images to Videos
CVPR 2022 · arXiv · 0 citations
BEVT: BERT Pretraining of Video Transformers
CVPR 2022 · arXiv · 0 citations
ObjectFormer for Image Manipulation Detection and Localization
CVPR 2022 · arXiv · 0 citations
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
CVPR 2022 · arXiv · 0 citations
LiteEval: A Coarse-to-Fine Framework for Resource Efficient Video Recognition
NeurIPS 2019 · 0 citations
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
NeurIPS 2022 · 0 citations
Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
NeurIPS 2023 · 0 citations
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
NeurIPS 2023 · 0 citations