Heng Tao Shen

42
Papers
87
Total Citations

Papers (42)

DePT: Decoupled Prompt Tuning

CVPR 2024
60
citations

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

CVPR 2024
19
citations

ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models

CVPR 2024
4
citations

TAU-106K: A New Dataset for Comprehensive Understanding of Traffic Accident

ICLR 2025
3
citations

PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos

CVPR 2025
1
citations

CDTR: Semantic Alignment for Video Moment Retrieval Using Concept Decomposition Transformer

AAAI 2025
0
citations

T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering

AAAI 2024
0
citations

Weakly-Supervised Mirror Detection via Scribble Annotations

AAAI 2024
0
citations

Adaptive Uncertainty-Based Learning for Text-Based Person Retrieval

AAAI 2024
0
citations

ScanERU: Interactive 3D Visual Grounding Based on Embodied Reference Understanding

AAAI 2024 (arXiv)
0
citations

Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion

CVPR 2024
0
citations

Ensemble Diversity Facilitates Adversarial Transferability

CVPR 2024
0
citations

Supervised Discrete Hashing

CVPR 2015
0
citations

Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation

CVPR 2015
0
citations

What's Wrong With That Object? Identifying Images of Unusual Objects by Modelling the Detection Score Distribution

CVPR 2016
0
citations

Multi-Attention Network for One Shot Learning

CVPR 2017
0
citations

Matrix Tri-Factorization With Manifold Regularizations for Zero-Shot Learning

CVPR 2017
0
citations

Sequence-To-Sequence Domain Adaptation Network for Robust Text Image Recognition

CVPR 2019
0
citations

Exact Adversarial Attack to Image Captioning via Structured Output Learning With Latent Variables

CVPR 2019
0
citations

Searching for Actions on the Hyperbole

CVPR 2020
0
citations

What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images

CVPR 2020
0
citations

Universal Weighting Metric Learning for Cross-Modal Matching

CVPR 2020 (arXiv)
0
citations

Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

CVPR 2021
0
citations

Fine-Grained Predicates Learning for Scene Graph Generation

CVPR 2022 (arXiv)
0
citations

Semi-Supervised Video Paragraph Grounding With Contrastive Encoder

CVPR 2022
0
citations

Meta Distribution Alignment for Generalizable Person Re-Identification

CVPR 2022
0
citations

Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression

CVPR 2022 (arXiv)
0
citations

Multilateral Semantic Relations Modeling for Image Text Retrieval

CVPR 2023
0
citations

Multivariate, Multi-Frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation

CVPR 2023
0
citations

Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement

CVPR 2023 (arXiv)
0
citations

Learning Binary Codes for Maximum Inner Product Search

ICCV 2015
0
citations

Leveraging Weak Semantic Relevance for Complex Video Event Classification

ICCV 2017
0
citations

Webly Supervised Fine-Grained Recognition: Benchmark Datasets and an Approach

ICCV 2021 (arXiv)
0
citations

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

ICCV 2021 (arXiv)
0
citations

Part-Aware Transformer for Generalizable Person Re-identification

ICCV 2023 (arXiv)
0
citations

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction

ICCV 2023
0
citations

Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

CVPR 2025
0
citations

Patch-wise Attack for Fooling Deep Neural Network

ECCV 2020
0
citations

Towards Explicit Geometry-Reflectance Collaboration for Generalized LiDAR Segmentation in Adverse Weather

CVPR 2025
0
citations

CoSMIC: Continual Self-supervised Learning for Multi-Domain Medical Imaging via Conditional Mutual Information Maximization

ICCV 2025
0
citations

Implicit Counterfactual Learning for Audio-Visual Segmentation

ICCV 2025
0
citations

Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy

ICCV 2025
0
citations