2024 "vision transformers" Papers
35 papers found
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang, Byungkun Lee, Hojoon Lee et al.
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Kaishen Yuan, Zitong Yu, Xin Liu et al.
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis
Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee
Characterizing Model Robustness via Natural Input Gradients
Adrian Rodriguez-Munoz, Tongzhou Wang, Antonio Torralba
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.
Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
Itamar Zimerman, Moran Baruch, Nir Drucker et al.
Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Mikkel Jordahn, Pablo Olmos
Denoising Vision Transformers
Jiawei Yang, Katie Luo, Jiefeng Li et al.
ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
Yunshan Zhong, Jiawei Hu, You Huang et al.
Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
Aaron Havens, Alexandre Araujo, Huan Zhang et al.
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.
Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
Pengyu Li, Biao Wang, Tianchu Guo et al.
Improving Interpretation Faithfulness for Vision Transformers
Lijie Hu, Yixin Liu, Ninghao Liu et al.
KernelWarehouse: Rethinking the Design of Dynamic Convolution
Chao Li, Anbang Yao
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
LookupViT: Compressing visual information to a limited number of tokens
Rajat Koner, Gagan Jain, Sujoy Paul et al.
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang, Dingkang Liang, Zichang Tan et al.
Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
Zhiyu Yao, Jian Wang, Haixu Wu et al.
One Meta-tuned Transformer is What You Need for Few-shot Learning
Xu Yang, Huaxiu Yao, Ying WEI
PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
Ananthu Aniraj, Cassio F. Dantas, Dino Ienco et al.
Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
Hoyong Kwon, Jaeseok Jeong, Sung-Hoon Yoon et al.
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Hengyi Wang, Shiwei Tan, Hao Wang
Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness
Honghao Chen, Zhang Yurong, xiaokun Feng et al.
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer, Yury Belousov, Slava Voloshynovskiy
Sample-specific Masks for Visual Reprogramming-based Prompting
Chengyi Cai, Zesheng Ye, Lei Feng et al.
Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
Zixuan Hu, Yongxian Wei, Li Shen et al.
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu, Yunjie Tian, Qixiang Ye et al.
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
Xixu Hu, Runkai Zheng, Jindong Wang et al.
Stitched ViTs are Flexible Vision Backbones
Zizheng Pan, Jing Liu, Haoyu He et al.
Sub-token ViT Embedding via Stochastic Resonance Transformers
Dong Lao, Yangchao Wu, Tian Yu Liu et al.
TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation
Yuhao Wang, Xuehu Liu, Pingping Zhang et al.
Vision Transformers as Probabilistic Expansion from Learngene
Qiufeng Wang, Xu Yang, Haokun Chen et al.
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
Dezhi Peng, Chongyu Liu, Yuliang Liu et al.
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta, Shufan Li, Tyler Zhu et al.