"vision transformers" Papers
51 papers found • Page 1 of 2
A Circular Argument: Does RoPE need to be Equivariant for Vision?
Chase van de Geijn, Timo Lüddecke, Polina Turishcheva et al.
Alias-Free ViT: Fractional Shift Invariance via Linear Attention
Hagay Michaeli, Daniel Soudry
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
Andrew Luo, Jacob Yeung, Rushikesh Zawar et al.
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham, Juan C. Caicedo, Bryan Plummer
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment
Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
Energy Landscape-Aware Vision Transformers: Layerwise Dynamics and Adaptive Task-Specific Training via Hopfield States
Runze Xia, Richard Jiang
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.
Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Y Zou et al.
Multi-Kernel Correlation-Attention Vision Transformer for Enhanced Contextual Understanding and Multi-Scale Integration
Hongkang Zhang, Shao-Lun Huang, Ercan Kuruoglu et al.
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li et al.
Scalable Neural Network Geometric Robustness Validation via Hölder Optimisation
Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio
Spiking Vision Transformer with Saccadic Attention
Shuai Wang, Malu Zhang, Dehao Zhang et al.
Vision Transformers Don't Need Trained Registers
Nicholas Jiang, Amil Dravid, Alexei Efros et al.
Vision Transformers with Self-Distilled Registers
Zipeng Yan, Yinjie Chen, Chong Zhou et al.
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei, Rama Chellappa
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Mingjia Li, Minjing Dong et al.
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang, Byungkun Lee, Hojoon Lee et al.
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Kaishen Yuan, Zitong Yu, Xin Liu et al.
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis
Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee
Characterizing Model Robustness via Natural Input Gradients
Adrian Rodriguez-Munoz, Tongzhou Wang, Antonio Torralba
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.
Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
Itamar Zimerman, Moran Baruch, Nir Drucker et al.
Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Mikkel Jordahn, Pablo Olmos
Denoising Vision Transformers
Jiawei Yang, Katie Luo, Jiefeng Li et al.
ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
Yunshan Zhong, Jiawei Hu, You Huang et al.
Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
Aaron Havens, Alexandre Araujo, Huan Zhang et al.
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.
Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
Pengyu Li, Biao Wang, Tianchu Guo et al.
Improving Interpretation Faithfulness for Vision Transformers
Lijie Hu, Yixin Liu, Ninghao Liu et al.
KernelWarehouse: Rethinking the Design of Dynamic Convolution
Chao Li, Anbang Yao
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang, Dingkang Liang, Zichang Tan et al.
Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
Zhiyu Yao, Jian Wang, Haixu Wu et al.
One Meta-tuned Transformer is What You Need for Few-shot Learning
Xu Yang, Huaxiu Yao, Ying Wei
PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
Ananthu Aniraj, Cassio F. Dantas, Dino Ienco et al.
Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
Hoyong Kwon, Jaeseok Jeong, Sung-Hoon Yoon et al.
Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models
Hengyi Wang, Shiwei Tan, Hao Wang
Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness
Honghao Chen, Yurong Zhang, Xiaokun Feng et al.
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer, Yury Belousov, Slava Voloshynovskiy
Sample-specific Masks for Visual Reprogramming-based Prompting
Chengyi Cai, Zesheng Ye, Lei Feng et al.
Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
Zixuan Hu, Yongxian Wei, Li Shen et al.
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu, Yunjie Tian, Qixiang Ye et al.
SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
Xixu Hu, Runkai Zheng, Jindong Wang et al.
Stitched ViTs are Flexible Vision Backbones
Zizheng Pan, Jing Liu, Haoyu He et al.
Sub-token ViT Embedding via Stochastic Resonance Transformers
Dong Lao, Yangchao Wu, Tian Yu Liu et al.
TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation
Yuhao Wang, Xuehu Liu, Pingping Zhang et al.
Vision Transformers as Probabilistic Expansion from Learngene
Qiufeng Wang, Xu Yang, Haokun Chen et al.
ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining
Dezhi Peng, Chongyu Liu, Yuliang Liu et al.