2024 "vision transformers" Papers

35 papers found

Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

Dongyoon Hwang, Byungkun Lee, Hojoon Lee et al.

ICML 2024 (poster)

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.

ICML 2024 (poster)

AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

Kaishen Yuan, Zitong Yu, Xin Liu et al.

ECCV 2024 (poster) · arXiv:2403.04697 · 33 citations

A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis

Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee

AAAI 2024 (paper) · arXiv:2308.07301 · 9 citations

Characterizing Model Robustness via Natural Input Gradients

Adrian Rodriguez-Munoz, Tongzhou Wang, Antonio Torralba

ECCV 2024 (poster) · arXiv:2409.20139 · 2 citations

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.

AAAI 2024 (paper) · arXiv:2304.10520 · 21 citations

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

Itamar Zimerman, Moran Baruch, Nir Drucker et al.

ICML 2024 (poster)

Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

Mikkel Jordahn, Pablo Olmos

ICML 2024 (poster)

Denoising Vision Transformers

Jiawei Yang, Katie Luo, Jiefeng Li et al.

ECCV 2024 (poster) · arXiv:2401.02957 · 30 citations

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Yunshan Zhong, Jiawei Hu, You Huang et al.

ICML 2024 (spotlight)

Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

Aaron Havens, Alexandre Araujo, Huan Zhang et al.

ICML 2024 (poster)

GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features

Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.

ECCV 2024 (poster) · arXiv:2407.12427 · 27 citations

Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning

Pengyu Li, Biao Wang, Tianchu Guo et al.

ECCV 2024 (poster)

Improving Interpretation Faithfulness for Vision Transformers

Lijie Hu, Yixin Liu, Ninghao Liu et al.

ICML 2024 (spotlight)

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024 (poster)

LION: Implicit Vision Prompt Tuning

Haixin Wang, Jianlong Chang, Yihang Zhai et al.

AAAI 2024 (paper) · arXiv:2303.09992 · 35 citations

LookupViT: Compressing visual information to a limited number of tokens

Rajat Koner, Gagan Jain, Sujoy Paul et al.

ECCV 2024 (poster) · arXiv:2407.12753 · 15 citations

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

Dingyuan Zhang, Dingkang Liang, Zichang Tan et al.

ECCV 2024 (poster) · arXiv:2409.00633 · 4 citations

Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers

Zhiyu Yao, Jian Wang, Haixu Wu et al.

ICML 2024 (poster)

One Meta-tuned Transformer is What You Need for Few-shot Learning

Xu Yang, Huaxiu Yao, Ying Wei

ICML 2024 (spotlight)

PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Ananthu Aniraj, Cassio F. Dantas, Dino Ienco et al.

ECCV 2024 (poster) · arXiv:2407.04538 · 6 citations

Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation

Hoyong Kwon, Jaeseok Jeong, Sung-Hoon Yoon et al.

ECCV 2024 (poster)

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Hengyi Wang, Shiwei Tan, Hao Wang

ICML 2024 (poster)

Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

Honghao Chen, Yurong Zhang, Xiaokun Feng et al.

ICML 2024 (poster)

Robustness Tokens: Towards Adversarial Robustness of Transformers

Brian Pulfer, Yury Belousov, Slava Voloshynovskiy

ECCV 2024 (poster) · arXiv:2503.10191

Sample-specific Masks for Visual Reprogramming-based Prompting

Chengyi Cai, Zesheng Ye, Lei Feng et al.

ICML 2024 (spotlight)

Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

Zixuan Hu, Yongxian Wei, Li Shen et al.

ICML 2024 (poster)

Spatial Transform Decoupling for Oriented Object Detection

Hongtian Yu, Yunjie Tian, Qixiang Ye et al.

AAAI 2024 (paper) · arXiv:2308.10561

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

Xixu Hu, Runkai Zheng, Jindong Wang et al.

ECCV 2024 (poster) · arXiv:2402.03317 · 5 citations

Stitched ViTs are Flexible Vision Backbones

Zizheng Pan, Jing Liu, Haoyu He et al.

ECCV 2024 (poster) · arXiv:2307.00154 · 4 citations

Sub-token ViT Embedding via Stochastic Resonance Transformers

Dong Lao, Yangchao Wu, Tian Yu Liu et al.

ICML 2024 (poster)

TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation

Yuhao Wang, Xuehu Liu, Pingping Zhang et al.

AAAI 2024 (paper) · arXiv:2312.09612 · 45 citations

Vision Transformers as Probabilistic Expansion from Learngene

Qiufeng Wang, Xu Yang, Haokun Chen et al.

ICML 2024 (poster)

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Dezhi Peng, Chongyu Liu, Yuliang Liu et al.

AAAI 2024 (paper) · arXiv:2306.12106 · 18 citations

xT: Nested Tokenization for Larger Context in Large Images

Ritwik Gupta, Shufan Li, Tyler Zhu et al.

ICML 2024 (poster)