2024 "vision transformer" Papers

24 papers found

Agglomerative Token Clustering

Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor et al.

ECCV 2024 · poster · arXiv:2409.11923
7 citations

Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention

Saebom Leem, Hyunseok Seo

AAAI 2024 · paper · arXiv:2402.04563
31 citations

ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy

Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu

ICML 2024 · poster

Data-free Neural Representation Compression with Riemannian Neural Dynamics

Zhengqi Pei, Anran Zhang, Shuhui Wang et al.

ICML 2024 · poster

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Yixing Xu, Chao Li, Dong Li et al.

ICML 2024 · poster

FairViT: Fair Vision Transformer via Adaptive Masking

Bowei Tian, Ruijie Du, Yanning Shen

ECCV 2024 · poster · arXiv:2407.14799
1 citation

FiT: Flexible Vision Transformer for Diffusion Model

Zeyu Lu, ZiDong Wang, Di Huang et al.

ICML 2024 · spotlight

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

Ding Jia, Jianyuan Guo, Kai Han et al.

ICML 2024 · poster

Information Flow in Self-Supervised Learning

Zhiquan Tan, Jingqin Yang, Weiran Huang et al.

ICML 2024 · poster

Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

Yongxin Li, Mengyuan Liu, You Wu et al.

ICML 2024 · poster

MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer

Junde Wu, Wei Ji, Huazhu Fu et al.

AAAI 2024 · paper · arXiv:2301.11798
259 citations

Memory Consolidation Enables Long-Context Video Understanding

Ivana Balazevic, Yuge Shi, Pinelopi Papalampidi et al.

ICML 2024 · oral

One-stage Prompt-based Continual Learning

Youngeun Kim, Yuhang Li, Priyadarshini Panda

ECCV 2024 · poster · arXiv:2402.16189
17 citations

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICML 2024 · poster

Rejuvenating image-GPT as Strong Visual Representation Learners

Sucheng Ren, Zeyu Wang, Hongru Zhu et al.

ICML 2024 · poster

S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention

Chiyu Zhang, Xiaogang Xu, Lei Wang et al.

AAAI 2024 · paper · arXiv:2210.12381
46 citations

Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning

Kaiyou Song, Shan Zhang, Tong Wang

AAAI 2024 · paper · arXiv:2312.10457
2 citations

SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning

Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara

ICML 2024 · poster

Statistical Test for Attention Maps in Vision Transformers

Tomohiro Shiraishi, Daiki Miwa, Teruyuki Katsuoka et al.

ICML 2024 · poster

Stochastic positional embeddings improve masked image modeling

Amir Bar, Florian Bordes, Assaf Shocher et al.

ICML 2024 · poster

ViP: A Differentially Private Foundation Model for Computer Vision

Yaodong Yu, Maziar Sanjabi, Yi Ma et al.

ICML 2024 · poster

Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting

Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.

AAAI 2024 · paper · arXiv:2305.04440
28 citations

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

Lin Chen, Zhijie Jia, Lechao Cheng et al.

AAAI 2024 · paper · arXiv:2304.04354
3 citations

When Will Gradient Regularization Be Harmful?

Yang Zhao, Hao Zhang, Xiuyuan Hu

ICML 2024 · poster