"vision transformer" Papers
27 papers found
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
On the Role of Hidden States of Modern Hopfield Network in Transformer
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
Agglomerative Token Clustering
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor et al.
Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention
Saebom Leem, Hyunseok Seo
ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu
Data-free Neural Representation Compression with Riemannian Neural Dynamics
Zhengqi Pei, Anran Zhang, Shuhui Wang et al.
Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module
Yixing Xu, Chao Li, Dong Li et al.
FairViT: Fair Vision Transformer via Adaptive Masking
Bowei Tian, Ruijie Du, Yanning Shen
FiT: Flexible Vision Transformer for Diffusion Model
Zeyu Lu, ZiDong Wang, Di Huang et al.
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia, Jianyuan Guo, Kai Han et al.
Information Flow in Self-Supervised Learning
Zhiquan Tan, Jingqin Yang, Weiran Huang et al.
Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking
Yongxin Li, Mengyuan Liu, You Wu et al.
MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer
Junde Wu, Wei Ji, Huazhu Fu et al.
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balazevic, Yuge Shi, Pinelopi Papalampidi et al.
One-stage Prompt-based Continual Learning
Youngeun Kim, Yuhang Li, Priyadarshini Panda
Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren, Zeyu Wang, Hongru Zhu et al.
S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention
Chiyu Zhang, Xiaogang Xu, Lei Wang et al.
Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning
Kaiyou Song, Shan Zhang, Tong Wang
SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning
Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara
Statistical Test for Attention Maps in Vision Transformers
Tomohiro Shiraishi, Daiki Miwa, Teruyuki Katsuoka et al.
Stochastic positional embeddings improve masked image modeling
Amir Bar, Florian Bordes, Assaf Shocher et al.
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu, Maziar Sanjabi, Yi Ma et al.
Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting
Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.
ViT-Calibrator: Decision Stream Calibration for Vision Transformer
Lin Chen, Zhijie Jia, Lechao Cheng et al.
When Will Gradient Regularization Be Harmful?
Yang Zhao, Hao Zhang, Xiuyuan Hu