Poster Papers Matching "vision transformer"
21 papers found
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
Enhancing Vision-Language Model with Unmasked Token Alignment
Hongsheng Li, Jihao Liu, Boxiao Liu et al.
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang, Feng Lin, Tong Wu et al.
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
Qihang Fan, Huaibo Huang, Mingrui Chen et al.
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
Agglomerative Token Clustering
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor et al.
ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu
Data-free Neural Representation Compression with Riemannian Neural Dynamics
Zhengqi Pei, Anran Zhang, Shuhui Wang et al.
Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module
Yixing Xu, Chao Li, Dong Li et al.
FairViT: Fair Vision Transformer via Adaptive Masking
Bowei Tian, Ruijie Du, Yanning Shen
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia, Jianyuan Guo, Kai Han et al.
Information Flow in Self-Supervised Learning
Zhiquan Tan, Jingqin Yang, Weiran Huang et al.
Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking
Yongxin Li, Mengyuan Liu, You Wu et al.
One-stage Prompt-based Continual Learning
Youngeun Kim, Yuhang Li, Priyadarshini Panda
Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren, Zeyu Wang, Hongru Zhu et al.
SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning
Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara
Statistical Test for Attention Maps in Vision Transformers
Tomohiro Shiraishi, Daiki Miwa, Teruyuki Katsuoka et al.
Stochastic positional embeddings improve masked image modeling
Amir Bar, Florian Bordes, Assaf Shocher et al.
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu, Maziar Sanjabi, Yi Ma et al.
When Will Gradient Regularization Be Harmful?
Yang Zhao, Hao Zhang, Xiuyuan Hu