2025 "transformer architecture" Papers
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.

Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.

Task Descriptors Help Transformers Learn Linear Models In-Context
Ruomin Huang, Rong Ge

Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.

Transformer brain encoders explain human high-level visual responses
Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte

Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.

Transformers Handle Endogeneity in In-Context Linear Regression
Haodong Liang, Krishna Balasubramanian, Lifeng Lai

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee

Transformers Struggle to Learn to Search
Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.

Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He et al.

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.

UFM: A Simple Path towards Unified Dense Correspondence with Flow
Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
Renpu Liu, Jing Yang

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
Qian Ma, Ruoxiang Xu, Yongqiang Cai

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh

Why In-Context Learning Models are Good Few-Shot Learners?
Shiguang Wu, Yaqing Wang, Quanming Yao

ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention
Qiuhao Zeng, Jierui Huang, Peng Lu et al.