2025 "transformer architecture" Papers

70 papers found • Page 2 of 2

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition

Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

ICCV 2025 • poster • arXiv:2503.15986
1 citation

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 • poster • arXiv:2407.15811
26 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 • poster • arXiv:2410.06916
39 citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025 • poster • arXiv:2502.06415
15 citations

Task Descriptors Help Transformers Learn Linear Models In-Context

Ruomin Huang, Rong Ge

ICLR 2025 • poster
3 citations

Towards Neural Scaling Laws for Time Series Foundation Models

Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.

ICLR 2025 • poster • arXiv:2410.12360

Transformer brain encoders explain human high-level visual responses

Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte

NeurIPS 2025 • spotlight • arXiv:2505.17329
4 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NeurIPS 2025 • poster • arXiv:2510.19797
1 citation

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.

ICLR 2025 • oral • arXiv:2405.13861
14 citations

Transformers Handle Endogeneity in In-Context Linear Regression

Haodong Liang, Krishna Balasubramanian, Lifeng Lai

ICLR 2025 • poster • arXiv:2410.01265
4 citations

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 • poster • arXiv:2502.21212
18 citations

Transformers Struggle to Learn to Search

Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.

ICLR 2025 • poster • arXiv:2412.04703
15 citations

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He et al.

CVPR 2025 • poster • arXiv:2503.10622
96 citations

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup

Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.

NeurIPS 2025 • spotlight

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.

NeurIPS 2025 • poster • arXiv:2506.09278
13 citations

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers

Renpu Liu, Jing Yang

NeurIPS 2025 • poster • arXiv:2601.10058
1 citation

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Qian Ma, Ruoxiang Xu, Yongqiang Cai

NeurIPS 2025 • poster • arXiv:2511.06376

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Weronika Ormaniec, Felix Dangel, Sidak Pal Singh

ICLR 2025 • poster • arXiv:2410.10986
10 citations

Why In-Context Learning Models are Good Few-Shot Learners?

Shiguang Wu, Yaqing Wang, Quanming Yao

ICLR 2025 • poster

ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention

Qiuhao Zeng, Jierui Huang, Peng Lu et al.

ICLR 2025 • poster • arXiv:2501.14577
5 citations