"linear attention" Papers

25 papers found

Alias-Free ViT: Fractional Shift Invariance via Linear Attention
Hagay Michaeli, Daniel Soudry
NeurIPS 2025 poster · arXiv:2510.22673

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
NeurIPS 2025 poster · arXiv:2412.16112 · 20 citations

Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
Naoki Nishikawa, Rei Higuchi, Taiji Suzuki
NeurIPS 2025 poster · arXiv:2507.03340 · 1 citation

Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.
NeurIPS 2025 oral · arXiv:2506.05340 · 4 citations

Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li, Davoud Ataee Tarzanagh, Ankit Singh Rawat et al.
COLM 2025 paper · 5 citations

Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection
Hanshi Wang, Jin Gao, Weiming Hu et al.
ICCV 2025 highlight · arXiv:2507.04369

Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
NeurIPS 2025 poster · arXiv:2508.15884 · 15 citations

Learning Linear Attention in Polynomial Time
Morris Yau, Ekin Akyürek, Jiayuan Mao et al.
NeurIPS 2025 oral · arXiv:2410.10101 · 4 citations

Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Zeyuan Allen-Zhu
NeurIPS 2025 poster · arXiv:2512.17351 · 8 citations

PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li et al.
ICLR 2025 poster · arXiv:2501.15061 · 36 citations

Rectifying Magnitude Neglect in Linear Attention
Qihang Fan, Huaibo Huang, Yuang Ai et al.
ICCV 2025 highlight · arXiv:2507.00698 · 5 citations

Stuffed Mamba: Oversized States Lead to the Inability to Forget
Yingfa Chen, Xinrong Zhang, Shengding Hu et al.
COLM 2025 paper · 3 citations

ThunderKittens: Simple, Fast, and Adorable Kernels
Benjamin Spector, Simran Arora, Aaryan Singhal et al.
ICLR 2025 poster · 3 citations

ZeroS: Zero-Sum Linear Attention for Efficient Transformers
Jiecheng Lu, Xu Han, Yan Sun et al.
NeurIPS 2025 spotlight · arXiv:2602.05230

Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han, Tianzhu Ye, Yizeng Han et al.
ECCV 2024 poster · arXiv:2312.08874 · 208 citations

AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei, Peijie Dong et al.
ECCV 2024 poster · 14 citations

DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Zhicheng Liu, Xutao Wang et al.
ICML 2024 poster · arXiv:2403.19928

Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen et al.
ICML 2024 poster · arXiv:2312.06635

Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
Zhiyu Yao, Jian Wang, Haixu Wu et al.
ICML 2024 poster

ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Chenhang He, Ruihuang Li, Guowen Zhang et al.
ECCV 2024 poster · arXiv:2401.00912 · 13 citations

Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Zicheng Liu, Siyuan Li, Li Wang et al.
ICML 2024 poster · arXiv:2406.08128

Simple linear attention language models balance the recall-throughput tradeoff
Simran Arora, Sabri Eyuboglu, Michael Zhang et al.
ICML 2024 spotlight · arXiv:2402.18668

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo, Xinghao Chen, Yehui Tang et al.
ICML 2024 poster · arXiv:2405.11582

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Zhen Qin, Weigao Sun, Dong Li et al.
ICML 2024 poster · arXiv:2405.17381

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang et al.
ICML 2024 poster · arXiv:2406.07368