NEURIPS 2025 "linear attention" Papers
8 papers found
Alias-Free ViT: Fractional Shift Invariance via Linear Attention
Hagay Michaeli, Daniel Soudry
NeurIPS 2025 · poster · arXiv:2510.22673
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
NeurIPS 2025 · poster · arXiv:2412.16112 · 20 citations
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
Naoki Nishikawa, Rei Higuchi, Taiji Suzuki
NeurIPS 2025 · poster · arXiv:2507.03340 · 1 citation
Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.
NeurIPS 2025 · oral · arXiv:2506.05340 · 4 citations
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
NeurIPS 2025 · poster · arXiv:2508.15884 · 15 citations
Learning Linear Attention in Polynomial Time
Morris Yau, Ekin Akyürek, Jiayuan Mao et al.
NeurIPS 2025 · oral · arXiv:2410.10101 · 4 citations
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Zeyuan Allen-Zhu
NeurIPS 2025 · poster · arXiv:2512.17351 · 8 citations
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
Jiecheng Lu, Xu Han, Yan Sun et al.
NeurIPS 2025 · spotlight