2025 "transformer efficiency" Papers
7 papers found
Attribution-Driven Adaptive Token Pruning for Transformers
Yaoyao Yan, Hui Yu, Weizhi Xu
NeurIPS 2025 poster
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
Naoki Nishikawa, Rei Higuchi, Taiji Suzuki
NeurIPS 2025 poster · arXiv:2507.03340
1 citation
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.
ICLR 2025 poster · arXiv:2405.14297
33 citations
FlashBias: Fast Computation of Attention with Bias
Haixu Wu, Minghao Guo, Yuezhou Ma et al.
NeurIPS 2025 poster · arXiv:2505.12044
1 citation
Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation
Jiesong Liu, Xipeng Shen
NeurIPS 2025 poster
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.
ICLR 2025 poster · arXiv:2410.05462
1 citation
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
Jiecheng Lu, Xu Han, Yan Sun et al.
NeurIPS 2025 spotlight