"transformer efficiency" Papers
7 papers found
Attribution-Driven Adaptive Token Pruning for Transformers
Yaoyao Yan, Hui Yu, Weizhi Xu
NeurIPS 2025 · poster
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.
ICLR 2025 · poster · arXiv:2405.14297
33 citations
FlashBias: Fast Computation of Attention with Bias
Haixu Wu, Minghao Guo, Yuezhou Ma et al.
NeurIPS 2025 · poster · arXiv:2505.12044
1 citation
Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation
Jiesong Liu, Xipeng Shen
NeurIPS 2025 · poster
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.
ICLR 2025 · poster · arXiv:2410.05462
1 citation
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Liuzhicheng Liuzhicheng, Xutao Wang et al.
ICML 2024 · poster
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo, Xinghao Chen, Yehui Tang et al.
ICML 2024 · poster