NEURIPS 2025 "model efficiency" Papers
3 papers found
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Gyudong Kim, Hyukju Na, Jin Kim et al.
NEURIPS 2025 poster
Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
Yongqiang Yao, Jingru Tan, Kaihuan Liang et al.
NEURIPS 2025 poster
2 citations
Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention
Chong You, Kan Wu, Zhipeng Jia et al.
NEURIPS 2025 poster
2 citations