"feed-forward networks" Papers
6 papers found
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.
NeurIPS 2025, spotlight, arXiv:2503.18908
2 citations
ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation
Pengcheng Huang, Zhenghao Liu, Yukun Yan et al.
NeurIPS 2025, poster, arXiv:2502.15543
4 citations
Accelerating Transformer Pre-training with 2:4 Sparsity
Yuezhou Hu, Kang Zhao, Weiyu Huang et al.
ICML 2024, poster
On the Diminishing Returns of Width for Continual Learning
Etash Guha, Vihan Lakshman
ICML 2024, poster
ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate
Chenghao Liu, Minghua Chen
ICML 2024, poster
Vision Transformers as Probabilistic Expansion from Learngene
Qiufeng Wang, Xu Yang, Haokun Chen et al.
ICML 2024, poster