2025 "sparse mixture of experts" Papers
2 papers found
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.
ICLR 2025 (poster), arXiv:2405.14297
33 citations
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang, Chaozheng Wang, Jing Li
NeurIPS 2025 (spotlight), arXiv:2505.07260