"expert specialization" Papers
3 papers found
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura, Takuya Akiba, Kazuki Fujii et al.
ICLR 2025 (poster), arXiv:2502.19261
8 citations
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Yehya Farhat, Hamza ElMokhtar Shili, Fangshuo Liao et al.
NeurIPS 2025 (poster), arXiv:2306.08586
3 citations
Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?
Huy Nguyen, Pedram Akbarian, Nhat Ho
ICML 2024 (poster)