2025 Poster "expert specialization" Papers
2 papers found
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura, Takuya Akiba, Kazuki Fujii et al.
ICLR 2025 poster | arXiv:2502.19261 | 8 citations
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Yehya Farhat, Hamza ElMokhtar Shili, Fangshuo Liao et al.
NeurIPS 2025 poster | arXiv:2306.08586 | 3 citations