2025 Poster "sparse activation" Papers (2 papers found)
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Ziteng Wang, Jun Zhu, Jianfei Chen
ICLR 2025 (poster) · arXiv:2412.14711 · 28 citations
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi, Shiyu Wang, Yuqi Nie et al.
ICLR 2025 (poster) · arXiv:2409.16040 · 178 citations