ICLR "mixture of experts" Papers
5 papers found
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
Taishi Nakamura, Takuya Akiba, Kazuki Fujii et al.
ICLR 2025 (poster), arXiv:2502.19261
8 citations
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
ICLR 2025 (poster), arXiv:2408.15881
34 citations
NetMoE: Accelerating MoE Training through Dynamic Sample Placement
Xinyi Liu, Yujie Wang, Fangcheng Fu et al.
ICLR 2025 (poster)
11 citations
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le, Chau Nguyen, Huy Nguyen et al.
ICLR 2025 (poster), arXiv:2410.02200
12 citations
Swift Hydra: Self-Reinforcing Generative Framework for Anomaly Detection with Multiple Mamba Models
Hoang Khoi Nguyen Do, Truc Nguyen, Malik Hassanaly et al.
ICLR 2025 (poster), arXiv:2503.06413
2 citations