"model size scaling" Papers
2 papers found
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery et al.
NeurIPS 2025 spotlight, arXiv:2503.09799
12 citations
Do Efficient Transformers Really Save Computation?
Kai Yang, Jan Ackermann, Zhenyu He et al.
ICML 2024 poster, arXiv:2402.13934