2025 "data parallelism" Papers
2 papers found
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
NeurIPS 2025 (Spotlight) · arXiv:2505.23971 · 5 citations
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.
ICLR 2025 (Poster) · arXiv:2410.21676 · 37 citations