"token efficiency" Papers
2 papers found
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
NeurIPS 2025, spotlight, arXiv:2505.23971
5 citations
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Weize Chen, Jiarui Yuan, Jin Tailin et al.
NeurIPS 2025, poster, arXiv:2505.19217
4 citations