Papers by Will Merrill
3 papers found
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
Will Merrill, Ashish Sabharwal
NeurIPS 2025, poster, arXiv:2503.03961
30 citations
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
NeurIPS 2025, spotlight, arXiv:2505.23971
5 citations
Exact Expressive Power of Transformers with Padding
Will Merrill, Ashish Sabharwal
NeurIPS 2025, poster
5 citations