William Merrill
4 Papers
10 Total Citations
Papers (4)
Exact Expressive Power of Transformers with Padding
NeurIPS 2025
5 citations
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
NeurIPS 2025, arXiv
5 citations
How Language Model Hallucinations Can Snowball
ICML 2024
0 citations
The Illusion of State in State-Space Models
ICML 2024
0 citations