Poster "delayed generalization" Papers
3 papers found
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro Mediano et al.
ICLR 2025posterarXiv:2501.04697
17
citations
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Zhiwei Xu, Zhiyu Ni, Yixin Wang et al.
ICLR 2025posterarXiv:2504.13292
6
citations
Deep Networks Always Grok and Here is Why
Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk
ICML 2024poster