2025 Poster "transformer language models" Papers
4 papers found
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden et al.
ICLR 2025 poster, arXiv:2410.02984
23 citations
How to Scale Second-Order Optimization
Charlie Chen, Shikai Qiu, Hoang Phan et al.
NeurIPS 2025 poster
Matrix Product Sketching via Coordinated Sampling
Majid Daliri, Juliana Freire, Danrong Li et al.
ICLR 2025 poster, arXiv:2501.17836
2 citations
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton et al.
ICLR 2025 poster, arXiv:2409.04185
11 citations