2025 "scaling laws" Papers

15 papers found

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

Xinran Gu, Kaifeng Lyu, Jiazheng Li et al.

NeurIPS 2025, spotlight, arXiv:2505.18091
2 citations

Diffusion Beats Autoregressive in Data-Constrained Settings

Mihir Prabhudesai, Mengning Wu, Amir Zadeh et al.

NeurIPS 2025, poster, arXiv:2507.15857
24 citations

Emergence and scaling laws in SGD learning of shallow neural networks

Yunwei Ren, Eshaan Nichani, Denny Wu et al.

NeurIPS 2025, poster, arXiv:2504.19983
13 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NeurIPS 2025, poster, arXiv:2502.06857
10 citations

How DNNs break the Curse of Dimensionality: Compositionality and Symmetry Learning

Arthur Jacot, Seok Hoan Choi, Yuxiao Wen

ICLR 2025, poster, arXiv:2407.05664
6 citations

How Does Critical Batch Size Scale in Pre-training?

Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.

ICLR 2025, poster, arXiv:2410.21676
37 citations

Learning in Compact Spaces with Approximately Normalized Transformer

Jörg Franke, Urs Spiegelhalter, Marianna Nezhurina et al.

NeurIPS 2025, poster, arXiv:2505.22014

Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

Yuda Song, Hanlin Zhang, Carson Eisenach et al.

ICLR 2025, poster, arXiv:2412.02674

One Filters All: A Generalist Filter For State Estimation

Shiqi Liu, Wenhan Cao, Chang Liu et al.

NeurIPS 2025, poster, arXiv:2509.20051
2 citations

Power Lines: Scaling laws for weight decay and batch size in LLM pre-training

Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.

NeurIPS 2025, poster, arXiv:2505.13738
15 citations

Quantifying Elicitation of Latent Capabilities in Language Models

Elizabeth Donoway, Hailey Joren, Arushi Somani et al.

NeurIPS 2025, poster

RegMix: Data Mixture as Regression for Language Model Pre-training

Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.

ICLR 2025, poster, arXiv:2407.01492
99 citations

Scaling and evaluating sparse autoencoders

Leo Gao, Tom Dupre la Tour, Henk Tillman et al.

ICLR 2025, poster, arXiv:2406.04093
298 citations

Scaling Laws For Scalable Oversight

Joshua Engels, David Baek, Subhash Kantamneni et al.

NeurIPS 2025, spotlight, arXiv:2504.18530
4 citations

Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws

Zhixuan Pan, Shaowen Wang, Liao Pengfei et al.

NeurIPS 2025, spotlight, arXiv:2504.09597
5 citations