"training dynamics analysis" Papers
8 papers found
A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
Yu Huang, Zixin Wen, Yuejie Chi et al.
ICLR 2025, poster, arXiv:2403.02233
3 citations
Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
Dhruva Karkada, James Simon, Yasaman Bahri et al.
NeurIPS 2025, poster, arXiv:2502.09863
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi, Fan Nie, Alexandre Alahi et al.
NeurIPS 2025, oral, arXiv:2506.16029
3 citations
From Condensation to Rank Collapse: A Two-Stage Analysis of Transformer Training Dynamics
Zheng-An Chen, Tao Luo
NeurIPS 2025, oral, arXiv:2510.06954
1 citation
Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale
James Michaelov, Roger Levy, Benjamin Bergen
NeurIPS 2025, oral, arXiv:2510.24963
Less is More: Local Intrinsic Dimensions of Contextual Language Models
Benjamin Matthias Ruppik, Julius von Rohrscheidt, Carel van Niekerk et al.
NeurIPS 2025, poster, arXiv:2506.01034
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
ICLR 2025, poster, arXiv:2410.04870
9 citations
Quantitative convergence of trained neural networks to Gaussian processes
Andrea Agazzi, Eloy Mosig García, Dario Trevisan
NeurIPS 2025, poster