Poster "language model pre-training" Papers
4 papers found
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob, Lorenzo Sani, Meghdad Kurmanji et al.
ICLR 2025, poster, arXiv:2410.05021
2 citations
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.
ICLR 2025, poster, arXiv:2503.09543
14 citations
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.
ICLR 2025, poster, arXiv:2407.01492
99 citations
When Attention Sink Emerges in Language Models: An Empirical View
Xiangming Gu, Tianyu Pang, Chao Du et al.
ICLR 2025, poster, arXiv:2410.10781
90 citations