Papers by Yujin Song
3 papers found
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
Ryotaro Kawata, Yujin Song, Alberto Bietti et al.
NeurIPS 2025 (spotlight), arXiv:2512.18634
1 citation
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
Wei Huang, Andi Han, Yujin Song et al.
NeurIPS 2025 (poster)
Nonlinear transformers can perform inference-time feature learning
Naoki Nishikawa, Yujin Song, Kazusato Oko et al.
ICML 2025 (poster)