ICLR 2025 "training dynamics analysis" Papers
2 papers found
A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
Yu Huang, Zixin Wen, Yuejie Chi et al.
ICLR 2025 poster, arXiv:2403.02233
3 citations
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
ICLR 2025 poster, arXiv:2410.04870
9 citations