2024 Poster "gradient descent analysis" Papers
3 papers found
How do Transformers Perform In-Context Autoregressive Learning?
Michael Sander, Raja Giryes, Taiji Suzuki et al.
ICML 2024 poster
How Transformers Learn Causal Structure with Gradient Descent
Eshaan Nichani, Alex Damian, Jason Lee
ICML 2024 poster
In-context Convergence of Transformers
Yu Huang, Yuan Cheng, Yingbin Liang
ICML 2024 poster