Zhanpeng Zhou
3 Papers
12 Total Citations
Papers (3)
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
ICLR 2025
9 citations
Batch Normalization Is Blind to the First and Second Derivatives of the Loss
AAAI 2024, arXiv
3 citations
On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm
ICML 2024
0 citations