Poster "length generalization" Papers
5 papers found
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.
NeurIPS 2025 (poster) · arXiv:2505.21785
3 citations
Mamba Modulation: On the Length Generalization of Mamba Models
Peng Lu, Jerry Huang, Qiuhao Zeng et al.
NeurIPS 2025 (poster)
Case-Based or Rule-Based: How Do Transformers Do the Math?
Yi Hu, Xiaojuan Tang, Haotong Yang et al.
ICML 2024 (poster)
Gated Linear Attention Transformers with Hardware-Efficient Training
Songlin Yang, Bailin Wang, Yikang Shen et al.
ICML 2024 (poster)
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
Zixuan Wang, Stanley Wei, Daniel Hsu et al.
ICML 2024 (poster)