NeurIPS 2025 "length generalization" Papers
5 papers found
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.
NeurIPS 2025, poster, arXiv:2505.21785
3 citations
Extrapolation by Association: Length Generalization Transfer In Transformers
Ziyang Cai, Nayoung Lee, Avi Schwarzschild et al.
NeurIPS 2025, spotlight, arXiv:2506.09251
7 citations
HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
Haoran Li, Yingjie Qin, Baoyuan Ou et al.
NeurIPS 2025, oral, arXiv:2505.20444
2 citations
Mamba Modulation: On the Length Generalization of Mamba Models
Peng Lu, Jerry Huang, Qiuhao Zeng et al.
NeurIPS 2025, poster
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
Benjamin Walker, Lingyi Yang, Nicola Muca Cirone et al.
NeurIPS 2025, spotlight, arXiv:2505.17761
6 citations