2025 Poster Papers on "long-context modeling"
9 papers found
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
ICLR 2025 (poster) · arXiv:2410.05258
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Xiang Hu, Jiaqi Leng, Jun Zhao et al.
NeurIPS 2025 (poster) · arXiv:2504.16795 · 2 citations
Mamba Goes HoME: Hierarchical Soft Mixture-of-Experts for 3D Medical Image Segmentation
Szymon Płotka, Gizem Mert, Maciej Chrabaszcz et al.
NeurIPS 2025 (poster) · arXiv:2507.06363 · 1 citation
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
CVPR 2025 (poster) · arXiv:2410.00871 · 9 citations
miniCTX: Neural Theorem Proving with (Long-)Contexts
Jiewen Hu, Thomas Zhu, Sean Welleck
ICLR 2025 (poster) · arXiv:2408.03350 · 23 citations
One-Minute Video Generation with Test-Time Training
Jiarui Xu, Shihao Han, Karan Dalal et al.
CVPR 2025 (poster) · arXiv:2504.05298 · 66 citations
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
NeurIPS 2025 (poster) · arXiv:2501.18795 · 20 citations
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu, Qian Liu, Haonan Wang et al.
NeurIPS 2025 (poster) · arXiv:2503.15450 · 2 citations
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
NeurIPS 2025 (poster) · arXiv:2503.14376 · 8 citations