2025 Spotlight "transformer architecture" Papers
9 papers found
4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos
Zhen Xu, Zhengqin Li, Zhao Dong et al.
NeurIPS 2025 (Spotlight) · arXiv:2506.08015 · 14 citations
AbsenceBench: Language Models Can't See What's Missing
Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore et al.
NeurIPS 2025 (Spotlight)
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
Tianyi Chen, Pengxiao Lin, Zhiwei Wang et al.
NeurIPS 2025 (Spotlight) · arXiv:2509.17514
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Daolang Huang, Xinyi Wen, Ayush Bharti et al.
NeurIPS 2025 (Spotlight) · arXiv:2506.07259 · 2 citations
Depth-Width Tradeoffs for Transformers on Graph Tasks
Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher et al.
NeurIPS 2025 (Spotlight)
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.
NeurIPS 2025 (Spotlight) · arXiv:2503.18908 · 2 citations
Quantum Doubly Stochastic Transformers
Jannis Born, Filip Skogh, Kahn Rhrissorrakrai et al.
NeurIPS 2025 (Spotlight) · arXiv:2504.16275 · 2 citations
Transformer brain encoders explain human high-level visual responses
Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte
NeurIPS 2025 (Spotlight) · arXiv:2505.17329 · 4 citations
TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.
NeurIPS 2025 (Spotlight)