"transformer architecture" Papers
120 papers found • Page 3 of 3
The Illusion of State in State-Space Models
William Merrill, Jackson Petty, Ashish Sabharwal
The Pitfalls of Next-Token Prediction
Gregor Bachmann, Vaishnavh Nagarajan
Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention
Jiaqi Zhang, Joel Jennings, Agrin Hilmkil et al.
Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration
Zhengyang Zhuge, Peisong Wang, Xingting Yao et al.
Towards General Algorithm Discovery for Combinatorial Optimization: Learning Symbolic Branching Policy from Bipartite Graph
Yufei Kuang, Jie Wang, Yuyan Zhou et al.
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Itay Lavie, Guy Gur-Ari, Zohar Ringel
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
Simone Bombari, Marco Mondelli
Trainable Transformer in Transformer
Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia et al.
Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning
Jinsong Shi, Pan Gao, Jie Qin
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim, Taiji Suzuki
Translation Equivariant Transformer Neural Processes
Matthew Ashman, Cristiana Diaconu, Junhyuck Kim et al.
Transolver: A Fast Transformer Solver for PDEs on General Geometries
Haixu Wu, Huakun Luo, Haowen Wang et al.
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Zhen Qin, Weigao Sun, Dong Li et al.
Viewing Transformers Through the Lens of Long Convolutions Layers
Itamar Zimerman, Lior Wolf
VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning
Tangfei Liao, Xiaoqin Zhang, Li Zhao et al.
Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing
Haijin Zeng, Hiep Luong, Wilfried Philips
What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
Xingwu Chen, Difan Zou
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang et al.
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer
Linglin Jing, Ying Xue, Xu Yan et al.