"transformer architecture" Papers

120 papers found • Page 3 of 3

The Illusion of State in State-Space Models

William Merrill, Jackson Petty, Ashish Sabharwal

ICML 2024 • poster

The Pitfalls of Next-Token Prediction

Gregor Bachmann, Vaishnavh Nagarajan

ICML 2024 • poster

Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention

Jiaqi Zhang, Joel Jennings, Agrin Hilmkil et al.

ICML 2024 • poster

Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration

Zhengyang Zhuge, Peisong Wang, Xingting Yao et al.

ICML 2024 • poster

Towards General Algorithm Discovery for Combinatorial Optimization: Learning Symbolic Branching Policy from Bipartite Graph

Yufei Kuang, Jie Wang, Yuyan Zhou et al.

ICML 2024 • poster

Towards Understanding Inductive Bias in Transformers: A View From Infinity

Itay Lavie, Guy Gur-Ari, Zohar Ringel

ICML 2024 • poster

Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features

Simone Bombari, Marco Mondelli

ICML 2024 • poster

Trainable Transformer in Transformer

Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia et al.

ICML 2024 • poster

Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning

Jinsong Shi, Pan Gao, Jie Qin

AAAI 2024 • paper • arXiv:2312.06995 • 34 citations

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Tri Dao, Albert Gu

ICML 2024 • poster

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

Juno Kim, Taiji Suzuki

ICML 2024 • poster

Translation Equivariant Transformer Neural Processes

Matthew Ashman, Cristiana Diaconu, Junhyuck Kim et al.

ICML 2024 • oral

Transolver: A Fast Transformer Solver for PDEs on General Geometries

Haixu Wu, Huakun Luo, Haowen Wang et al.

ICML 2024 • spotlight

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

Zhen Qin, Weigao Sun, Dong Li et al.

ICML 2024 • poster

Viewing Transformers Through the Lens of Long Convolutions Layers

Itamar Zimerman, Lior Wolf

ICML 2024 • poster

VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning

Tangfei Liao, Xiaoqin Zhang, Li Zhao et al.

AAAI 2024 • paper • arXiv:2312.08774 • 15 citations

Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing

Haijin Zeng, Hiep Luong, Wilfried Philips

ECCV 2024 • poster • 1 citation

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

Xingwu Chen, Difan Zou

ICML 2024 • poster

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Haoran You, Yichao Fu, Zheng Wang et al.

ICML 2024 • poster

X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer

Linglin Jing, Ying Xue, Xu Yan et al.

AAAI 2024 • paper • arXiv:2312.07378 • 11 citations