"transformer architecture" Papers
137 papers found • Page 1 of 3
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Daolang Huang, Xinyi Wen, Ayush Bharti et al.
Attention-based clustering
Rodrigo Maulen Soto, Pierre Marion, Claire Boyer
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.
Can Transformers Do Enumerative Geometry?
Baran Hashemi, Roderic Corominas, Alessandro Giacchetto
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang, Fadime Sener, Angela Yao
Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis
Byung Hyun Lee, Wongi Jeong, Woojae Han et al.
CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning
Yifei Zhang, Hao Zhu, Junhao Dong et al.
DiffE2E: Rethinking End-to-End Driving with a Hybrid Diffusion-Regression-Classification Policy
Rui Zhao, Yuze Fan, Ziguo Chen et al.
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo, Shuai Lu, Weihang Zhang et al.
Dynamic Diffusion Transformer
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
Dynamic Semantic-Aware Correlation Modeling for UAV Tracking
Xinyu Zhou, Tongxin Pan, Lingyi Hong et al.
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard, Jean-Marc Odobez
Evolutionary Reasoning Does Not Arise in Standard Usage of Protein Language Models
Yasha Ektefaie, Andrew Shen, Lavik Jain et al.
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.
Flatten Graphs as Sequences: Transformers are Scalable Graph Generators
Dexiong Chen, Markus Krimmel, Karsten Borgwardt
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach
Jason Piquenot, Maxime Berar, Romain Raveaux et al.
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu et al.
Impact of Layer Norm on Memorization and Generalization in Transformers
Rishi Singhal, Jung-Eun Kim
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Yuan Zhang, Yiming Dong et al.
Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics
Xiaotang Wang, Xuanwei Lin, Yun Zhu et al.
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
Shen Zhang, Siyuan Liang, Yaning Tan et al.
LMFusion: Adapting Pretrained Language Models for Multimodal Generation
Weijia Shi, Xiaochuang Han, Chunting Zhou et al.
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
Ziyan Guo, Zeyu Hu, Na Zhao et al.
On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery
Renpu Liu, Ruida Zhou, Cong Shen et al.
On the Role of Hidden States of Modern Hopfield Network in Transformer
Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning
Baiyuan Chen, Shinji Ito, Masaaki Imaizumi
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Zhijian Zhuo, Ya Wang, Yutao Zeng et al.
Quantum Doubly Stochastic Transformers
Jannis Born, Filip Skogh, Kahn Rhrissorrakrai et al.
Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think
Ge Wu, Shen Zhang, Ruijing Shi et al.
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Yu Bo, Weian Mao, Daniel Shao et al.
SAS: Simulated Attention Score
Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.
Spiking Neural Networks Need High-Frequency Information
Yuetong Fang, Deming Zhou, Ziqing Wang et al.
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
Task Descriptors Help Transformers Learn Linear Models In-Context
Ruomin Huang, Rong Ge
Transformer brain encoders explain human high-level visual responses
Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte
Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He et al.
TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.
Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
Renpu Liu, Jing Yang
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
Why In-Context Learning Models are Good Few-Shot Learners?
Shiguang Wu, Yaqing Wang, Quanming Yao