Spotlight "attention mechanism" Papers
20 papers found
AbsenceBench: Language Models Can't See What's Missing
Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore et al.
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
Tianyi Chen, Pengxiao Lin, Zhiwei Wang et al.
A Closer Look at Graph Transformers: Cross-Aggregation and Beyond
Jiaming Zhuo, Ziyi Ma, Yintong Lu et al.
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
Yuang Ai, Qihang Fan, Xuefeng Hu et al.
Enhancing Training Data Attribution with Representational Optimization
Weiwei Sun, Haokun Liu, Nikhil Kandpal et al.
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan et al.
MoBA: Mixture of Block Attention for Long-Context LLMs
Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.
Polyline Path Masked Attention for Vision Transformer
Zhongchen Zhao, Chaodong Xiao, Hui Lin et al.
Quantum Doubly Stochastic Transformers
Jannis Born, Filip Skogh, Kahn Rhrissorrakrai et al.
SparseMVC: Probing Cross-view Sparsity Variations for Multi-view Clustering
Ruimeng Liu, Xin Zou, Chang Tang et al.
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Shuo Yang, Haocheng Xi, Yilong Zhao et al.
Tensor Product Attention Is All You Need
Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.
Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
Qishuai Wen, Zhiyuan Huang, Chun-Guang Li
Transformer brain encoders explain human high-level visual responses
Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte
TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.
InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation
Jacob Si, Wendy Yusi Cheng, Michael Cooper et al.
Relaxing the Accurate Imputation Assumption in Doubly Robust Learning for Debiased Collaborative Filtering
Haoxuan Li, Chunyuan Zheng, Shuyi Wang et al.
Simple linear attention language models balance the recall-throughput tradeoff
Simran Arora, Sabri Eyuboglu, Michael Zhang et al.
Sparse and Structured Hopfield Networks
Saúl Santos, Vlad Niculae, Daniel McNamee et al.