"attention mechanism" Papers
380 papers found • Page 1 of 8
AbsenceBench: Language Models Can’t See What’s Missing
Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore et al.
Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data
Tianyi Chen, Pengxiao Lin, Zhiwei Wang et al.
A Closer Look at Graph Transformers: Cross-Aggregation and Beyond
Jiaming Zhuo, Ziyi Ma, Yintong Lu et al.
Adaptive Transformer Programs: Bridging the Gap Between Performance and Interpretability in Transformers
Quoc-Vinh Lai-Dang, Taemin Kang, Seungah Son
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Yoad Tewel, Rinon Gal, Dvir Samuel et al.
Advancing Multiple Instance Learning with Continual Learning for Whole Slide Imaging
Xianrui Li, Yufei Cui, Jun Li et al.
Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning
Yimeng Shan, Malu Zhang, Rui-jie Zhu et al.
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn, Selim Tekin, Fatih Ilhan et al.
A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention
Qiyu Xu, Zhanxuan Hu, Yu Duan et al.
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions
Jiangbei Hu, Yanggeng Li, Fei Hou et al.
Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment
Hua Ye, Hang Ding, Siyuan Chen et al.
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
Xinghui Li, Qichao Sun, Pengze Zhang et al.
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
Heejun Lee, Geon Park, Youngwan Lee et al.
Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram et al.
Attention (as Discrete-Time Markov) Chains
Yotam Erel, Olaf Dünkel, Rishabh Dabral et al.
Attention-based clustering
Rodrigo Maulen Soto, Pierre Marion, Claire Boyer
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
Attention Mechanism, Max-Affine Partition, and Universal Approximation
Hude Liu, Jerry Yao-Chieh Hu, Zhao Song et al.
Attention with Markov: A Curious Case of Single-layer Transformers
Ashok Makkuva, Marco Bondaschi, Adway Girish et al.
AudioGenX: Explainability on Text-to-Audio Generative Models
Hyunju Kang, Geonhee Han, Yoonjae Jeong et al.
AWRaCLe: All-Weather Image Restoration Using Visual In-Context Learning
Sudarshan Rajagopalan, Vishal M. Patel
Balancing Conservatism and Aggressiveness: Prototype-Affinity Hybrid Network for Few-Shot Segmentation
Tianyu Zou, Shengwu Xiong, Ruilin Yao et al.
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations
Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.
BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology
Amaya Gallagher-Syed, Henry Senior, Omnia Alwazzan et al.
Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
Kazuki Irie, Morris Yau, Samuel J Gershman
Block-Attention for Efficient Prefilling
Dongyang Ma, Yan Wang, Tian Lan
BlockDecoder: Boosting ASR Decoders with Context and Merger Modules
Darshan Prabhu, Preethi Jyothi
Boltzmann Attention Sampling for Image Analysis with Small Objects
Theodore Zhao, Sid Kiblawi, Mu Wei et al.
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
Zhengfei Kuang, Tianyuan Zhang, Kai Zhang et al.
CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution
Xin Liu, Jie Liu, Jie Tang et al.
CATCH: Channel-Aware Multivariate Time Series Anomaly Detection via Frequency Patching
Xingjian Wu, Xiangfei Qiu, Zhengyu Li et al.
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models
Zheng Chong, Xiao Dong, Haoxiang Li et al.
Class Distribution-induced Attention Map for Open-vocabulary Semantic Segmentations
Dong Un Kang, Hayeon Kim, Se Young Chun
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Composing Linear Layers from Irreducibles
Travis Pence, Daisuke Yamada, Vikas Singh
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
Constraint-Aware Feature Learning for Parametric Point Cloud
Xi Cheng, Ruiqi Lei, Di Huang et al.
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Eric Xing, Pranavi Kolouju, Robert Pless et al.
Continual Multiple Instance Learning with Enhanced Localization for Histopathological Whole Slide Image Analysis
Byung Hyun Lee, Wongi Jeong, Woojae Han et al.
Continuity and Isolation Lead to Doubts or Dilemmas in Large Language Models
Hector Pasten, Felipe Urrutia, Hector Orellana et al.
Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
Tuna Meral, Enis Simsar, Federico Tombari et al.
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng, Wenjie Wang, Fengbin Zhu et al.
CrossSpectra: Exploiting Cross-Layer Smoothness for Parameter-Efficient Fine-Tuning
Yifei Zhang, Hao Zhu, Junhao Dong et al.
DAMM-Diffusion: Learning Divergence-Aware Multi-Modal Diffusion Model for Nanoparticles Distribution Prediction
Junjie Zhou, Shouju Wang, Yuxia Tang et al.
Defining and Discovering Hyper-meta-paths for Heterogeneous Hypergraphs
Yaming Yang, Ziyu Zheng, Weigang Lu et al.
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Jeffrey Willette, Heejun Lee, Sung Ju Hwang
Dependency Parsing is More Parameter-Efficient with Normalization
Paolo Gajo, Domenic Rosati, Hassan Sajjad et al.
Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention
Kyungmin Jo, Jooyeol Yun, Jaegul Choo