"attention mechanism" Papers
385 papers found • Page 3 of 8
Graph-Based Attention for Differentiable MaxSAT Solving
Sota Moriyama, Katsumi Inoue
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Xiang Hu, Jiaqi Leng, Jun Zhao et al.
Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization
Yeji Song, Jimyeong Kim, Wonhark Park et al.
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh, Sara Abdali, Yinheng Li et al.
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, Jasmine Peng et al.
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang, Bo Dang, Wanchun Li et al.
HSI: A Holistic Style Injector for Arbitrary Style Transfer
Shuhao Zhang, Hui Kang, Yang Liu et al.
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Yuan Zhang, Yiming Dong et al.
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
Intra and Inter Parser-Prompted Transformers for Effective Image Restoration
Cong Wang, Jinshan Pan, Liyan Wang et al.
JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics
Yuanchuan Guo, Jun Liu, Huimin Cheng et al.
JAFAR: Jack up Any Feature at Any Resolution
Paul Couairon, Loïck Chambon, Louis Serrano et al.
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model
Qihao Duan, Bingding Huang, Zhenqiao Song et al.
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Yiju Guo, Wenkai Yang, Zexu Sun et al.
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Rui Dai, Sile Hu, Xu Shen et al.
Light3R-SfM: Towards Feed-forward Structure-from-Motion
Sven Elflein, Qunjie Zhou, Laura Leal-Taixe
Lightweight Contrastive Distilled Hashing for Online Cross-modal Retrieval
Jiaxing Li, Lin Jiang, Zeqi Ma et al.
Limitations of Normalization in Attention
Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova et al.
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
Yifan Pu, Jixuan Ying, Qixiu Li et al.
Long Context Tuning for Video Generation
Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.
Long-Sequence Recommendation Models Need Decoupled Embeddings
Ningya Feng, Junwei Pan, Jialong Wu et al.
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan et al.
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Gao Peng, Le Zhuo, Dongyang Liu et al.
MAESTRO: Masked Encoding Set Transformer with Self-Distillation
Matthew Lee, Jaesik Kim, Matei Ionita et al.
MambaIRv2: Attentive State Space Restoration
Hang Guo, Yong Guo, Yaohua Zha et al.
Mamba Modulation: On the Length Generalization of Mamba Models
Peng Lu, Jerry Huang, Qiuhao Zeng et al.
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu, Xinchao Wang
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.
MeCeFO: Enhancing LLM Training Robustness via Fault-Tolerant Optimization
Rizhen Hu, Yutong He, Ran Yan et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang, Jiale Fu, Chenduo Hao et al.
Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules
Yueqi Zhang, Peiwen Yuan, Yiwei Li et al.
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras et al.
MoBA: Mixture of Block Attention for Long-Context LLMs
Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
Yanfeng Li, Ka-Hou Chan, Yue Sun et al.
MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes
Feiyang Pan, Shenghe Zheng, Chunyan Yin et al.
MoFo: Empowering Long-term Time Series Forecasting with Periodic Pattern Modeling
Jiaming Ma, Binwu Wang, Qihe Huang et al.
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu, Keyu Fan, Weihao Yu et al.
Multi-Kernel Correlation-Attention Vision Transformer for Enhanced Contextual Understanding and Multi-Scale Integration
Hongkang Zhang, Shao-Lun Huang, Ercan Kuruoglu et al.
Multi-party Collaborative Attention Control for Image Customization
Han Yang, Chuanguang Yang, Qiuli Wang et al.
Multipole Attention for Efficient Long Context Reasoning
Coleman Hooper, Sebastian Zhao, Luca Manolache et al.
Multi-turn Consistent Image Editing
Zijun Zhou, Yingying Deng, Xiangyu He et al.
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Aviral Chharia, Wenbo Gou, Haoye Dong
MVTrajecter: Multi-View Pedestrian Tracking with Trajectory Motion Cost and Trajectory Appearance Cost
Taiga Yamane, Ryo Masumura, Satoshi Suzuki et al.