Poster "attention mechanism" Papers
292 papers found • Page 2 of 6
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
Jin Li, Zezhong Ding, Xike Xie
DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference
Chong Wu, Jiawang Cao, Renjie Xu et al.
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Xirui Hu, Jiahao Wang, Hao Chen et al.
Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
Xingyu Chen, Yue Chen, Yuliang Xiu et al.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Rang Meng, Xingyu Zhang, Yuming Li et al.
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
Daikun Liu, Lei Cheng, Teng Wang et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information
Yuke Zhu, Yue Zhang, Dongdong Liu et al.
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu, Zhengjian Yao, Lujia Jin et al.
Enhancing Transformers Through Conditioned Embedded Tokens
Hemanth Saratchandran, Simon Lucey
Entropy Rectifying Guidance for Diffusion and Flow Models
Tariq Berrada Ifriqi, Adriana Romero-Soriano, Michal Drozdzal et al.
Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
Jingcheng Deng, Zihao Wei, Liang Pang et al.
Exact Expressive Power of Transformers with Padding
Will Merrill, Ashish Sabharwal
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Gyudong Kim, Hyukju Na, Jin Kim et al.
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai, Jianqiao Lu, Yao Luo et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network
Shuo Xu, Yu Chen, Shuxia Lin et al.
From Attention to Activation: Unraveling the Enigmas of Large Language Models
Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.
From Softmax to Score: Transformers Can Effectively Implement In-Context Denoising Steps
Paul Rosu, Lawrence Carin, Xiang Cheng
Fully-inductive Node Classification on Arbitrary Graphs
Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin et al.
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia, David Bourgin, Krishna Kumar Singh et al.
Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression
Juan Chen, Honglin Liu, Yingying Ao et al.
Global Regulation and Excitation via Attention Tuning for Stereo Matching
Jiahao Li, Xinhong Chen, Zhengmin Jiang et al.
Graph-Based Attention for Differentiable MaxSAT Solving
Sota Moriyama, Katsumi Inoue
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Xiang Hu, Jiaqi Leng, Jun Zhao et al.
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh, Sara Abdali, Yinheng Li et al.
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, Jasmine Peng et al.
HoliTracer: Holistic Vectorization of Geographic Objects from Large-Size Remote Sensing Imagery
Yu Wang, Bo Dang, Wanchun Li et al.
HSI: A Holistic Style Injector for Arbitrary Style Transfer
Shuhao Zhang, Hui Kang, Yang Liu et al.
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Yuan Zhang, Yiming Dong et al.
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
JADE: Joint Alignment and Deep Embedding for Multi-Slice Spatial Transcriptomics
Yuanchuan Guo, Jun Liu, Huimin Cheng et al.
JAFAR: Jack up Any Feature at Any Resolution
Paul Couairon, Loïck Chambon, Louis Serrano et al.
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model
Qihao Duan, Bingding Huang, Zhenqiao Song et al.
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning
Yiju Guo, Wenkai Yang, Zexu Sun et al.
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.
Leveraging Submodule Linearity Enhances Task Arithmetic Performance in LLMs
Rui Dai, Sile Hu, Xu Shen et al.
Limitations of Normalization in Attention
Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova et al.
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
Yifan Pu, Jixuan Ying, Qixiu Li et al.
Long Context Tuning for Video Generation
Yuwei Guo, Ceyuan Yang, Ziyan Yang et al.