"attention mechanism" Papers
390 papers found • Page 2 of 8
Dependency Parsing is More Parameter-Efficient with Normalization
Paolo Gajo, Domenic Rosati, Hassan Sajjad et al.
Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention
Kyungmin Jo, Jooyeol Yun, Jaegul Choo
Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration
Shihao Zhou, Dayu Li, Jinshan Pan et al.
DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling
Yuang Ai, Qihang Fan, Xuefeng Hu et al.
Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia et al.
Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden et al.
DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
Yiren Song, Xiaokang Liu, Mike Zheng Shou
DIFFSSR: Stereo Image Super-resolution Using Differential Transformer
Dafeng Zhang
Diffusion-Based Imaginative Coordination for Bimanual Manipulation
Huilin Xu, Jian Ding, Jiakun Xu et al.
Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data
Hengyu Fu, Zehao Dou, Jiawei Guo et al.
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo, Shuai Lu, Weihang Zhang et al.
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim, Dayun Ju, Woojung Han et al.
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
Lei Chen, Joan Bruna, Alberto Bietti
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving
Xiaosong Jia, Junqi You, Zhiyuan Zhang et al.
Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection
Hongsong Wang, Andi Xu, Pinle Ding et al.
DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion
Jin Li, Zezhong Ding, Xike Xie
DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference
Chong Wu, Jiawang Cao, Renjie Xu et al.
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Xirui Hu, Jiahao Wang, Hao Chen et al.
Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
Xingyu Chen, Yue Chen, Yuliang Xiu et al.
EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
Rang Meng, Xingyu Zhang, Yuming Li et al.
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
Daikun Liu, Lei Cheng, Teng Wang et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution
Karam Park, Jae Woong Soh, Nam Ik Cho
Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information
Yuke Zhu, Yue Zhang, Dongdong Liu et al.
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu, Zhengjian Yao, Lujia Jin et al.
Enhancing Masked Time-Series Modeling via Dropping Patches
Tianyu Qiu, Yi Xie, Hao Niu et al.
Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation
Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan et al.
Enhancing Training Data Attribution with Representational Optimization
Weiwei Sun, Haokun Liu, Nikhil Kandpal et al.
Enhancing Transformers Through Conditioned Embedded Tokens
Hemanth Saratchandran, Simon Lucey
Entropy Rectifying Guidance for Diffusion and Flow Models
Tariq Berrada Ifriqi, Adriana Romero-Soriano, Michal Drozdzal et al.
Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models
Jingcheng Deng, Zihao Wei, Liang Pang et al.
Exact Expressive Power of Transformers with Padding
Will Merrill, Ashish Sabharwal
Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.
FFN Fusion: Rethinking Sequential Computation in Large Language Models
Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.
First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training
Gyudong Kim, Hyukju Na, Jin Kim et al.
FLAME: Fast Long-context Adaptive Memory for Event-based Vision
Biswadeep Chakraborty, Saibal Mukhopadhyay
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai, Jianqiao Lu, Yao Luo et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network
Shuo Xu, Yu Chen, Shuxia Lin et al.
From Attention to Activation: Unraveling the Enigmas of Large Language Models
Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.
From Softmax to Score: Transformers Can Effectively Implement In-Context Denoising Steps
Paul Rosu, Lawrence Carin, Xiang Cheng
Fully-inductive Node Classification on Arbitrary Graphs
Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin et al.
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.
Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors
Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski
Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
Bhishma Dedhia, David Bourgin, Krishna Kumar Singh et al.
GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation
Jiawei Lu, YingPeng Zhang, Zengjun Zhao et al.
Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression
Juan Chen, Honglin Liu, Yingying Ao et al.