Poster Papers: "attention mechanism"

292 papers found • Page 5 of 6

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance

Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.

ECCV 2024 • arXiv:2403.14781 • 239 citations

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona et al.

ICML 2024 • arXiv:2311.12997 • 15 citations

ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention

Jiawei Wang, Changjian Li

CVPR 2024 • arXiv:2311.16682 • 13 citations

CountFormer: Multi-View Crowd Counting Transformer

Hong Mo, Xiong Zhang, Jianchao Tan et al.

ECCV 2024 • arXiv:2407.02047 • 9 citations

CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

Feng Lu, Xiangyuan Lan, Lijun Zhang et al.

CVPR 2024 • arXiv:2402.19231 • 80 citations

DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

Harsh Rangwani, Pradipto Mondal, Mayank Mishra et al.

CVPR 2024 • arXiv:2404.02900 • 18 citations

Delving into Differentially Private Transformer

Youlong Ding, Xueyang Wu, Yining Meng et al.

ICML 2024 • arXiv:2405.18194 • 11 citations

DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification

Wenhui Zhu, Xiwen Chen, Peijie Qiu et al.

ECCV 2024 • arXiv:2407.03575 • 28 citations

Do text-free diffusion models learn discriminative visual representations?

Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi et al.

ECCV 2024 • arXiv:2311.17921 • 27 citations

DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization

Jisu Nam, Heesu Kim, DongJae Lee et al.

CVPR 2024 • arXiv:2402.09812 • 63 citations

DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion

Liao Shen, Tianqi Liu, Huiqiang Sun et al.

ECCV 2024 • arXiv:2409.09605 • 13 citations

EcoMatcher: Efficient Clustering Oriented Matcher for Detector-free Image Matching

Peiqi Chen, Lei Yu, Yi Wan et al.

ECCV 2024 • 4 citations

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

Yang Zhang, Tze Tzun Teoh, Wei Hern Lim et al.

ECCV 2024 • arXiv:2403.06381 • 19 citations

EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning

Jongsuk Kim, Hyeongkeun Lee, Kyeongha Rho et al.

ICML 2024 • arXiv:2403.09502 • 12 citations

Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

David T. Hoffmann, Simon Schrodi, Jelena Bratulić et al.

ICML 2024 • arXiv:2310.12956 • 11 citations

Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation

Yuwen Pan, Rui Sun, Naisong Luo et al.

ECCV 2024 • arXiv:2408.13838 • 5 citations

Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction

Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen et al.

CVPR 2024 • arXiv:2406.17219 • 9 citations

FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance

Jiedong Zhuang, Jiaqi Hu, Lianrui Mu et al.

ECCV 2024 • arXiv:2407.05578 • 8 citations

Free-Editor: Zero-shot Text-driven 3D Scene Editing

Md Nazmul Karim, Hasan Iqbal, Umar Khalid et al.

ECCV 2024 • arXiv:2312.13663 • 14 citations

GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.

ECCV 2024 • arXiv:2405.17429 • 99 citations

Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

Zhenqiao Song, Yunlong Zhao, Wenxian Shi et al.

ICML 2024 • arXiv:2405.08205 • 10 citations

Graph External Attention Enhanced Transformer

Jianqing Liang, Min Chen, Jiye Liang

ICML 2024 • arXiv:2405.21061 • 9 citations

Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning

Pengyu Li, Biao Wang, Tianchu Guo et al.

ECCV 2024

Grounded Text-to-Image Synthesis with Attention Refocusing

Quynh Phung, Songwei Ge, Jia-Bin Huang

CVPR 2024 • arXiv:2306.05427 • 162 citations

HandDAGT: A Denoising Adaptive Graph Transformer for 3D Hand Pose Estimation

Wencan Cheng, Eun-Ji Kim, Jong Hwan Ko

ECCV 2024 • arXiv:2407.20542 • 3 citations

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

Zhongyu Xia, Zhiwei Lin, Xinhao Wang et al.

ECCV 2024 • arXiv:2404.02517 • 19 citations

High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion

Yu Dai, Junchen Shen, Zijie Zhai et al.

ICML 2024

How Smooth Is Attention?

Valérie Castin, Pierre Ablin, Gabriel Peyré

ICML 2024 • arXiv:2312.14820 • 29 citations

How Transformers Learn Causal Structure with Gradient Descent

Eshaan Nichani, Alex Damian, Jason Lee

ICML 2024 • arXiv:2402.14735 • 102 citations

HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention

Xiaolong Tang, Meina Kan, Shiguang Shan et al.

CVPR 2024 • arXiv:2404.06351 • 63 citations

In-context Convergence of Transformers

Yu Huang, Yuan Cheng, Yingbin Liang

ICML 2024 • arXiv:2310.05249 • 106 citations

In-Context Language Learning: Architectures and Algorithms

Ekin Akyürek, Bailin Wang, Yoon Kim et al.

ICML 2024 • arXiv:2401.12973 • 83 citations

In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

Shiqi Chen, Miao Xiong, Junteng Liu et al.

ICML 2024 • arXiv:2403.01548 • 43 citations

InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization

Zhengyang Hu, Song Kang, Qunsong Zeng et al.

ICML 2024 • arXiv:2402.10158 • 6 citations

InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping

Zhenhua Xu, Kwan-Yee K. Wong, Hengshuang Zhao

ECCV 2024 • arXiv:2308.08543 • 18 citations

I/O Complexity of Attention, or How Optimal is FlashAttention?

Barna Saha, Christopher Ye

ICML 2024

Iterative Search Attribution for Deep Neural Networks

Zhiyu Zhu, Huaming Chen, Xinyi Wang et al.

ICML 2024

KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning

Junnan Liu, Qianren Mao, Weifeng Jiang et al.

ICML 2024 • arXiv:2409.12865 • 5 citations

Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation

Haoyu Ji, Bowen Chen, Xinglong Xu et al.

ECCV 2024

Language Model Guided Interpretable Video Action Reasoning

Ning Wang, Guangming Zhu, Hongsheng Li et al.

CVPR 2024 • arXiv:2404.01591 • 7 citations

Large Motion Model for Unified Multi-Modal Motion Generation

Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.

ECCV 2024 • arXiv:2404.01284 • 63 citations

LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions

Victor Agostinelli III, Sanghyun Hong, Lizhong Chen

ICML 2024

Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem

Zhentao Tan, Yadong Mu

ICML 2024 • arXiv:2406.09899 • 4 citations

Learning with Unmasked Tokens Drives Stronger Vision Learners

Taekyung Kim, Sanghyuk Chun, Byeongho Heo et al.

ECCV 2024 • arXiv:2310.13593 • 3 citations

MagicEraser: Erasing Any Objects via Semantics-Aware Control

Fan Li, Zixiao Zhang, Yi Huang et al.

ECCV 2024 • arXiv:2410.10207 • 13 citations

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

Dingyuan Zhang, Dingkang Liang, Zichang Tan et al.

ECCV 2024 • arXiv:2409.00633 • 4 citations

Memory Efficient Neural Processes via Constant Memory Attention Block

Leo Feng, Frederick Tung, Hossein Hajimirsadeghi et al.

ICML 2024 • arXiv:2305.14567 • 8 citations

Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas

Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli et al.

ECCV 2024 • arXiv:2408.15660 • 6 citations

Meta Evidential Transformer for Few-Shot Open-Set Recognition

Hitesh Sapkota, Krishna Neupane, Qi Yu

ICML 2024

Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers

Zhiyu Yao, Jian Wang, Haixu Wu et al.

ICML 2024