Poster "cross-attention mechanism" Papers
18 papers found
A Conditional Probability Framework for Compositional Zero-shot Learning
Peng Wu, Qiuxia Lai, Hao Fang et al.
Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning
Tianjiao Jiang, Zhen Zhang, Yuhang Liu et al.
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.
DiffE2E: Rethinking End-to-End Driving with a Hybrid Diffusion-Regression-Classification Policy
Rui Zhao, Yuze Fan, Ziguo Chen et al.
InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
Chenyang Zhu, Kai Li, Yue Ma et al.
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou, Gim Hee Lee
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Chest X-ray with Zero-Shot Multi-Task Capability
Jonggwon Park, Byungmu Yoon, Soobum Kim et al.
Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts
Chen Li, Huiying Xu, Changxin Gao et al.
Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models
Eunseo Koh, SeungHoo Hong, Tae-Young Kim et al.
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.
Image Fusion via Vision-Language Model
Zixiang Zhao, Lilun Deng, Haowen Bai et al.
Meta Evidential Transformer for Few-Shot Open-Set Recognition
Hitesh Sapkota, Krishna Neupane, Qi Yu
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
Yoonwoo Jeong, Jinwoo Lee, Chiheon Kim et al.
Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance
Jing Li, Junsong Fan, Zhaoxiang Zhang
SemReg: Semantics Constrained Point Cloud Registration
Sheldon Fung, Xuequan Lu, Dasith de Silva Edirimuni et al.
Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models
Luozhou Wang, Guibao Shen, Wenhang Ge et al.