2024 "cross-modal alignment" Papers
16 papers found
Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models
Jie ZHANG, Xiaosong Ma, Song Guo et al.
ICML 2024, poster
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo, Pedro Morgado
ECCV 2024, poster, arXiv:2407.13095
7 citations
Augmented Commonsense Knowledge for Remote Object Grounding
Bahram Mohammadi, Yicong Hong, Yuankai Qi et al.
AAAI 2024, paper, arXiv:2406.01256
CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon
ECCV 2024, poster, arXiv:2408.14930
12 citations
Detection-Based Intermediate Supervision for Visual Question Answering
Yuhang Liu, Daowan Peng, Wei Wei et al.
AAAI 2024, paper, arXiv:2312.16012
3 citations
Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning
Zhiyue Liu, Jinyuan Liu, Fanrong Ma
AAAI 2024, paper, arXiv:2312.08865
20 citations
Integration of Global and Local Representations for Fine-grained Cross-modal Alignment
Seungwan Jin, Hoyoung Choi, Taehyung Noh et al.
ECCV 2024, poster
1 citation
Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment
Yuxiao Chen, Kai Li, Wentao Bao et al.
ECCV 2024, poster, arXiv:2409.16145
5 citations
Multi-Level Cross-Modal Alignment for Image Clustering
Liping Qiu, Qin Zhang, Xiaojun Chen et al.
AAAI 2024, paper, arXiv:2401.11740
6 citations
Multi-Prompts Learning with Cross-Modal Alignment for Attribute-Based Person Re-identification
Yajing Zhai, Yawen Zeng, Zhiyong Huang et al.
AAAI 2024, paper, arXiv:2312.16797
33 citations
Position: The Platonic Representation Hypothesis
Minyoung Huh, Brian Cheung, Tongzhou Wang et al.
ICML 2024, poster
Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection
Yuanpeng Tu, Boshen Zhang, Liang Liu et al.
ECCV 2024, poster, arXiv:2401.03145
24 citations
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie et al.
AAAI 2024, paper, arXiv:2312.12155
40 citations
Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning
Kaibin Tian, Yanhua Cheng, Yi Liu et al.
AAAI 2024, paper, arXiv:2401.00701
14 citations
Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training
Jinxia Yang, Bing Su, Xin Zhao et al.
ICML 2024, oral, arXiv:2405.19654
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li, Haopeng Li, Sarah Erfani et al.
ICML 2024, poster, arXiv:2406.02915