2024 "multimodal alignment" Papers
10 papers found
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu, Gaurav Datta, Huang Huang et al.
ICML 2024, poster
Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video
Zhaobo Qi, Yibo Yuan, Xiaowen Ruan et al.
AAAI 2024, paper, arXiv:2401.07567
11 citations
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.
ECCV 2024, poster, arXiv:2407.09781
11 citations
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
ECCV 2024, poster, arXiv:2403.09377
4 citations
PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation
Ginger Delmas, Philippe Weinzaepfel, Francesc Moreno et al.
ECCV 2024, poster, arXiv:2409.06535
7 citations
STELLA: Continual Audio-Video Pre-training with SpatioTemporal Localized Alignment
Jaewoo Lee, Jaehong Yoon, Wonjae Kim et al.
ICML 2024, oral
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma, Furong Xu, Jian Liu et al.
ICML 2024, poster
TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training
Chaoya Jiang, Wei Ye, Haiyang Xu et al.
AAAI 2024, paper, arXiv:2312.08846
6 citations
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Chengen Lai, Shengli Song, Shiqi Meng et al.
AAAI 2024, paper, arXiv:2312.13594
9 citations
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
Kun Su, Judith Li, Qingqing Huang et al.
AAAI 2024, paper, arXiv:2305.06594
23 citations