2024 "multimodal alignment" Papers

10 papers found

A Touch, Vision, and Language Dataset for Multimodal Alignment

Letian Fu, Gaurav Datta, Huang Huang et al.

ICML 2024, poster

Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video

Zhaobo Qi, Yibo Yuan, Xiaowen Ruan et al.

AAAI 2024, paper, arXiv:2401.07567
11 citations

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.

ECCV 2024, poster, arXiv:2407.09781
11 citations

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks

Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

ECCV 2024, poster, arXiv:2403.09377
4 citations

PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation

Ginger Delmas, Philippe Weinzaepfel, Francesc Moreno et al.

ECCV 2024, poster, arXiv:2409.06535
7 citations

STELLA: Continual Audio-Video Pre-training with SpatioTemporal Localized Alignment

Jaewoo Lee, Jaehong Yoon, Wonjae Kim et al.

ICML 2024, oral

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian Liu et al.

ICML 2024, poster

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training

Chaoya Jiang, Wei Ye, Haiyang Xu et al.

AAAI 2024, paper, arXiv:2312.08846
6 citations

Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA

Chengen Lai, Shengli Song, Shiqi Meng et al.

AAAI 2024, paper, arXiv:2312.13594
9 citations

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

Kun Su, Judith Li, Qingqing Huang et al.

AAAI 2024, paper, arXiv:2305.06594
23 citations