"multimodal learning" Papers

43 papers found

𝕏-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs

Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.

ICLR 2025 · arXiv:2407.18134 · 13 citations

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Chengxiang Huang, Yake Wei, Zequn Yang et al.

CVPR 2025 · arXiv:2503.18595 · 8 citations

All in One: Visual-Description-Guided Unified Point Cloud Segmentation

Zongyan Han, Mohamed El Amine Boudjoghra, Jiahua Dong et al.

ICCV 2025 · arXiv:2507.05211 · 1 citation

Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering

Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.

ICCV 2025 · arXiv:2502.04469 · 1 citation

Balancing Multimodal Training Through Game-Theoretic Regularization

Konstantinos Kontras, Thomas Strypsteen, Christos Chatzichristos et al.

NeurIPS 2025 (Spotlight) · arXiv:2411.07335 · 7 citations

Beyond Modality Collapse: Representation Blending for Multimodal Dataset Distillation

Xin Zhang, Ziruo Zhang, Jiawei Du et al.

NeurIPS 2025 · arXiv:2505.14705 · 3 citations

Can Text-to-Video Generation help Video-Language Alignment?

Luca Zanella, Massimiliano Mancini, Willi Menapace et al.

CVPR 2025 · arXiv:2503.18507 · 1 citation

From Pose to Muscle: Multimodal Learning for Piano Hand Muscle Electromyography

Ruofan Liu, Yichen Peng, Takanori Oku et al.

NeurIPS 2025

GeoMM: On Geodesic Perspective for Multi-modal Learning

Shibin Mei, Hang Wang, Bingbing Ni

CVPR 2025 · arXiv:2505.11216

Improving Multimodal Learning via Imbalanced Learning

Shicai Wei, Chunbo Luo, Yang Luo

ICCV 2025 · arXiv:2507.10203 · 5 citations

Learning Diffusion Models with Flexible Representation Guidance

Chenyu Wang, Cai Zhou, Sharut Gupta et al.

NeurIPS 2025 · arXiv:2507.08980 · 5 citations

Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers

Ruofei Wang, Hongzhan Lin, Ziyuan Luo et al.

AAAI 2025 · arXiv:2412.15503 · 3 citations

MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning

Seong-Hyeon Hwang, Soyoung Choi, Steven Whang

NeurIPS 2025 · arXiv:2509.25831

MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.

CVPR 2025 · arXiv:2504.02264 · 7 citations

Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.

NeurIPS 2025 · arXiv:2510.24919

ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models

Zhuo Chen, Yizhen Zheng, Huan Yee Koh et al.

NeurIPS 2025 · arXiv:2506.00880 · 1 citation

Multimodal Autoregressive Pre-training of Large Vision Encoders

Enrico Fini, Mustafa Shukor, Xiujun Li et al.

CVPR 2025 (Highlight) · arXiv:2411.14402 · 77 citations

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Christopher Liao, Christian So, Theodoros Tsiligkaridis et al.

ICLR 2025 · arXiv:2402.04416 · 1 citation

NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics

David Robinson, Marius Miron, Masato Hagiwara et al.

ICLR 2025 · arXiv:2411.07186 · 23 citations

OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning

Xianhang Li, Yanqing Liu, Haoqin Tu et al.

ICCV 2025 · arXiv:2505.04601 · 6 citations

SmartCLIP: Modular Vision-language Alignment with Identification Guarantees

Shaoan Xie, Lingjing Kong, Yujia Zheng et al.

CVPR 2025 (Highlight) · arXiv:2507.22264 · 4 citations

Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues

Sihong Huang, Jiaxin Wu, Xiaoyong Wei et al.

CVPR 2025 · 2 citations

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation

Jiaben Chen, Zixin Wang, Ailing Zeng et al.

NeurIPS 2025 · arXiv:2510.07249 · 3 citations

Towards General Visual-Linguistic Face Forgery Detection

Ke Sun, Shen Chen, Taiping Yao et al.

CVPR 2025 · arXiv:2307.16545 · 35 citations

Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization

Hao Dong, Eleni Chatzi, Olga Fink

ICLR 2025 · arXiv:2501.13924 · 6 citations

Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models

Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.

NeurIPS 2025 · arXiv:2507.07104 · 2 citations

Adversarial Prompt Tuning for Vision-Language Models

Jiaming Zhang, Xingjun Ma, Xin Wang et al.

ECCV 2024 · arXiv:2311.11261 · 34 citations

CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai, Yuhang Liu, Zhen Zhang et al.

ECCV 2024 · arXiv:2311.16445 · 11 citations

Contrasting Multiple Representations with the Multi-Marginal Matching Gap

Zoe Piran, Michal Klein, James Thornton et al.

ICML 2024 · arXiv:2405.19532 · 9 citations

Diagnosing and Re-learning for Balanced Multimodal Learning

Yake Wei, Siwei Li, Ruoxuan Feng et al.

ECCV 2024 · arXiv:2407.09705 · 38 citations

DiaLoc: An Iterative Approach to Embodied Dialog Localization

Chao Zhang, Mohan Li, Ignas Budvytis et al.

CVPR 2024 · arXiv:2403.06846 · 5 citations

Enhancing Multimodal Cooperation via Sample-level Modality Valuation

Yake Wei, Ruoxuan Feng, Zihe Wang et al.

CVPR 2024 · arXiv:2309.06255 · 52 citations

Enhancing Storage and Computational Efficiency in Federated Multimodal Learning for Large-Scale Models

Zixin Zhang, Fan Qi, Changsheng Xu

ICML 2024

Gradient-Guided Modality Decoupling for Missing-Modality Robustness

AAAI 2024 · arXiv:2402.16318 · 18 citations

IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation

Kai Li, Runxuan Yang, Fuchun Sun et al.

ICML 2024 (Oral) · arXiv:2308.08143 · 21 citations

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Yuxiao Chen, Kai Li, Wentao Bao et al.

ECCV 2024 · arXiv:2409.16145 · 7 citations

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Yake Wei, Di Hu

ICML 2024 · arXiv:2405.17730 · 64 citations

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.

CVPR 2024 · arXiv:2401.14405 · 12 citations

MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks

Jingyuan Qi, Minqian Liu, Ying Shen et al.

AAAI 2024 · arXiv:2310.04965 · 3 citations

Object-Oriented Anchoring and Modal Alignment in Multimodal Learning

Shibin Mei, Bingbing Ni, Hang Wang et al.

ECCV 2024 · 1 citation

Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

Yang Chen, Cong Fang, Zhouchen Lin et al.

ICML 2024 · arXiv:2406.11249 · 2 citations

Tri-Modal Motion Retrieval by Learning a Joint Embedding Space

Kangning Yin, Shihao Zou, Yuxuan Ge et al.

CVPR 2024 (Highlight) · arXiv:2403.00691 · 15 citations

Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

Che Liu, Zhongwei Wan, Cheng Ouyang et al.

ICML 2024 · arXiv:2403.06659 · 61 citations