"multimodal learning" Papers
19 papers found
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella, Massimiliano Mancini, Willi Menapace et al.
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei, Chunbo Luo, Yang Luo
Learning Diffusion Models with Flexible Representation Guidance
Chenyu Wang, Cai Zhou, Sharut Gupta et al.
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.
ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models
Zhuo Chen, Yizhen Zheng, Huan Yee Koh et al.
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
David Robinson, Marius Miron, Masato Hagiwara et al.
TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
Jiaben Chen, Zixin Wang, Ailing Zeng et al.
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
Contrasting Multiple Representations with the Multi-Marginal Matching Gap
Zoe Piran, Michal Klein, James Thornton et al.
Enhancing Storage and Computational Efficiency in Federated Multimodal Learning for Large-Scale Models
Zixin Zhang, Fan Qi, Changsheng Xu
Gradient-Guided Modality Decoupling for Missing-Modality Robustness
IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation
Kai Li, Runxuan Yang, Fuchun Sun et al.
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei, Di Hu
MULTISCRIPT: Multimodal Script Learning for Supporting Open Domain Everyday Tasks
Jingyuan Qi, Minqian Liu, Ying Shen et al.
Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
Yang Chen, Cong Fang, Zhouchen Lin et al.
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Che Liu, Zhongwei Wan, Cheng Ouyang et al.