2025 Poster "multimodal learning" Papers
13 papers found
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.
ICLR 2025 · poster · arXiv:2407.18134
12 citations
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang, Yake Wei, Zequn Yang et al.
CVPR 2025 · poster · arXiv:2503.18595
8 citations
Ask and Remember: A Questions-Only Replay Strategy for Continual Visual Question Answering
Imad Eddine Marouf, Enzo Tartaglione, Stéphane Lathuilière et al.
ICCV 2025 · poster · arXiv:2502.04469
1 citation
Beyond Modality Collapse: Representation Blending for Multimodal Dataset Distillation
Xin Zhang, Ziruo Zhang, Jiawei Du et al.
NeurIPS 2025 · poster · arXiv:2505.14705
3 citations
Can Text-to-Video Generation help Video-Language Alignment?
Luca Zanella, Massimiliano Mancini, Willi Menapace et al.
CVPR 2025 · poster · arXiv:2503.18507
1 citation
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei, Hang Wang, Bingbing Ni
CVPR 2025 · poster · arXiv:2505.11216
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei, Chunbo Luo, Yang Luo
ICCV 2025 · poster · arXiv:2507.10203
4 citations
Learning Diffusion Models with Flexible Representation Guidance
Chenyu Wang, Cai Zhou, Sharut Gupta et al.
NeurIPS 2025 · poster · arXiv:2507.08980
5 citations
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.
NeurIPS 2025 · poster · arXiv:2510.24919
ModuLM: Enabling Modular and Multimodal Molecular Relational Learning with Large Language Models
Zhuo Chen, Yizhen Zheng, Huan Yee Koh et al.
NeurIPS 2025 · poster · arXiv:2506.00880
1 citation
NatureLM-audio: an Audio-Language Foundation Model for Bioacoustics
David Robinson, Marius Miron, Masato Hagiwara et al.
ICLR 2025 · poster · arXiv:2411.07186
23 citations
TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
Jiaben Chen, Zixin Wang, Ailing Zeng et al.
NeurIPS 2025 · poster · arXiv:2510.07249
3 citations
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models
Tiezheng Zhang, Yitong Li, Yu-Cheng Chou et al.
NeurIPS 2025 · poster · arXiv:2507.07104
2 citations