"multi-modal learning" Papers

20 papers found

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems

Saeed Amizadeh, Sara Abdali, Yinheng Li et al.

NeurIPS 2025 · poster · arXiv:2509.15448

Incomplete Multi-view Deep Clustering with Data Imputation and Alignment

Jiyuan Liu, Xinwang Liu, Xinhang Wan et al.

NeurIPS 2025 · poster
8 citations

Learning Diagrams: A Graphical Language for Compositional Training Regimes

Mason Lary, Richard Samuelson, Alexander Wilentz et al.

ICLR 2025 · poster

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025 · poster · arXiv:2410.17637
19 citations

Multi-modal Learning: A Look Back and the Road Ahead

Divyam Madaan, Sumit Chopra, Kyunghyun Cho

ICLR 2025 · poster

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025 · poster · arXiv:2503.20188
24 citations

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing

Yingying Zhang, Lixiang Ru, Kang Wu et al.

ICCV 2025 · poster · arXiv:2507.13812
7 citations

SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction

Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.

CVPR 2025 · poster · arXiv:2503.18933

Understanding Contrastive Learning via Gaussian Mixture Models

Parikshit Bansal, Ali Kavis, Sujay Sanghavi

NeurIPS 2025 · poster
3 citations

AVSegFormer: Audio-Visual Segmentation with Transformer

Shengyi Gao, Zhe Chen, Guo Chen et al.

AAAI 2024 · paper · arXiv:2307.01146

COMMA: Co-articulated Multi-Modal Learning

Lianyu Hu, Liqing Gao, Zekang Liu et al.

AAAI 2024 · paper · arXiv:2401.00268

DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.

ECCV 2024 · poster · arXiv:2403.18274
36 citations

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

Haokun Chen, Yao Zhang, Denis Krompass et al.

AAAI 2024 · paper · arXiv:2308.12305
86 citations

LAMM: Label Alignment for Multi-Modal Prompt Learning

Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.

AAAI 2024 · paper · arXiv:2312.08212
28 citations

MESED: A Multi-Modal Entity Set Expansion Dataset with Fine-Grained Semantic Classes and Hard Negative Entities

Yangning Li, Tingwei Lu, Hai-Tao Zheng et al.

AAAI 2024 · paper · arXiv:2307.14878

MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

Hai-Tao Yu, Mofei Song

AAAI 2024 · paper · arXiv:2402.10002
18 citations

Mono3DVG: 3D Visual Grounding in Monocular Images

Yang Zhan, Yuan Yuan, Zhitong Xiong

AAAI 2024 · paper · arXiv:2312.08022
35 citations

Multi-Label Supervised Contrastive Learning

Pingyue Zhang, Mengyue Wu

AAAI 2024 · paper · arXiv:2410.13439
1 citation

ReconBoost: Boosting Can Achieve Modality Reconcilement

Cong Hua, Qianqian Xu, Shilong Bao et al.

ICML 2024 · poster · arXiv:2405.09321

Transferring Knowledge From Large Foundation Models to Small Downstream Models

Shikai Qiu, Boran Han, Danielle Robinson et al.

ICML 2024 · poster · arXiv:2406.07337