"multi-modal learning" Papers

20 papers found

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems

Saeed Amizadeh, Sara Abdali, Yinheng Li et al.

NeurIPS 2025 · poster · arXiv:2509.15448

Incomplete Multi-view Deep Clustering with Data Imputation and Alignment

Jiyuan Liu, Xinwang Liu, Xinhang Wan et al.

NeurIPS 2025 · poster
8 citations

Learning Diagrams: A Graphical Language for Compositional Training Regimes

Mason Lary, Richard Samuelson, Alexander Wilentz et al.

ICLR 2025 · poster

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.

ICLR 2025 · poster · arXiv:2410.17637
19 citations

Multi-modal Learning: A Look Back and the Road Ahead

Divyam Madaan, Sumit Chopra, Kyunghyun Cho

ICLR 2025 · poster

Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector

Xiao Guo, Xiufeng Song, Yue Zhang et al.

CVPR 2025 · poster · arXiv:2503.20188
24 citations

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing

Yingying Zhang, Lixiang Ru, Kang Wu et al.

ICCV 2025 · poster · arXiv:2507.13812
7 citations

SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction

Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.

CVPR 2025 · poster · arXiv:2503.18933

Understanding Contrastive Learning via Gaussian Mixture Models

Parikshit Bansal, Ali Kavis, Sujay Sanghavi

NeurIPS 2025 · poster
3 citations

AVSegFormer: Audio-Visual Segmentation with Transformer

Shengyi Gao, Zhe Chen, Guo Chen et al.

AAAI 2024 · paper · arXiv:2307.01146

COMMA: Co-articulated Multi-Modal Learning

Lianyu Hu, Liqing Gao, Zekang Liu et al.

AAAI 2024 · paper · arXiv:2401.00268

DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment

Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.

ECCV 2024 · poster · arXiv:2403.18274
36 citations

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

Haokun Chen, Yao Zhang, Denis Krompass et al.

AAAI 2024 · paper · arXiv:2308.12305
86 citations

LAMM: Label Alignment for Multi-Modal Prompt Learning

Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.

AAAI 2024 · paper · arXiv:2312.08212
28 citations

MESED: A Multi-Modal Entity Set Expansion Dataset with Fine-Grained Semantic Classes and Hard Negative Entities

Yangning Li, Tingwei Lu, Hai-Tao Zheng et al.

AAAI 2024 · paper · arXiv:2307.14878

MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding

Hai-Tao Yu, Mofei Song

AAAI 2024 · paper · arXiv:2402.10002
18 citations

Mono3DVG: 3D Visual Grounding in Monocular Images

Yang Zhan, Yuan Yuan, Zhitong Xiong

AAAI 2024 · paper · arXiv:2312.08022
35 citations

Multi-Label Supervised Contrastive Learning

Pingyue Zhang, Mengyue Wu

AAAI 2024 · paper · arXiv:2410.13439
1 citation

ReconBoost: Boosting Can Achieve Modality Reconcilement

Cong Hua, Qianqian Xu, Shilong Bao et al.

ICML 2024 · poster · arXiv:2405.09321

Transferring Knowledge From Large Foundation Models to Small Downstream Models

Shikai Qiu, Boran Han, Danielle Robinson et al.

ICML 2024 · poster · arXiv:2406.07337