"multi-modal learning" Papers
20 papers found
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh, Sara Abdali, Yinheng Li et al.
Incomplete Multi-view Deep Clustering with Data Imputation and Alignment
Jiyuan Liu, Xinwang Liu, Xinhang Wan et al.
Learning Diagrams: A Graphical Language for Compositional Training Regimes
Mason Lary, Richard Samuelson, Alexander Wilentz et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Multi-modal Learning: A Look Back and the Road Ahead
Divyam Madaan, Sumit Chopra, Kyunghyun Cho
Rethinking Vision-Language Model in Face Forensics: Multi-Modal Interpretable Forged Face Detector
Xiao Guo, Xiufeng Song, Yue Zhang et al.
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu et al.
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.
Understanding Contrastive Learning via Gaussian Mixture Models
Parikshit Bansal, Ali Kavis, Sujay Sanghavi
AVSegFormer: Audio-Visual Segmentation with Transformer
Shengyi Gao, Zhe Chen, Guo Chen et al.
COMMA: Co-articulated Multi-Modal Learning
Lianyu Hu, Liqing Gao, Zekang Liu et al.
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
LAMM: Label Alignment for Multi-Modal Prompt Learning
Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.
MESED: A Multi-Modal Entity Set Expansion Dataset with Fine-Grained Semantic Classes and Hard Negative Entities
Li Yangning, Tingwei Lu, Hai-Tao Zheng et al.
MM-Point: Multi-View Information-Enhanced Multi-Modal Self-Supervised 3D Point Cloud Understanding
HaiTao Yu, Mofei Song
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan, Yuan Yuan, Zhitong Xiong
Multi-Label Supervised Contrastive Learning
Pingyue Zhang, Mengyue Wu
ReconBoost: Boosting Can Achieve Modality Reconcilement
Cong Hua, Qianqian Xu, Shilong Bao et al.
Transferring Knowledge From Large Foundation Models to Small Downstream Models
Shikai Qiu, Boran Han, Danielle Robinson et al.