Poster "multimodal fusion" Papers

17 papers found

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025 · poster · arXiv:2412.00833 · 17 citations

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Tsai Hor Chan, Feng Wu, Yihang Chen et al.

NeurIPS 2025 · poster · arXiv:2510.20736

A Multimodal BiMamba Network with Test-Time Adaptation for Emotion Recognition Based on Physiological Signals

Ziyu Jia, Tingyu Du, Zhengyu Tian et al.

NeurIPS 2025 · poster

Can We Talk Models Into Seeing the World Differently?

Paul Gavrikov, Jovita Lukasik, Steffen Jung et al.

ICLR 2025 · poster · arXiv:2403.09193 · 15 citations

CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

ZeMing Gong, Austin Wang, Xiaoliang Huo et al.

ICLR 2025 · poster · arXiv:2405.17537 · 18 citations

CyIN: Cyclic Informative Latent Space for Bridging Complete and Incomplete Multimodal Learning

Ronghao Lin, Qiaolin He, Sijie Mai et al.

NeurIPS 2025 · poster

Leveraging Driver Field-of-View for Multimodal Ego-Trajectory Prediction

M. Eren Akbiyik, Nedko Savov, Danda Pani Paudel et al.

ICLR 2025 · poster · arXiv:2312.08558 · 3 citations

Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning

Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.

NeurIPS 2025 · poster · arXiv:2510.24919

Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation

Zhaochong An, Guolei Sun, Yun Liu et al.

ICLR 2025 · poster · arXiv:2410.22489 · 22 citations

Multimodal Lego: Model Merging and Fine-Tuning Across Topologies and Modalities in Biomedicine

Konstantin Hemker, Nikola Simidjievski, Mateja Jamnik

ICLR 2025 · poster · arXiv:2405.19950 · 2 citations

Multimodal LiDAR-Camera Novel View Synthesis with Unified Pose-free Neural Fields

Weiyi Xue, Fan Lu, Yunwei Zhu et al.

NeurIPS 2025 · poster

Reading Recognition in the Wild

Charig Yang, Samiul Alam, Shakhrul Iman Siam et al.

NeurIPS 2025 · poster · arXiv:2505.24848 · 2 citations

SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction

ZaiPeng Duan, Xuzhong Hu, Pei An et al.

CVPR 2025 · poster · arXiv:2507.17083 · 5 citations

SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding

Rong Li, Shijie Li, Lingdong Kong et al.

CVPR 2025 · poster · arXiv:2412.04383 · 40 citations

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

Ding Jia, Jianyuan Guo, Kai Han et al.

ICML 2024 · poster

Multimodal Prototyping for Cancer Survival Prediction

Andrew Song, Richard Chen, Guillaume Jaume et al.

ICML 2024 · poster

Predictive Dynamic Fusion

Bing Cao, Yinan Xia, Yi Ding et al.

ICML 2024 · poster