Poster "multimodal alignment" Papers

11 papers found

Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

Hua Ye, Hang Ding, Siyuan Chen et al.

NeurIPS 2025posterarXiv:2511.08399

AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward

Haonan Han, Xiangzuo Wu, Huan Liao et al.

CVPR 2025posterarXiv:2411.18654
5
citations

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen, Yunhao Gou, Runhui Huang et al.

CVPR 2025posterarXiv:2409.18042
44
citations

Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment

Mayug Maniparambil, Raiymbek Akshulakov, YASSER ABDELAZIZ DAHOU DJILALI et al.

CVPR 2025posterarXiv:2409.19425
2
citations

OpenOmni: Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-time Emotional Speech Synthesis

Run Luo, Ting-En Lin, Haonan Zhang et al.

NeurIPS 2025poster

Self-Supervised Spatial Correspondence Across Modalities

Ayush Shrivastava, Andrew Owens

CVPR 2025posterarXiv:2506.03148
2
citations

A Touch, Vision, and Language Dataset for Multimodal Alignment

Letian Fu, Gaurav Datta, Huang Huang et al.

ICML 2024poster

Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding

Ruihuang Li, Zhengqiang ZHANG, Chenhang He et al.

ECCV 2024posterarXiv:2407.09781
11
citations

Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks

Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

ECCV 2024posterarXiv:2403.09377
4
citations

PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation

Ginger Delmas, Philippe Weinzaepfel, Francesc Moreno et al.

ECCV 2024posterarXiv:2409.06535
7
citations

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

Ziping Ma, Furong Xu, Jian liu et al.

ICML 2024poster