2025 Poster "cross-modal alignment" Papers

28 papers found

Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

Junming Liu, Siyuan Meng, Yanting Gao et al.

ICCV 2025posterarXiv:2503.12972
13
citations

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025posterarXiv:2412.00833
17
citations

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.

NEURIPS 2025posterarXiv:2502.01341

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Tsai Hor Chan, Feng Wu, Yihang Chen et al.

NEURIPS 2025posterarXiv:2510.20736

Beyond Modality Collapse: Representation Blending for Multimodal Dataset Distillation

xin zhang, Ziruo Zhang, JIAWEI DU et al.

NEURIPS 2025posterarXiv:2505.14705
3
citations

Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning

Tianjiao Jiang, Zhen Zhang, Yuhang Liu et al.

ICCV 2025posterarXiv:2508.03102
1
citations

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

Edson Araujo, Andrew Rouditchenko, Yuan Gong et al.

CVPR 2025posterarXiv:2505.01237
2
citations

CF-VLM:CounterFactual Vision-Language Fine-tuning

jusheng zhang, Kaitong Cai, Yijia Fan et al.

NEURIPS 2025poster

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.

ICLR 2025posterarXiv:2501.16629
19
citations

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng, Hao Shi, Qihang Peng et al.

ICLR 2025posterarXiv:2505.04965
8
citations

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Jeonghyeon Kim, Sangheum Hwang

CVPR 2025posterarXiv:2503.18817
4
citations

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

Xiangru Zhu, Penglei Sun, Yaoxian Song et al.

ICLR 2025posterarXiv:2410.10291
2
citations

Harnessing Text-to-Image Diffusion Models for Point Cloud Self-Supervised Learning

Yiyang Chen, Shanshan Zhao, Lunhao Duan et al.

ICCV 2025posterarXiv:2507.09102

Hierarchical Cross-modal Prompt Learning for Vision-Language Models

Hao Zheng, Shunzhi Yang, Zhuoxin He et al.

ICCV 2025posterarXiv:2507.14976
5
citations

It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data

Dominik Schnaus, Nikita Araslanov, Daniel Cremers

CVPR 2025posterarXiv:2503.24129
6
citations

Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval

Yue Wu, Zhaobo Qi, Yiling Wu et al.

ICLR 2025poster
7
citations

Learning Source-Free Domain Adaptation for Visible-Infrared Person Re-Identification

Yongxiang Li, Yanglin Feng, Yuan Sun et al.

NEURIPS 2025poster

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025poster
14
citations

Multimodal Prompt Alignment for Facial Expression Recognition

Fuyan Ma, Yiran He, Bin Sun et al.

ICCV 2025posterarXiv:2506.21017
2
citations

Preacher: Paper-to-Video Agentic System

Jingwei Liu, Ling Yang, Hao Luo et al.

ICCV 2025posterarXiv:2508.09632
2
citations

ReCon: Enhancing True Correspondence Discrimination through Relation Consistency for Robust Noisy Correspondence Learning

Quanxing Zha, Xin Liu, Shu-Juan Peng et al.

CVPR 2025posterarXiv:2502.19962
1
citations

Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding

Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.

NEURIPS 2025poster

Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers

Chaehyun Kim, Heeseong Shin, Eunbeen Hong et al.

NEURIPS 2025poster
6
citations

Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency

Kai Gan, Bo Ye, Min-Ling Zhang et al.

ICLR 2025poster
3
citations

SGAR: Structural Generative Augmentation for 3D Human Motion Retrieval

Jiahang Zhang, Lilang Lin, Shuai Yang et al.

NEURIPS 2025poster

TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models

Ziyang Luo, Nian Liu, Xuguang Yang et al.

ICCV 2025posterarXiv:2506.11436
2
citations

The Indra Representation Hypothesis

Jianglin Lu, Hailing Wang, Kuo Yang et al.

NEURIPS 2025poster

When Kernels Multiply, Clusters Unify: Fusing Embeddings with the Kronecker Product

Youqi WU, Jingwei Zhang, Farzan Farnia

NEURIPS 2025posterarXiv:2506.08645
2
citations