Poster "cross-modal alignment" Papers

20 papers found

AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment

Yan Li, Yifei Xing, Xiangyuan Lan et al.

CVPR 2025, poster, arXiv:2412.00833, 17 citations

AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding

Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.

NeurIPS 2025, poster, arXiv:2502.01341

Amplifying Prominent Representations in Multimodal Learning via Variational Dirichlet Process

Tsai Hor Chan, Feng Wu, Yihang Chen et al.

NeurIPS 2025, poster, arXiv:2510.20736

Beyond Modality Collapse: Representation Blending for Multimodal Dataset Distillation

Xin Zhang, Ziruo Zhang, Jiawei Du et al.

NeurIPS 2025, poster, arXiv:2505.14705, 3 citations

Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning

Tianjiao Jiang, Zhen Zhang, Yuhang Liu et al.

ICCV 2025, poster, arXiv:2508.03102, 1 citation

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding

Henry Zheng, Hao Shi, Qihang Peng et al.

ICLR 2025, poster, arXiv:2505.04965, 8 citations

Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval

Yue Wu, Zhaobo Qi, Yiling Wu et al.

ICLR 2025, poster, 7 citations

Learning Source-Free Domain Adaptation for Visible-Infrared Person Re-Identification

Yongxiang Li, Yanglin Feng, Yuan Sun et al.

NeurIPS 2025, poster

Mitigate the Gap: Improving Cross-Modal Alignment in CLIP

Sedigheh Eslami, Gerard de Melo

ICLR 2025, poster, 14 citations

Robust Cross-modal Alignment Learning for Cross-Scene Spatial Reasoning and Grounding

Yanglin Feng, Hongyuan Zhu, Dezhong Peng et al.

NeurIPS 2025, poster

Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers

Chaehyun Kim, Heeseong Shin, Eunbeen Hong et al.

NeurIPS 2025, poster, 6 citations

Semi-Supervised CLIP Adaptation by Enforcing Semantic and Trapezoidal Consistency

Kai Gan, Bo Ye, Min-Ling Zhang et al.

ICLR 2025, poster, 3 citations

SGAR: Structural Generative Augmentation for 3D Human Motion Retrieval

Jiahang Zhang, Lilang Lin, Shuai Yang et al.

NeurIPS 2025, poster

Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models

Jie Zhang, Xiaosong Ma, Song Guo et al.

ICML 2024, poster

Audio-visual Generalized Zero-shot Learning the Easy Way

Shentong Mo, Pedro Morgado

ECCV 2024, poster, arXiv:2407.13095, 7 citations

Integration of Global and Local Representations for Fine-grained Cross-modal Alignment

Seungwan Jin, Hoyoung Choi, Taehyung Noh et al.

ECCV 2024, poster, 1 citation

Learning to Localize Actions in Instructional Videos with LLM-Based Multi-Pathway Text-Video Alignment

Yuxiao Chen, Kai Li, Wentao Bao et al.

ECCV 2024, poster, arXiv:2409.16145, 5 citations

Position: The Platonic Representation Hypothesis

Minyoung Huh, Brian Cheung, Tongzhou Wang et al.

ICML 2024, poster

Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection

Yuanpeng Tu, Boshen Zhang, Liang Liu et al.

ECCV 2024, poster, arXiv:2401.03145, 24 citations

Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

Jinhao Li, Haopeng Li, Sarah Erfani et al.

ICML 2024, poster