"vision-language tasks" Papers

10 papers found

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

Tatiana Zemskova, Dmitry Yudin

ICCV 2025 · poster · arXiv:2412.18450
11 citations

Gatekeeper: Improving Model Cascades Through Confidence Tuning

Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha et al.

NeurIPS 2025 · poster · arXiv:2502.19335
4 citations

MIEB: Massive Image Embedding Benchmark

Chenghao Xiao, Isaac Chung, Imene Kerboua et al.

ICCV 2025 · poster · arXiv:2504.10471
6 citations

Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer

Ningyuan Zhang, Jie Lu, Keqiuyin Li et al.

ICLR 2025 · poster
1 citation

UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface

Hao Tang, Chen-Wei Xie, Haiyang Wang et al.

NeurIPS 2025 · spotlight · arXiv:2503.01342
14 citations

Cycle-Consistency Learning for Captioning and Grounding

Ning Wang, Jiajun Deng, Mingbo Jia

AAAI 2024 · paper · arXiv:2312.15162
13 citations

Differentially Private Representation Learning via Image Captioning

Tom Sander, Yaodong Yu, Maziar Sanjabi et al.

ICML 2024 · poster

FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning

Haokun Chen, Yao Zhang, Denis Krompass et al.

AAAI 2024 · paper · arXiv:2308.12305
86 citations

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

Dongyang Liu, Renrui Zhang, Longtian Qiu et al.

ICML 2024 · poster

VIGC: Visual Instruction Generation and Correction

Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński

AAAI 2024 · paper · arXiv:2308.12714
84 citations