"vision-language tasks" Papers
10 papers found
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
Tatiana Zemskova, Dmitry Yudin
ICCV 2025 · poster · arXiv:2412.18450
11 citations
Gatekeeper: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha et al.
NeurIPS 2025 · poster · arXiv:2502.19335
4 citations
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao, Isaac Chung, Imene Kerboua et al.
ICCV 2025 · poster · arXiv:2504.10471
6 citations
Release the Powers of Prompt Tuning: Cross-Modality Prompt Transfer
Ningyuan Zhang, Jie Lu, Keqiuyin Li et al.
ICLR 2025 · poster
1 citation
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang, Chen-Wei Xie, Haiyang Wang et al.
NeurIPS 2025 · spotlight · arXiv:2503.01342
14 citations
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang, Jiajun Deng, Mingbo Jia
AAAI 2024 · paper · arXiv:2312.15162
13 citations
Differentially Private Representation Learning via Image Captioning
Tom Sander, Yaodong Yu, Maziar Sanjabi et al.
ICML 2024 · poster
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
AAAI 2024 · paper · arXiv:2308.12305
86 citations
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Dongyang Liu, Renrui Zhang, Longtian Qiu et al.
ICML 2024 · poster
VIGC: Visual Instruction Generation and Correction
Bin Wang, Fan Wu, Xiao Han et al.
AAAI 2024 · paper · arXiv:2308.12714
84 citations