"vision-language datasets" Papers
3 papers found
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
NeurIPS 2025 (spotlight), arXiv:2505.21375
11 citations
Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang et al.
CVPR 2025 (poster), arXiv:2310.14356
5 citations
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
ECCV 2024 (poster), arXiv:2404.01197