2024 "vision-language datasets" Papers
2 papers found
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.
ECCV 2024, poster, arXiv:2404.19753
98 citations
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
ECCV 2024, poster, arXiv:2404.01197