Poster "multimodal benchmarks" Papers
4 papers found
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou, Feng Hong, Jiaan Luo et al.
NeurIPS 2025 · poster · arXiv:2503.22215
3 citations
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
ICCV 2025 · poster · arXiv:2507.00505
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.
ECCV 2024 · poster · arXiv:2404.01291
347 citations
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
ECCV 2024 · poster · arXiv:2312.02949
114 citations