"multimodal systems" Papers
2 papers found
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
ICCV 2025 (highlight), arXiv:2502.06788
18 citations
BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining
Minjun Kim, SeungWoo Song, Youhan Lee et al.
AAAI 2024 (paper), arXiv:2401.06443
9 citations