2024 Paper "multimodal large language models" Papers
4 papers found
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
AAAI 2024paperarXiv:2308.09936
190
citations
InstructDoc: A Dataset for Zero
Shot Generalization of Visual Document Understanding with Instructions - Ryota Tanaka, Taichi Iki, Kyosuke Nishida et al.
AAAI 2024paperarXiv:2401.13313
PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology
Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.
AAAI 2024paperarXiv:2305.15072
VIGC: Visual Instruction Generation and Correction
Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński
AAAI 2024paperarXiv:2308.12714
84
citations