"multimodal instruction tuning" Papers
3 papers found
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
Jingjing Jiang, Chao Ma, Xurui Song et al.
ICCV 2025 | highlight | arXiv:2507.07424
7 citations
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu, Tianyue Ou, Yifan Song et al.
ICLR 2025 | poster | arXiv:2410.13824
21 citations
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Kai Liu, Jungang Li, Yuchong Sun et al.
NeurIPS 2025 | oral | arXiv:2512.22905
4 citations