ICLR "multimodal large language models" Papers
10 papers found
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Guo Chen, Yicheng Liu, Yifei Huang et al.
ICLR 2025posterarXiv:2412.12075
41
citations
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang et al.
ICLR 2025posterarXiv:2408.15998
116
citations
Grounding Multimodal Large Language Model in GUI World
Weixian Lei, Difei Gao, Mike Zheng Shou
ICLR 2025poster
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu, Tianyue Ou, Yifan Song et al.
ICLR 2025posterarXiv:2410.13824
21
citations
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Lehan Wang, Haonan Wang, Honglong Yang et al.
ICLR 2025posterarXiv:2410.18387
17
citations
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
ICLR 2025poster
20
citations
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li, Hengguang Zhou, Ruochen Wang et al.
ICLR 2025posterarXiv:2406.17806
20
citations
MM-EMBED: UNIVERSAL MULTIMODAL RETRIEVAL WITH MULTIMODAL LLMS
Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi et al.
ICLR 2025posterarXiv:2411.02571
78
citations
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Gang Liu, Michael Sun, Wojciech Matusik et al.
ICLR 2025posterarXiv:2410.04223
19
citations
ScImage: How good are multimodal large language models at scientific text-to-image generation?
Leixin Zhang, Steffen Eger, Yinjie Cheng et al.
ICLR 2025posterarXiv:2412.02368
4
citations