"multimodal large language models" Papers
216 papers found • Page 5 of 5
Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning
Zhuo Huang, Chang Liu, Yinpeng Dong et al.
ICML 2024 • poster • arXiv:2312.02546
MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen, Ruoxi Chen, Shilin Zhang et al.
ICML 2024 • poster • arXiv:2402.04788
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
ECCV 2024 • poster • arXiv:2311.17600
183 citations
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu, Hao Fei, Leigang Qu et al.
ICML 2024 • poster • arXiv:2309.05519
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li, Junfeng Wu, Weizhi Zhao et al.
ECCV 2024 • poster • arXiv:2407.16696
13 citations
PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology
Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.
AAAI 2024 • paper • arXiv:2305.15072
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.
ECCV 2024 • poster • arXiv:2408.02231
10 citations
RoboMP²: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models
Qi Lv, Hao Li, Xiang Deng et al.
ICML 2024 • poster • arXiv:2404.04929
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
Kailin Li, Jingbo Wang, Lixin Yang et al.
ECCV 2024 • poster • arXiv:2404.03590
34 citations
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang, Yiming Ren, Haowen Luo et al.
ECCV 2024 • poster • arXiv:2402.19474
86 citations
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng, Bohan Zhou, Yicheng Feng et al.
ECCV 2024 • poster • arXiv:2403.09072
14 citations
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin, Zhicheng Sun, Kun Xu et al.
ICML 2024 • oral • arXiv:2402.03161
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei, Shengqiong Wu, Wei Ji et al.
ICML 2024 • oral • arXiv:2501.03230
VIGC: Visual Instruction Generation and Correction
Bin Wang, Fan Wu, Xiao Han et al.
AAAI 2024 • paper • arXiv:2308.12714
84 citations
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
ICML 2024 • spotlight • arXiv:2402.05930
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao et al.
ECCV 2024 • poster • arXiv:2403.13043
70 citations