"multimodal large language models" Papers

216 papers found • Page 5 of 5

Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning

Zhuo Huang, Chang Liu, Yinpeng Dong et al.

ICML 2024posterarXiv:2312.02546

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Dongping Chen, Ruoxi Chen, Shilin Zhang et al.

ICML 2024posterarXiv:2402.04788

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

Xin Liu, Yichen Zhu, Jindong Gu et al.

ECCV 2024posterarXiv:2311.17600
183
citations

NExT-GPT: Any-to-Any Multimodal LLM

Shengqiong Wu, Hao Fei, Leigang Qu et al.

ICML 2024posterarXiv:2309.05519

PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects

Junyi Li, Junfeng Wu, Weizhi Zhao et al.

ECCV 2024posterarXiv:2407.16696
13
citations

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

Yuxuan Sun, Chenglu Zhu, Sunyi Zheng et al.

AAAI 2024paperarXiv:2305.15072

REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

Agneet Chatterjee, Yiran Luo, Tejas Gokhale et al.

ECCV 2024posterarXiv:2408.02231
10
citations

RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

Qi Lv, Hao Li, Xiang Deng et al.

ICML 2024posterarXiv:2404.04929

SemGrasp: Semantic Grasp Generation via Language Aligned Discretization

Kailin Li, Jingbo Wang, Lixin Yang et al.

ECCV 2024posterarXiv:2404.03590
34
citations

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World

Weiyun Wang Weiyun, yiming ren, Haowen Luo et al.

ECCV 2024posterarXiv:2402.19474
86
citations

UniCode : Learning a Unified Codebook for Multimodal Large Language Models

Sipeng Zheng, Bohan Zhou, Yicheng Feng et al.

ECCV 2024posterarXiv:2403.09072
14
citations

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

Yang Jin, Zhicheng Sun, Kun Xu et al.

ICML 2024oralarXiv:2402.03161

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

Hao Fei, Shengqiong Wu, Wei Ji et al.

ICML 2024oralarXiv:2501.03230

VIGC: Visual Instruction Generation and Correction

Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński

AAAI 2024paperarXiv:2308.12714
84
citations

WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

Xing Han Lù, Zdeněk Kasner, Siva Reddy

ICML 2024spotlightarXiv:2402.05930

When Do We Not Need Larger Vision Models?

Baifeng Shi, Ziyang Wu, Maolin Mao et al.

ECCV 2024posterarXiv:2403.13043
70
citations
← Previous
1...345
Next →