2024 "large multimodal models" Papers
10 papers found
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang et al.
ICML 2024 · poster · arXiv:2401.13311
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng, Boyu Gou, Jihyung Kil et al.
ICML 2024 · poster · arXiv:2401.01614
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
ECCV 2024 · poster · arXiv:2312.02949
114 citations
M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions
Mingsheng Li, Xin Chen, Chi Zhang et al.
ECCV 2024 · poster
4 citations
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu, Zhengyuan Yang, Linjie Li et al.
ICML 2024 · poster · arXiv:2308.02490
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang, Yuan Yao, Wei Ji et al.
ICML 2024 · poster · arXiv:2311.04498
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou, Zheng Zhu, Holger Caesar et al.
ECCV 2024 · poster · arXiv:2407.11213
13 citations
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Yuxuan Sun, Hao Wu, Chenglu Zhu et al.
ECCV 2024 · poster · arXiv:2401.16355
36 citations
PSALM: Pixelwise Segmentation with Large Multi-modal Model
Zheng Zhang, Yeyao Ma, Enming Zhang et al.
ECCV 2024 · poster · arXiv:2403.14598
82 citations
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
Yunxin Li, Baotian Hu, Haoyuan Shi et al.
ICML 2024 · poster · arXiv:2405.04950