"large multimodal models" Papers
19 papers found
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Zhibo Yang, Jun Tang, Zhaohai Li et al.
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang, Chufan Shi, Yaxin Liu et al.
ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
Jiawei Lin, Shizhao Sun, Danqing Huang et al.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Majhi et al.
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Yang et al.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei, Zheyang Xiong, Weiming Ren et al.
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang et al.
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng, Boyu Gou, Jihyung Kil et al.
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu, Zhengyuan Yang, Linjie Li et al.
NExT-Chat: An LMM for Chat, Detection and Segmentation
Ao Zhang, Yuan Yao, Wei Ji et al.
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou, Zheng Zhu, Holger Caesar et al.
PSALM: Pixelwise Segmentation with Large Multi-modal Model
Zheng Zhang, YeYao Ma, Enming Zhang et al.
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context
yunxin li, Baotian Hu, Haoyuan Shi et al.