2025 "large multimodal models" Papers

22 papers found

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Orr Zohar, Xiaohan Wang, Yann Dubois et al.

CVPR 2025posterarXiv:2412.10360
55
citations

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Zhibo Yang, Jun Tang, Zhaohai Li et al.

ICCV 2025posterarXiv:2412.02210
42
citations

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Cheng Yang, Chufan Shi, Yaxin Liu et al.

ICLR 2025posterarXiv:2406.09961
65
citations

ConViS-Bench: Estimating Video Similarity Through Semantic Concepts

Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero et al.

NeurIPS 2025posterarXiv:2509.19245
1
citations

CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology

Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.

CVPR 2025posterarXiv:2412.12077
22
citations

Does Spatial Cognition Emerge in Frontier Models?

Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl et al.

ICLR 2025posterarXiv:2410.06468
50
citations

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark

Ming Li, Jike Zhong, Tianle Chen et al.

CVPR 2025posterarXiv:2411.01492
6
citations

Federated Continual Instruction Tuning

Haiyang Guo, Fanhu Zeng, Fei Zhu et al.

ICCV 2025posterarXiv:2503.12897
6
citations

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025posterarXiv:2406.05821
21
citations

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Jiawei Lin, Shizhao Sun, Danqing Huang et al.

CVPR 2025posterarXiv:2412.19712
5
citations

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen et al.

ICLR 2025posterarXiv:2404.07206
25
citations

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Eunice Yiu, Maan Qraitem, Anisa Majhi et al.

ICLR 2025posterarXiv:2407.17773
18
citations

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Shaolei Zhang, Qingkai Fang, Yang et al.

ICLR 2025posterarXiv:2501.03895
106
citations

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Junyan Ye, Baichuan Zhou, Zilong Huang et al.

ICLR 2025posterarXiv:2410.09732
28
citations

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines

Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.

ICLR 2025poster
6
citations

MS-Bench: Evaluating LMMs in Ancient Manuscript Study through a Dunhuang Case Study

Yuqing Zhang, Yue Han, Shuanghe Zhu et al.

NeurIPS 2025poster

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Cong Wei, Zheyang Xiong, Weiming Ren et al.

ICLR 2025posterarXiv:2411.07199
88
citations

On Large Multimodal Models as Open-World Image Classifiers

Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.

ICCV 2025posterarXiv:2503.21851
3
citations

Re-Imagining Multimodal Instruction Tuning: A Representation View

Yiyang Liu, James Liang, Ruixiang Tang et al.

ICLR 2025posterarXiv:2503.00723

Seeing the Arrow of Time in Large Multimodal Models

Zihui (Sherry) Xue, Romy Luo, Kristen Grauman

NeurIPS 2025oralarXiv:2506.03340
5
citations

SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model

Zhenglin Huang, Jinwei Hu, Yiwei He et al.

CVPR 2025posterarXiv:2412.04292
64
citations

VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation

Ziyang Luo, Haoning Wu, Dongxu Li et al.

CVPR 2025posterarXiv:2411.13281
14
citations