2025 "large multimodal models" Papers
22 papers found
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Orr Zohar, Xiaohan Wang, Yann Dubois et al.
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Zhibo Yang, Jun Tang, Zhaohai Li et al.
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang, Chufan Shi, Yaxin Liu et al.
ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
Benedetta Liberatori, Alessandro Conti, Lorenzo Vaquero et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
Does Spatial Cognition Emerge in Frontier Models?
Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl et al.
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
Ming Li, Jike Zhong, Tianle Chen et al.
Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu et al.
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
Jiawei Lin, Shizhao Sun, Danqing Huang et al.
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
Zewei Zhang, Huan Liu, Jun Chen et al.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Majhi et al.
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Yang et al.
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Junyan Ye, Baichuan Zhou, Zilong Huang et al.
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
MS-Bench: Evaluating LMMs in Ancient Manuscript Study through a Dunhuang Case Study
Yuqing Zhang, Yue Han, Shuanghe Zhu et al.
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei, Zheyang Xiong, Weiming Ren et al.
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu, James Liang, Ruixiang Tang et al.
Seeing the Arrow of Time in Large Multimodal Models
Zihui (Sherry) Xue, Romy Luo, Kristen Grauman
SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model
Zhenglin Huang, Jinwei Hu, Yiwei He et al.
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Ziyang Luo, Haoning Wu, Dongxu Li et al.