ICLR 2025 "large multimodal models" Papers

11 papers found

A-Bench: Are LMMs Masters at Evaluating AI-generated Images?

Zicheng Zhang, Haoning Wu, Chunyi Li et al.

ICLR 2025 poster · arXiv:2406.03070 · 40 citations

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Cheng Yang, Chufan Shi, Yaxin Liu et al.

ICLR 2025 poster · arXiv:2406.09961 · 65 citations

Does Spatial Cognition Emerge in Frontier Models?

Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl et al.

ICLR 2025 poster · arXiv:2410.06468 · 50 citations

Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn't and What's Next

Zhulin Hu, Yan Ma, Jiadi Su et al.

ICLR 2025 poster

GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models

Zewei Zhang, Huan Liu, Jun Chen et al.

ICLR 2025 poster · arXiv:2404.07206 · 25 citations

KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models

Eunice Yiu, Maan Qraitem, Anisa Majhi et al.

ICLR 2025 poster · arXiv:2407.17773 · 18 citations

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Shaolei Zhang, Qingkai Fang, Yang Feng et al.

ICLR 2025 poster · arXiv:2501.03895 · 106 citations

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Junyan Ye, Baichuan Zhou, Zilong Huang et al.

ICLR 2025 poster · arXiv:2410.09732 · 28 citations

MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines

Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.

ICLR 2025 poster · 6 citations

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Cong Wei, Zheyang Xiong, Weiming Ren et al.

ICLR 2025 poster · arXiv:2411.07199 · 88 citations

Re-Imagining Multimodal Instruction Tuning: A Representation View

Yiyang Liu, James Liang, Ruixiang Tang et al.

ICLR 2025 poster · arXiv:2503.00723