ICLR "large multimodal models" Papers
11 papers found
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang, Haoning Wu, Chunyi Li et al.
ICLR 2025 poster · arXiv:2406.03070 · 40 citations
ChartMimic: Evaluating LMMs' Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang, Chufan Shi, Yaxin Liu et al.
ICLR 2025 poster · arXiv:2406.09961 · 65 citations
Does Spatial Cognition Emerge in Frontier Models?
Santhosh Kumar Ramakrishnan, Erik Wijmans, Philipp Krähenbühl et al.
ICLR 2025 poster · arXiv:2410.06468 · 50 citations
Fine-Tuning Token-Based Large Multimodal Models: What Works, What Doesn’t and What's Next
Zhulin Hu, Yan Ma, Jiadi Su et al.
ICLR 2025 poster
GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models
Zewei Zhang, Huan Liu, Jun Chen et al.
ICLR 2025 poster · arXiv:2404.07206 · 25 citations
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Majhi et al.
ICLR 2025 poster · arXiv:2407.17773 · 18 citations
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang, Qingkai Fang, Yang et al.
ICLR 2025 poster · arXiv:2501.03895 · 106 citations
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Junyan Ye, Baichuan Zhou, Zilong Huang et al.
ICLR 2025 poster · arXiv:2410.09732 · 28 citations
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
ICLR 2025 poster · 6 citations
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei, Zheyang Xiong, Weiming Ren et al.
ICLR 2025 poster · arXiv:2411.07199 · 88 citations
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu, James Liang, Ruixiang Tang et al.
ICLR 2025 poster · arXiv:2503.00723