MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

ICLR 2025 · 41 citations (ranked #205 of 3,827 ICLR 2025 papers) · 6 authors

Abstract

Effective evaluation of Multimodal Large Language Models (MLLMs) is essential for understanding their capabilities and limitations. In this paper, we introduce MIA-Bench, a benchmark designed to assess MLLMs’ ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models’ compliance with layered instructions in generating accurate and contextually appropriate responses. Evaluation results from a wide array of state-of-the-art MLLMs reveal significant variations in performance, highlighting areas for improvement in instruction fidelity. Additionally, we create extra training data and explore supervised fine-tuning and direct preference optimization to enhance the models’ ability to strictly follow instructions without compromising performance on other tasks. We hope this benchmark not only serves as a tool for measuring MLLM adherence to instructions, but also guides future developments in MLLM training methods.
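
The abstract names direct preference optimization (DPO) as one of the fine-tuning strategies explored. As background only, the standard DPO objective (Rafailov et al., 2023) is sketched below; the abstract does not say how the preference pairs are built, so reading $y_w$ as an instruction-compliant response and $y_l$ as a non-compliant one is an assumption about how MIA-Bench-style preference data would be formed.

```latex
% Standard DPO objective (Rafailov et al., 2023), shown for background only.
% x       : the input (here, an image-prompt pair)
% y_w     : preferred response (assumed: strictly follows the layered instruction)
% y_l     : dispreferred response (assumed: violates part of the instruction)
% \pi_ref : frozen reference policy, typically the SFT model
% \beta   : temperature controlling deviation from the reference policy
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\, \pi_{\mathrm{ref}}) =
  -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
  \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right) \right]
```

Minimizing this loss raises the relative log-likelihood of the preferred responses without training a separate reward model, which is why DPO pairs naturally with the supervised fine-tuning stage also mentioned in the abstract.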
