"instruction following evaluation" Papers
2 papers found
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim et al.
ICLR 2025 poster, arXiv:2406.15877
397 citations
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.
ICLR 2025 poster, arXiv:2407.01509
41 citations