ICLR Poster "large language model evaluation" Papers
3 papers found
BenTo: Benchmark Reduction with In-Context Transferability
Hongyu Zhao, Ming Li, Lichao Sun et al.
ICLR 2025poster
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows"
Yifei Ming, Senthil Purushwalkam, Shrey Pandit et al.
ICLR 2025poster
45
citations
MMQA: Evaluating LLMs with Multi-Table Multi-Hop Complex Questions
Jian Wu, Linyi Yang, Dongyuan Li et al.
ICLR 2025poster
23
citations