"automatic benchmark generation" Papers
2 篇论文
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.
ICLR 2025posterarXiv:2410.10783
11
citations
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time
Sensitive Test Construction - Yucheng Li, Frank Guerin, Chenghua Lin
AAAI 2024paperarXiv:2312.12343
53
citations