2025 "benchmark generation" Papers
4 papers found
Autonomous Evaluation of LLMs for Truth Maintenance and Reasoning Tasks
Rushang Karia, Daniel Bramblett, Daksh Dobhal et al.
ICLR 2025 poster, arXiv:2410.08437
2 citations
Physiome-ODE: A Benchmark for Irregularly Sampled Multivariate Time-Series Forecasting Based on Biological ODEs
Christian Klötergens, Vijaya Krishna Yalavarthi, Randolf Scholz et al.
ICLR 2025 poster, arXiv:2502.07489
2 citations
Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity
Qiyao Wei, Edward R Morrell, Lea Goetz et al.
NeurIPS 2025 poster, arXiv:2511.19925
Silencer: From Discovery to Mitigation of Self-Bias in LLM-as-Benchmark-Generator
Peiwen Yuan, Yiwei Li, Shaoxiong Feng et al.
NeurIPS 2025 poster, arXiv:2505.20738
3 citations