2025 Poster "llm benchmarking" Papers
3 papers found
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han, seung-won hwang, Rajhans Samdani et al.
ICLR 2025, poster, arXiv:2502.19852
12 citations
DataGen: Unified Synthetic Dataset Generation via Large Language Models
Yue Huang, Siyuan Wu, Chujie Gao et al.
ICLR 2025, poster, arXiv:2406.18966
21 citations
Risk Management for Mitigating Benchmark Failure Modes: BenchRisk
Sean McGregor, Vassil Tashev, Armstrong Foundjem et al.
NeurIPS 2025, poster, arXiv:2510.21460
1 citation