2025 "llm-assisted evaluation" Papers
2 papers found
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
ICLR 2025, oral, arXiv:2410.03051
102 citations
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models
Narun Raman, Taylor Lundy, Thiago Amin et al.
NeurIPS 2025, poster, arXiv:2502.13119
3 citations