"multi-domain evaluation" Papers
3 papers found
ClinBench: A Standardized Multi-Domain Framework for Evaluating Large Language Models in Clinical Information Extraction
Ismael Villanueva Miranda, Zifan Gu, Donghan Yang et al.
NeurIPS 2025 poster
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.
ICLR 2025 poster, arXiv:2410.06215
8 citations
T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning
NeurIPS 2025, arXiv:2505.16986