Poster Papers by Sophia Han
2 papers found
Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models
Sophia Han, Howard Dai, Stephen Xia et al.
NeurIPS 2025 poster, arXiv:2505.10844
1 citation
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
NeurIPS 2025 poster, arXiv:2511.04703
8 citations