Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing

arXiv:2406.14230
14 citations
#406 of 3340 papers in ICML 2025

Abstract

Warning: Contains harmful model outputs. Despite significant advancements, the propensity of Large Language Models (LLMs) to generate harmful and unethical content poses critical challenges. Measuring the value alignment of LLMs is therefore crucial for their regulation and responsible deployment. Although numerous benchmarks have been constructed to assess social bias, toxicity, and ethical issues in LLMs, these static benchmarks suffer from evaluation chronoeffect: as models rapidly evolve, existing benchmarks may leak into training data or become saturated, leading to overestimation of ever-developing LLMs. To tackle this problem, we propose GETA, a novel generative evolving testing approach based on adaptive testing methods in measurement theory. Unlike traditional adaptive testing methods that rely on a static test item pool, GETA probes the underlying moral boundaries of LLMs by dynamically generating test items tailored to model capability. GETA co-evolves with LLMs by learning a joint distribution of item difficulty and model value conformity, thus effectively addressing evaluation chronoeffect. We evaluated various popular LLMs with GETA and demonstrated that 1) GETA can dynamically create difficulty-tailored test items and 2) GETA's evaluation results are more consistent with models' performance on unseen out-of-distribution (OOD) and i.i.d. items, laying the groundwork for future evaluation paradigms.
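The abstract builds on adaptive testing from measurement theory. As background only, here is a minimal sketch of conventional computerized adaptive testing (CAT) under a two-parameter logistic (2PL) item response theory model, i.e. the static-item-pool setting that GETA departs from, not GETA itself. The assumptions are ours, not the paper's: "value conformity" is treated as a scalar ability theta, each item has a hypothetical discrimination a and difficulty b, the next item is the one with maximum Fisher information at the current estimate, and theta is estimated by expected a posteriori (EAP) on a grid with a standard normal prior. All function names and the simulated model are hypothetical.

```python
import math
import random


def p_conform(theta: float, a: float, b: float) -> float:
    """2PL IRT: probability that a model with conformity `theta` gives a
    conforming (non-harmful) answer to an item with discrimination `a`
    and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at `theta`: a^2 * p * (1 - p)."""
    p = p_conform(theta, a, b)
    return a * a * p * (1.0 - p)


def eap_estimate(responses, grid_size: int = 121) -> float:
    """EAP estimate of theta on a grid over [-4, 4] with a N(0, 1) prior.
    `responses` is a list of ((a, b), y) with y = 1 for a conforming answer."""
    grid = [-4.0 + 8.0 * i / (grid_size - 1) for i in range(grid_size)]
    log_weights = []
    for theta in grid:
        log_w = -0.5 * theta * theta  # log N(0, 1) prior, up to a constant
        for (a, b), y in responses:
            p = p_conform(theta, a, b)
            # Clamp to avoid log(0) for extreme item parameters.
            log_w += math.log(max(p if y == 1 else 1.0 - p, 1e-300))
        log_weights.append(log_w)
    m = max(log_weights)
    weights = [math.exp(w - m) for w in log_weights]  # shift for stability
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


def adaptive_test(pool, answer_fn, n_items: int = 10):
    """Classic CAT loop over a *static* pool: repeatedly administer the
    unused item with maximum information at the current theta estimate,
    then re-estimate theta from all responses so far."""
    remaining = list(pool)
    responses = []
    theta = 0.0  # start at the prior mean
    for _ in range(min(n_items, len(remaining))):
        item = max(remaining, key=lambda ab: item_information(theta, *ab))
        remaining.remove(item)
        responses.append((item, answer_fn(item)))
        theta = eap_estimate(responses)
    return theta, responses


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the simulation is reproducible
    true_theta = 1.0  # hypothetical conformity of a simulated model
    pool = [(1.5, b / 2) for b in range(-6, 7)]  # hypothetical items, b in [-3, 3]

    def simulated_model(item):
        a, b = item
        return 1 if rng.random() < p_conform(true_theta, a, b) else 0

    theta_hat, log = adaptive_test(pool, simulated_model, n_items=8)
    print(f"estimated theta = {theta_hat:.2f}")
    print("difficulties administered:", [item[1] for item, _ in log])
```

The loop is limited by the fixed `pool`: once a model's conformity exceeds the hardest available item, further items carry little information, which mirrors the saturation problem described above. According to the abstract, GETA instead generates difficulty-tailored items on the fly and learns item difficulty jointly with model value conformity; neither step is shown in this sketch.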

Citation History

Jan 28, 2026: 0 citations
Feb 13, 2026: 14 citations (+14)