Papers by Kristina Nikolić
2 papers found
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
NeurIPS 2025 (poster), arXiv:2505.12575, 10 citations
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?
Kristina Nikolić, Luze Sun, Jie Zhang et al.
ICML 2025 (spotlight)