Emergent Risk Awareness in Rational Agents under Resource Constraints

2 citations

arXiv:2505.23436

Citation rank: #976 of 5858 papers in NeurIPS 2025

Authors

Daniel Jarne Ornia, Nicholas Bishop, Joel Dyer, Wei-Chen Lee, Anisoara Calinescu, Doyne Farmer, Michael Wooldridge

Topics

survival bandit framework, resource constraints, sequential decision-making, agentic capabilities, preference shifts, risk-seeking behavior, risk-averse behavior, agent misalignment

Abstract

Advanced reasoning models with agentic capabilities (AI agents) are deployed to interact with humans and to solve sequential decision-making problems under (approximate) utility functions and internal models. When such problems have resource or failure constraints where action sequences may be forcibly terminated once resources are exhausted, agents face implicit trade-offs that reshape their utility-driven (rational) behaviour. Additionally, since these agents are typically commissioned by a human principal to act on their behalf, asymmetries in constraint exposure can give rise to previously unanticipated misalignment between human objectives and agent incentives. We formalise this setting through a survival bandit framework, provide theoretical and empirical results that quantify the impact of survival-driven preference shifts, identify conditions under which misalignment emerges, and propose mechanisms to mitigate the emergence of risk-seeking or risk-averse behaviours. As a result, this work aims to increase understanding and interpretability of emergent behaviours of AI agents operating under such survival pressure, and to offer guidelines for safely deploying such AI systems in critical resource-limited environments.
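The trade-off the abstract describes can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the paper's formal model: it assumes the budget changes by each reward received and that the episode is terminated once the budget reaches zero. The arm definitions, the function names (run_episode, estimate) and all parameter values are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical arms, each a list of (reward, probability) pairs.
SAFE_ARM = [(1, 0.5), (0, 0.5)]    # never reduces the budget; mean 0.5 per step
RISKY_ARM = [(3, 0.5), (-1, 0.5)]  # higher mean (1.0 per step) but can exhaust the budget


def run_episode(arm, budget, horizon, rng):
    """Pull one arm repeatedly; stop early if the budget is exhausted.

    Assumption: the budget is updated by adding each reward, and the
    episode is forcibly terminated when the budget drops to zero or below.
    Returns (total_reward, survived).
    """
    rewards = [r for r, _ in arm]
    probs = [p for _, p in arm]
    total = 0
    for _ in range(horizon):
        r = rng.choices(rewards, weights=probs, k=1)[0]
        total += r
        budget += r
        if budget <= 0:
            return total, False
    return total, True


def estimate(arm, budget, horizon=50, episodes=2000, seed=0):
    """Monte Carlo estimate of (mean total reward, survival rate)."""
    rng = random.Random(seed)
    results = [run_episode(arm, budget, horizon, rng) for _ in range(episodes)]
    mean_reward = sum(t for t, _ in results) / episodes
    survival_rate = sum(1 for _, s in results if s) / episodes
    return mean_reward, survival_rate


if __name__ == "__main__":
    for budget in (1, 2, 5):
        print(budget, estimate(SAFE_ARM, budget), estimate(RISKY_ARM, budget))
```

In this toy setting the risky arm has the higher per-step mean, but its realised return depends on how often the episode is cut short, which in turn depends on the starting budget. Comparing the two arms across budgets gives a rough picture of how a survival constraint can change which arm a reward-maximising agent would prefer.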
