SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

NeurIPS 2025
Abstract

Vision-language-action models (VLAs) show potential as generalist robot policies. However, these models pose extreme safety challenges during real-world deployment, including the risk of harm to the environment, the robot itself, and humans. How can safety constraints be explicitly integrated into VLAs? We address this by exploring an integrated safety approach (ISA): systematically modeling safety requirements, then actively eliciting diverse unsafe behaviors, effectively constraining VLA policies via safe reinforcement learning, and rigorously assuring their safety through targeted evaluations. Leveraging the constrained Markov decision process (CMDP) paradigm, ISA optimizes VLAs from a min-max perspective against elicited safety risks. Policies aligned through this comprehensive approach achieve the following key features: (I) effective safety-performance trade-offs, reducing the cumulative cost of safety violations by 83.58% compared to the state-of-the-art method while maintaining task success rate (+3.85%); (II) strong safety assurance, with the ability to mitigate long-tail risks and handle extreme failure scenarios; (III) robust generalization of learned safety behaviors to various out-of-distribution perturbations. Effectiveness is evaluated on long-horizon mobile manipulation tasks.
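The CMDP formulation mentioned above, maximizing task reward subject to a budget on expected safety cost, is commonly solved with a primal-dual (Lagrangian) min-max loop. The sketch below is a toy illustration of that generic technique only, not the paper's actual training procedure: the quadratic reward `J`, cost `C`, and the scalar policy parameter `theta` are hypothetical stand-ins for a VLA policy and its learned reward/cost signals.

```python
# Toy primal-dual sketch of CMDP-style constrained optimization:
#   maximize J(theta)  subject to  C(theta) <= budget.
# J, C, theta, and all hyperparameters here are illustrative assumptions.

def J(theta):
    # task reward: peaks at theta = 2 (unconstrained optimum)
    return -(theta - 2.0) ** 2

def C(theta):
    # safety cost: grows as the policy drifts from theta = 0
    return theta ** 2

def primal_dual(budget=1.0, steps=1000, lr_theta=0.05, lr_lam=0.05):
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # primal step: ascend the Lagrangian L = J - lam * (C - budget) in theta
        grad_J = -2.0 * (theta - 2.0)
        grad_C = 2.0 * theta
        theta += lr_theta * (grad_J - lam * grad_C)
        # dual step: raise lam on constraint violation, project onto lam >= 0
        lam = max(0.0, lam + lr_lam * (C(theta) - budget))
    return theta, lam

theta, lam = primal_dual()
# converges near theta = 1, where the cost sits exactly at the budget
```

The dual variable `lam` acts as an adaptive penalty: it grows while the safety budget is exceeded and pushes the policy back toward the feasible region, which is how the min-max trade-off between task success and safety cost is mediated.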
