Provable Gradient Editing of Deep Neural Networks

1 citation · ranked #940 of 5858 papers in NeurIPS 2025 · 2 authors · 2 data points

Abstract

In explainable AI, DNN gradients are used to interpret predictions; in safety-critical control systems, gradients can encode safety constraints; in scientific-computing applications, gradients can encode physical invariants. While recent work on provable editing of DNNs has focused on input-output constraints, the problem of enforcing hard constraints on DNN gradients remains unaddressed. We present ProGrad, the first efficient approach for editing the parameters of a DNN to provably enforce hard constraints on its gradients. Given a DNN $\mathcal{N}$ with parameters $\theta$ and a set $\mathcal{S}$ of pairs $(\mathrm{x}, \mathrm{Q})$ of an input $\mathrm{x}$ and corresponding linear gradient constraints $\mathrm{Q}$, ProGrad finds new parameters $\theta'$ such that $\bigwedge_{(\mathrm{x}, \mathrm{Q}) \in \mathcal{S}} \frac{\partial}{\partial \mathrm{x}}\mathcal{N}(\mathrm{x}; \theta') \in \mathrm{Q}$ while minimizing the change $\lVert\theta' - \theta\rVert$. The key contribution is a novel *conditional variable gradient* of DNNs, which relaxes the NP-hard provable gradient-editing problem to a linear program (LP), enabling ProGrad to use an LP solver to enforce the gradient constraints efficiently and effectively. We experimentally evaluated ProGrad by enforcing (i) hard Grad-CAM constraints on ImageNet ResNet DNNs; (ii) hard Integrated Gradients constraints on Llama 3 and Qwen 3 LLMs; and (iii) hard gradient constraints while training a DNN to approximate a target function, as a proxy for safety constraints in control systems and physical invariants in scientific applications. The results highlight the unique capability of ProGrad to enforce hard constraints on DNN gradients.
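To make the LP idea concrete, below is a minimal sketch for the special case of a linear model $\mathcal{N}(\mathrm{x}; \mathrm{w}) = \mathrm{w}^\top \mathrm{x}$, where the input gradient is exactly $\mathrm{w}$: a linear gradient constraint $A\,\nabla_{\mathrm{x}}\mathcal{N} \le b$ then becomes a linear constraint on the parameters themselves, and minimizing the edit $\lVert\mathrm{w}' - \mathrm{w}\rVert_\infty$ is directly an LP. The function name `enforce_gradient_lp` and the $(A, b)$ encoding are illustrative assumptions, not ProGrad's API or the paper's conditional variable gradient construction.

```python
# Minimal sketch (assumed names, not ProGrad's API): for a linear model
# N(x; w) = w @ x, the input gradient equals w, so a gradient constraint
# A @ grad <= b is a linear constraint on w, and the minimal-change edit
# under the L-infinity norm is a linear program.
import numpy as np
from scipy.optimize import linprog

def enforce_gradient_lp(w, A, b):
    """Find w' with A @ w' <= b minimizing ||w' - w||_inf via an LP."""
    n = w.size
    m = A.shape[0]
    # Decision variables z = [w' (n entries), t (1 entry)]; minimize t.
    c = np.zeros(n + 1)
    c[-1] = 1.0
    I = np.eye(n)
    ones = np.ones((n, 1))
    A_ub = np.vstack([
        np.hstack([I, -ones]),             #  w' - t*1 <=  w
        np.hstack([-I, -ones]),            # -w' - t*1 <= -w
        np.hstack([A, np.zeros((m, 1))]),  # gradient constraint A w' <= b
    ])
    b_ub = np.concatenate([w, -w, b])
    bounds = [(None, None)] * n + [(0, None)]  # w' free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    assert res.success, res.message
    return res.x[:n]

# Example: require the first gradient coordinate to be non-negative.
w = np.array([-0.3, 1.2])
A = np.array([[-1.0, 0.0]])   # -grad[0] <= 0, i.e. grad[0] >= 0
b = np.array([0.0])
w_new = enforce_gradient_lp(w, A, b)
print(w_new)  # e.g. [0. , 1.2]; any w_new[1] within 0.3 of 1.2 is optimal
```

For a deep nonlinear network, the gradient is itself a nonlinear function of $\theta$, which is what makes the general editing problem NP-hard; the paper's conditional variable gradient is what recovers an LP in that setting.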

Citation History

Jan 25, 2026: 1
Jan 31, 2026: 1