How do we interpret the outputs of a neural network trained on classification?

ICLR 2025

Abstract

Deep neural networks are widely used for classification, but the interpretation of their output activations is often unclear. This post explains how these outputs can be understood as approximations of the Bayesian posterior probability. We show that, in theory, the loss function for classification tasks (derived by maximum likelihood) is minimized by the Bayesian posterior. We then conduct empirical studies, training neural networks to classify synthetic data from a known generative model. In a simple classification task, the network closely approximates the theoretically derived posterior. However, simple changes to the task can make accurate approximation much more difficult. The model's ability to approximate the posterior depends on multiple factors, such as the complexity of the posterior and whether there is sufficient data for learning.
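The theoretical claim, that the maximum-likelihood classification loss is minimized by the Bayesian posterior, can be checked numerically on a toy case. The sketch below is illustrative only and is not the generative model used in the post: it assumes two classes with equal priors whose inputs are drawn from unit-variance Gaussians centered at -1 and +1, and all function names are hypothetical. It computes the posterior at a single input by Bayes' rule, then searches a grid of candidate outputs for the one with the lowest expected cross-entropy.

```python
import math


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_posterior(x, prior1=0.5, mean0=-1.0, mean1=1.0):
    """P(y=1 | x) by Bayes' rule for two unit-variance Gaussian classes.

    The class means and prior are assumptions made for this toy example.
    """
    joint1 = prior1 * gaussian_pdf(x, mean1)
    joint0 = (1.0 - prior1) * gaussian_pdf(x, mean0)
    return joint1 / (joint0 + joint1)


def expected_cross_entropy(q, p):
    """Expected negative log-likelihood of outputting q when the true P(y=1|x) is p."""
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))


def best_prediction(p, steps=999):
    """Grid search over q in (0, 1) for the output minimizing the expected loss."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return min(grid, key=lambda q: expected_cross_entropy(q, p))


x = 0.3
p = bayes_posterior(x)
q_star = best_prediction(p)
print(f"posterior P(y=1|x={x}) = {p:.4f}, loss-minimizing output = {q_star:.4f}")
```

Up to the grid resolution, the loss-minimizing output coincides with the posterior. A trained network only approaches this optimum to the extent that it has enough capacity and data, which is the part the empirical studies examine.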
