How do we interpret the outputs of a neural network trained on classification?

#1817 of 3827 papers in ICLR 2025

Authors

Yudi Xie

Abstract

Deep neural networks are widely used for classification tasks, but how to interpret their output activations is often unclear. This post explains how these outputs can be understood as approximations of the Bayesian posterior probability. We show that, in theory, the loss function for classification tasks, derived by maximum likelihood, is minimized by the Bayesian posterior. We then run empirical studies in which neural networks are trained to classify synthetic data drawn from a known generative model. In a simple classification task, the network closely approximates the theoretically derived posterior. However, simple changes to the task can make accurate approximation much more difficult. How well the model approximates the posterior depends on several factors, including the complexity of the posterior and whether there is enough data to learn it.
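The central claim can be illustrated with a minimal sketch. This is not the post's experimental setup; the toy task below is an assumption chosen because its posterior has a closed form. Two equally likely classes generate a scalar input from unit-variance Gaussians centered at -1 and +1, so Bayes' rule gives P(y=1 | x) = sigmoid(2x). A single-unit logistic model trained on the cross-entropy loss should then learn a weight near 2 and a bias near 0. The names `sample_data`, `true_posterior`, and `fit_logistic` are hypothetical helpers written for this illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_data(n, rng):
    """Draw (x, y) pairs from the assumed generative model:
    y ~ Bernoulli(0.5), x | y ~ Normal(+1, 1) if y == 1 else Normal(-1, 1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        mean = 1.0 if y == 1 else -1.0
        data.append((rng.gauss(mean, 1.0), y))
    return data


def true_posterior(x):
    """Closed-form P(y=1 | x) for the toy model above (from Bayes' rule)."""
    return sigmoid(2.0 * x)


def fit_logistic(data, lr=1.0, steps=300):
    """Minimize the mean cross-entropy loss with full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # d(loss)/d(logit) for cross-entropy
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


rng = random.Random(0)  # fixed seed so the run is repeatable
data = sample_data(4000, rng)
w, b = fit_logistic(data)

# Compare the learned output with the analytic posterior on a grid of inputs.
grid = [i / 10.0 for i in range(-30, 31)]
max_gap = max(abs(sigmoid(w * x + b) - true_posterior(x)) for x in grid)
print(f"learned w={w:.3f}, b={b:.3f}, max gap to true posterior={max_gap:.3f}")
```

Here the model family contains the true posterior exactly, which is why the fit is easy. The harder cases the abstract mentions would correspond to posteriors that this model family cannot represent well, or to training sets too small to pin the parameters down.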
