Abstract
Echoing recent calls to counter reliability and robustness concerns in machine learning via multiverse analysis, we present PRESTO, a principled framework for mapping the multiverse of machine-learning models that rely on latent representations. Although such models enjoy widespread adoption, the variability in their embeddings remains poorly understood, resulting in unnecessary complexity and untrustworthy representations. Our framework uses persistent homology to characterize the latent spaces arising from different combinations of diverse machine-learning methods, (hyper)parameter configurations, and datasets, allowing us to measure their pairwise (dis)similarity and statistically reason about their distributions. As we demonstrate both theoretically and empirically, our pipeline preserves desirable properties of collections of latent representations, and it can be leveraged to perform sensitivity analysis, detect anomalous embeddings, or efficiently and effectively navigate hyperparameter search spaces.
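The abstract does not spell out the pipeline. The following is a rough, hypothetical illustration of the general idea: summarize each latent space with persistent homology, compare the summaries pairwise, and flag the embedding that sits farthest from the rest. It is a minimal pure-Python sketch under stated assumptions, not PRESTO's implementation. It uses only 0-dimensional homology of a Vietoris-Rips filtration, persistence landscapes as the summary, an L2 distance approximated on a grid, and a "largest mean distance" rule for anomalies. The function names (`h0_diagram`, `landscape`, `landscape_distance`, `make_cloud`) and the synthetic 2-D point clouds standing in for embeddings are all illustrative choices.

```python
# Hypothetical sketch: NOT the PRESTO implementation. Assumes H0 only,
# persistence landscapes, and a sampled-grid L2 distance.
import math
import random
from itertools import combinations


def h0_diagram(points):
    """0-dimensional persistence pairs of a Vietoris-Rips filtration.

    Every point is born at scale 0; each merge of two components at edge
    length d kills one of them, giving the pair (0, d). Merges are found with
    Kruskal's algorithm, so death times are the minimum-spanning-tree edge
    lengths. The one component that never dies is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    pairs = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, d))
    return pairs


def landscape(pairs, grid, k_max=3):
    """Sample the first k_max persistence landscape functions on a grid."""
    out = []
    for t in grid:
        # Tent function of each (birth, death) pair evaluated at t.
        tents = sorted((max(0.0, min(t - b, d - t)) for b, d in pairs), reverse=True)
        tents += [0.0] * k_max  # pad when there are fewer than k_max pairs
        out.append(tents[:k_max])
    return out


def landscape_distance(la, lb, step):
    """Approximate L2 distance between two sampled landscapes (Riemann sum)."""
    total = sum((x - y) ** 2 for ra, rb in zip(la, lb) for x, y in zip(ra, rb))
    return math.sqrt(total * step)


def make_cloud(seed, split=False, n=30):
    """Synthetic stand-in for an embedding: n points in the unit square."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    if split:  # hypothetical "anomalous" embedding: two far-apart clusters
        pts = [(x + 10.0, y) if i % 2 else (x, y) for i, (x, y) in enumerate(pts)]
    return pts


clouds = [make_cloud(s) for s in range(4)] + [make_cloud(99, split=True)]
diagrams = [h0_diagram(c) for c in clouds]

# Shared grid so all landscapes are sampled at the same scales.
t_max = max(d for dg in diagrams for _, d in dg)
steps = 200
step = t_max / steps
grid = [i * step for i in range(steps + 1)]
lands = [landscape(dg, grid) for dg in diagrams]

# Pairwise (dis)similarity matrix between the "latent spaces".
n = len(lands)
dist = [[landscape_distance(lands[i], lands[j], step) for j in range(n)] for i in range(n)]

# Flag the embedding with the largest mean distance to all others.
mean_dist = [sum(row) / (n - 1) for row in dist]
anomaly = max(range(n), key=lambda i: mean_dist[i])
print(anomaly)  # → 4 (the two-cluster cloud)
```

Restricting to H0 keeps the sketch dependency-free but ignores loops and voids. Real latent spaces are also high-dimensional and large, so in practice one would likely subsample points and rely on a dedicated topological data analysis library for higher-dimensional homology.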