Physics ∩ ML

A virtual hub at the interface of theoretical physics and deep learning.

27 Jan 2021

The Importance of Being Interpretable

Michelle Ntampaka, Space Telescope Science Institute, 12:00 EDT

Abstract: Cosmology is entering an era of data-driven science. Modern machine learning techniques are being combined with large astronomical surveys to enable powerful new research methods. This shift in our scientific approach requires us to ask an important question: Can we trust the black box?

I will present a deep machine learning approach to constraining cosmological parameters with multi-wavelength observations of galaxy clusters. The ML approach has two components: an autoencoder that builds a compressed representation of each galaxy cluster and a flexible CNN that estimates the cosmological model from a cluster sample. From mock observations, the ML method estimates the amplitude of matter fluctuations, σ8, at approximately the expected theoretical limit. More importantly, the deep ML approach can be understood and interpreted. I will lay out three schemes for interpreting the ML technique: a leave-one-out method for assessing cluster importance, an average saliency for evaluating feature importance, and correlations in the terse layer for understanding whether an ML technique can be safely applied to observational data. I will introduce the term "overspecialized" to describe a common pitfall in astronomical applications of machine learning, in which the ML method learns simulation-specific details, and I will show how a carefully sculpted architecture can be used to check for this source of systematic error.
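As a rough illustration of the leave-one-out idea mentioned in the abstract, the sketch below scores each cluster by how much a sample-level estimate shifts when that cluster is removed. It is not the speaker's implementation: `toy_estimator` (a plain mean), `leave_one_out_importance`, and the numbers in `toy_sample` are hypothetical stand-ins, and in the talk's setting the estimator would presumably be the trained network applied to a cluster sample.

```python
from statistics import fmean
from typing import Callable, List, Sequence


def leave_one_out_importance(
    sample: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
) -> List[float]:
    """Return, for each item, the shift in the estimate caused by removing it."""
    full_estimate = estimator(sample)
    shifts = []
    for i in range(len(sample)):
        # Rebuild the sample without item i and re-run the estimator.
        reduced = list(sample[:i]) + list(sample[i + 1:])
        shifts.append(full_estimate - estimator(reduced))
    return shifts


def toy_estimator(sample: Sequence[float]) -> float:
    """Hypothetical stand-in for a sample-level model: the mean of the inputs."""
    return fmean(sample)


# Made-up per-cluster values, used only to exercise the function.
toy_sample = [0.80, 0.82, 0.78, 0.90]
importance = leave_one_out_importance(toy_sample, toy_estimator)

# The item whose removal moves the estimate the most is the most influential.
most_influential = max(range(len(toy_sample)), key=lambda i: abs(importance[i]))
```

With this toy mean estimator, the last item is the outlier and therefore the most influential. A real estimator need not behave so simply, which is why each reduced sample is re-evaluated from scratch rather than approximated.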