The Gaussian world is not enough - how data shapes neural network representations
Sebastian Goldt, SISSA

Abstract:
Neural networks are powerful feature extractors - but which features do they extract from their data? And how does the structure of the training data shape the representations they learn? We discuss three questions in this direction. First, we present analytical and experimental evidence for a “distributional simplicity bias”, whereby neural networks learn increasingly complex approximations of their input distribution during training, from a simple perceptron up to deep ResNets. We then show that neural networks can extract information from the higher-order cumulants of their inputs more efficiently than lazy methods. Finally, we develop a simple model of images and show that a neural network trained on these images learns a convolution from scratch by exploiting the structure of the higher-order cumulants of the “images”.
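A minimal sketch of the kind of “Gaussian clone” comparison behind the first claim may help make it concrete: train the same small network once on real data and once on surrogate data that matches only the per-class mean and covariance of the real data, and evaluate both on the real test set. The dataset (scikit-learn's bundled digits), architecture, and hyperparameters below are illustrative assumptions, not the setup used in the work described in the abstract.

```python
# Sketch: compare a small network trained on real data vs. a "Gaussian clone"
# of the same data (per-class mean and covariance matched, higher-order
# structure destroyed). All choices here are illustrative, not the talk's setup.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Build the Gaussian clone: for each class, resample from a multivariate
# normal with that class's empirical mean and covariance.
X_clone, y_clone = [], []
for c in np.unique(y_train):
    Xc = X_train[y_train == c]
    mu, cov = Xc.mean(axis=0), np.cov(Xc, rowvar=False)
    cov += 1e-4 * np.eye(cov.shape[0])  # small ridge so sampling is well-posed
    X_clone.append(rng.multivariate_normal(mu, cov, size=len(Xc)))
    y_clone.append(np.full(len(Xc), c))
X_clone, y_clone = np.vstack(X_clone), np.concatenate(y_clone)

def train_and_eval(Xtr, ytr, epochs):
    # Same architecture for both training sets; always evaluate on real test data.
    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=epochs, random_state=0)
    net.fit(Xtr, ytr)
    return net.score(X_test, y_test)

# Short vs. long training (a ConvergenceWarning at few epochs is expected and harmless).
for epochs in (5, 200):
    acc_real = train_and_eval(X_train, y_train, epochs)
    acc_clone = train_and_eval(X_clone, y_clone, epochs)
    print(f"{epochs:4d} epochs | real: {acc_real:.3f} | clone: {acc_clone:.3f}")
```

If a distributional simplicity bias holds in this toy setting, the two accuracies should be close after a few epochs and diverge with longer training, as the network trained on real data starts exploiting statistics beyond the first two moments that the clone does not contain.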