A self-consistent GP framework far from the GP limit
Gadi Naveh, Hebrew University of Jerusalem
12:00 ET
Recently, the infinite-width limit of deep neural networks (DNNs) has garnered much attention, since it provides a clear theoretical window into deep learning via mappings to Gaussian processes (GPs). Despite its theoretical appeal, this perspective lacks a key component of finite DNNs that is at the core of their success: feature learning. Here we consider DNNs trained with noisy gradient descent on a large training set and derive a self-consistent Gaussian process theory accounting for strong finite-DNN and feature-learning effects. We apply this theory to two toy models and find excellent agreement with experiments. In one of these models we further identify, both analytically and numerically, a sharp transition between a feature-learning regime and a lazy-learning regime. We also provide numerical evidence that the assumptions required for our theory hold in more realistic settings (a Myrtle-5 CNN trained on CIFAR-10).
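The training procedure in the abstract, "noisy gradient descent", can be read as Langevin-style dynamics: a gradient step plus injected Gaussian noise, so that at long times the weights sample a posterior-like distribution rather than converging to a point estimate. The following is a minimal NumPy sketch of that update rule on a toy linear-regression problem; the learning rate, temperature, and toy data are illustrative assumptions, not values from the talk or paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: scalar linear regression (stand-in for the paper's toy models).
X = rng.normal(size=(64, 4))
w_true = rng.normal(size=4)
y = X @ w_true

def loss(w):
    return np.mean((X @ w - y) ** 2)

w = np.zeros(4)
lr, temperature = 1e-2, 1e-4  # illustrative hyperparameters

losses = [loss(w)]
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(X)
    # Gradient step plus Gaussian noise of variance 2 * lr * temperature:
    # the stationary distribution of these dynamics is a Gibbs measure
    # over the weights, which is what lets the trained ensemble be
    # compared against a Gaussian-process description.
    w = w - lr * grad + np.sqrt(2 * lr * temperature) * rng.normal(size=4)
    losses.append(loss(w))

print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
```

At small temperature the dynamics behave like ordinary gradient descent with small fluctuations around the minimum; the noise term is what turns the trained network into a sample from a distribution, the object the self-consistent GP theory describes.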
Link to paper.