Artificial neural networks through the lens of dynamical systems theory
Danovski, Kaloyan (Supervisors: Lucas Lacasa, Miguel C. Soriano)
Master Thesis (2023)
The process of training an artificial neural network involves iteratively adapting its weight parameters so as to minimize the error of the network's predictions on a learning task. This iterative change can be naturally interpreted as a trajectory in network space; the training algorithm (e.g., Gradient Descent optimization of a suitable loss function) can thus be viewed as a dynamical system in graph space, and the whole training process can be characterized by a time series of networks. In this work, we study the dynamical properties of this system, focusing on its dynamical stability. We do so mostly by studying how the distance between initially close neural networks evolves during training, a form of "orbital stability". We find that the evolution of this distance depends qualitatively on the learning rate of the gradient descent scheme, and we examine several of the resulting regimes, finding hints of both regular and irregular (possibly chaotic) behavior. Our findings are contrasted with common wisdom on the convergence properties of neural networks and with dynamical systems theory.
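The stability probe described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the thesis's actual implementation: a hypothetical two-layer tanh network trained by full-batch gradient descent on a toy regression task, alongside a twin network whose initial weights are displaced by a tiny perturbation. The Euclidean distance between the two networks' weight vectors is recorded at every step; all network sizes, the target function, and the perturbation scale are illustrative choices.

```python
import numpy as np

def gd_step(W1, W2, X, y, lr):
    """One full-batch gradient descent step for a two-layer tanh network (MSE loss)."""
    H = np.tanh(X @ W1)                              # hidden activations
    err = H @ W2 - y                                 # prediction error
    gW2 = H.T @ err / len(X)                         # gradient w.r.t. output weights
    gW1 = X.T @ (err @ W2.T * (1 - H**2)) / len(X)   # backprop through tanh
    return W1 - lr * gW1, W2 - lr * gW2

def weight_distance(a, b):
    """Euclidean distance between two networks' flattened weight vectors."""
    return np.sqrt(sum(np.sum((wa - wb) ** 2) for wa, wb in zip(a, b)))

def track_divergence(lr=0.1, steps=200, eps=1e-6, seed=0):
    """Train two almost-identical networks and record their weight-space distance."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((64, 2))
    y = np.sin(X[:, :1] + X[:, 1:])                  # toy regression target
    W1 = 0.5 * rng.standard_normal((2, 8))
    W2 = 0.5 * rng.standard_normal((8, 1))
    # twin network: same weights plus a tiny random displacement
    V1 = W1 + eps * rng.standard_normal(W1.shape)
    V2 = W2 + eps * rng.standard_normal(W2.shape)
    dists = []
    for _ in range(steps):
        W1, W2 = gd_step(W1, W2, X, y, lr)
        V1, V2 = gd_step(V1, V2, X, y, lr)
        dists.append(weight_distance([W1, W2], [V1, V2]))
    return dists
```

Comparing the recorded distance curves across learning rates is the qualitative experiment: a distance that shrinks or stays bounded suggests a stable regime, while sustained growth from an infinitesimal perturbation is the hallmark of possibly chaotic behavior.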