Bigger is not always better. While large language models and complex deep neural networks (DNNs) have resulted in huge performance gains across a variety of AI-related tasks, lighter and simpler models are often preferable in industrial applications. It is thus crucial for continued efficient DNN development and deployment that the machine learning research community improves its understanding of fundamental model complexity control methods such as regularization.
In the new paper Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity, a research team from Google and DeepMind proposes Geometric Complexity (GC), a measure of DNN model complexity that serves as a useful tool for understanding the underlying mechanisms of complexity control.
The team summarizes their main contributions as follows:
- We develop a computationally tractable notion of complexity, which we call Geometric Complexity (GC), that has close relationships with many areas of deep learning and mathematics, including harmonic function theory, Lipschitz smoothness, and regularization theory.
- We provide evidence that common training heuristics keep the geometric complexity low, including (i) common initialization schemes, (ii) the use of overparametrized models with a large number of layers, (iii) large learning rates, small batch sizes, and implicit gradient regularization, (iv) explicit parameter norm regularization, spectral norm regularization, flatness regularization, and label noise regularization.
- We show that the geometric complexity captures the double-descent behaviour observed in the test loss as model parameter count increases.
Previous studies have proposed numerous complexity measures — naive parameter count, data-driven approaches, VC dimension and Rademacher complexity — but most of these fail to clarify the properties of regularizers and the connections between implicit and explicit regularizers.
The proposed GC aims to address these issues. The researchers define GC as the mean squared Frobenius norm of a model's input-output Jacobian averaged over the training data (a discrete Dirichlet energy), and show how it relates to important aspects of deep learning, including linear models, ReLU networks, Lipschitz smoothness, arc length, and harmonic maps.
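In code, this definition amounts to averaging the squared Frobenius norm of the Jacobian over a dataset. The sketch below is illustrative, not taken from the paper's implementation: it estimates GC numerically with finite differences, and checks it on a linear map f(x) = Wx, whose Jacobian is W everywhere, so its GC should equal the squared Frobenius norm of W (the function and variable names are our own).

```python
import numpy as np

def geometric_complexity(f, X, eps=1e-5):
    """Estimate GC: the mean squared Frobenius norm of the Jacobian
    of f over the dataset X, via central finite differences."""
    total = 0.0
    for x in X:
        # Build the Jacobian column by column: one finite-difference
        # probe per input dimension.
        J = np.stack([
            (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
            for e in np.eye(len(x))
        ], axis=1)  # shape: (out_dim, in_dim)
        total += np.sum(J ** 2)
    return total / len(X)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))
X = rng.normal(size=(100, 5))

linear = lambda x: W @ x
# For a linear model the Jacobian is constant, so GC = ||W||_F^2.
print(np.isclose(geometric_complexity(linear, X), np.sum(W ** 2)))
```

The same estimator can be pointed at any differentiable model; in practice one would compute the Jacobian with automatic differentiation rather than finite differences.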
The paper explores the impacts of initialization, explicit regularization and implicit regularization on geometric complexity, examining different parameter initialization choices, L2 regularization, Lipschitz regularization via spectral norm regularization, noise regularization, flatness regularization, explicit GC regularization and Jacobian regularization. The researchers conclude that common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization and the choice of parameter initialization can all play a part in reducing geometric complexity.
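The link between parameter norm regularization and GC is easiest to see in the linear case, where GC reduces to the squared norm of the weights, so an L2 penalty lowers GC directly. A minimal numpy sketch of this connection (our own illustration, using closed-form ridge regression rather than anything from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=200)

def fit(lam):
    # Closed-form L2-regularized (ridge) solution:
    # w = (X^T X + lam * I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

# For a linear model f(x) = w . x, GC = ||w||^2, so increasing the
# L2 penalty strength lam shrinks the weights and hence the GC.
gc_plain = np.sum(fit(0.0) ** 2)
gc_l2 = np.sum(fit(10.0) ** 2)
print(gc_l2 < gc_plain)
```

The ridge solution's norm decreases monotonically as the penalty strength grows, which is the linear-model version of the paper's claim that explicit parameter norm regularization keeps GC low.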
Finally, the team demonstrates that GC can also capture double-descent behaviour in the test loss when a model’s parameter count increases.
Overall, this paper validates GC as an effective tool for understanding DNN models and sheds light on how DNNs achieve low test errors with highly expressive models. The team hopes their work will encourage further research in this area and lead to a better understanding of current best practices and the discovery of new methods for efficient model training.
The paper Why Neural Networks Find Simple Solutions: The Many Regularizers of Geometric Complexity has been accepted by the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), which runs from November 28 to December 9 in New Orleans, USA, and is available on arXiv.
Author : Hecate He | Editor : Michael Sarazen