Hiranmay Darshane

Theoretical Physics could do with borrowing some rough ideals from AlexNet and Sutton's Bitter Lesson

August 2024
Sub-title: Moving Beyond "Beauty", "Elegance" and suchlike Abstractional Biases

Sabine Hossenfelder is perhaps one of the most interesting thinkers I know of. Her consistency and regularity on YouTube are particularly admirable; every week there are two new videos that are in no way just reaction videos or "here's what happened, and here's what I think about it" - they are all well-researched and built around a strong core thesis.

One of her main ideas is that post-relativity theoretical physics has bet much (perhaps too much) on Symmetry, Elegance, and suchlike handwavy, loosely definable abstractions that all boil down to a preference for elegant patterns.

"What is especially striking and remarkable is that in fundamental physics a beautiful or elegant theory is more likely to be right than a theory that is inelegant."
— Murray Gell-Mann

This isn't a very disagreeable statement: the basic Occam's-razor-like argument, where simplicity is taken as a given and everything else emerges from basic principles, isn't completely invalid. Of course, results like Noether's theorem or the Principle of Least Action reinforce this very lossy heuristic in people.

Hossenfelder argues that when physicists decide the foundations of physics are not pretty enough, they invent prettier theories and are then surprised when no evidence is found to support them. They are largely unaware that this is what they are doing, because requirements of 'beauty' have become mathematical standards.

This is, prima facie, true: supersymmetry, string theory, gravitons, a fifth fundamental force, and so on are all attempts that fit this description. There are obvious instances of observer/anthropic bias at play here: even such modest declarations rest on fairly liberal assumptions (hierarchies, conditional operations, and so on).

Just as there is computational irreducibility (i.e., phenomena whose behavior cannot be predicted by any shortcut simpler than running the computation itself), there is an obvious "cognitive irreducibility" (i.e., phenomena too convoluted for a human to model well enough).

"Just as there are odors that dogs can smell and we cannot, as well as sounds that dogs can hear and we cannot, so too there are wavelengths of light we cannot see and flavors we cannot taste. Why then, given our brains wired the way they are, does the remark 'Perhaps there are thoughts we cannot think' surprise you?"
– Richard Hamming

Such a counter-productive obsession with elegance and beauty existed in AI for a long time (if you ask me to point to when it finally stopped, it's perhaps AlexNet in 2012).

I want to go back and make a very short and basic case for why deep learning is so unreasonably effective in the first place: Many high-dimensional data sets that occur in the real world actually lie along low-dimensional latent manifolds inside that high-dimensional space.
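As a quick toy illustration (entirely made-up data of my own choosing, not a physics dataset), here is a sketch in Python: points that look 256-dimensional are secretly generated from a 3-dimensional latent variable, and the singular-value spectrum of the data makes that hidden low dimensionality visible.

import numpy as np

rng = np.random.default_rng(0)
n_samples, latent_dim, ambient_dim = 2000, 3, 256
z = rng.normal(size=(n_samples, latent_dim))                        # hidden low-dimensional cause
embed = rng.normal(size=(latent_dim, ambient_dim))                  # fixed (here: linear) embedding
x = z @ embed + 0.01 * rng.normal(size=(n_samples, ambient_dim))    # observed 256-D data plus noise

# Singular values of the centered data: only about latent_dim of them are large.
s = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)
explained = np.cumsum(s**2) / np.sum(s**2)
print("variance explained by the first 3 directions:", round(float(explained[latent_dim - 1]), 4))

Real data is of course nonlinear and messier, but the same picture (a few directions carrying almost all the structure) is what "low-dimensional latent manifold" is pointing at.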

Neural nets have enough combinatorial expressivity, and a good enough learning procedure (backprop + SGD), to take in massively wide datasets and attribute the observed phenomena, in a weighted way, to a few "latent manifolds". This is exactly what we want: to cut through the noise and find tractable patterns that best model the phenomena.
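To make that concrete, here is a minimal autoencoder sketch (my own toy setup, with hypothetical layer sizes and hyperparameters): backprop and SGD squeeze the same kind of 256-dimensional data through a 3-unit bottleneck and still reconstruct it, which only works because the data secretly lives on a low-dimensional manifold.

import torch
import torch.nn as nn

torch.manual_seed(0)
n_samples, latent_dim, ambient_dim = 2000, 3, 256
z = torch.randn(n_samples, latent_dim)
x = z @ torch.randn(latent_dim, ambient_dim) + 0.01 * torch.randn(n_samples, ambient_dim)
x = (x - x.mean(0)) / x.std(0)                      # standardize for a stable toy run

model = nn.Sequential(
    nn.Linear(ambient_dim, 32), nn.ReLU(),
    nn.Linear(32, latent_dim),                      # 3-unit bottleneck: coordinates on the latent manifold
    nn.Linear(latent_dim, 32), nn.ReLU(),
    nn.Linear(32, ambient_dim),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

for step in range(2001):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)      # reconstruct 256 dims from the 3-unit code
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(step, loss.item())                    # loss falls as the bottleneck finds the structure

Nothing in the architecture is told where the structure is; backprop and SGD find it.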

The core lesson here was simple: it's the second part of Sutton's Bitter Lesson.

"The general point to be learned from the bitter lesson is that the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds... instead we should build in only the meta-methods that can find and capture this arbitrary complexity."
— Rich Sutton

Sutton makes two broad points: 1) Do not wishfully bank on symmetry and simplicity to save you. 2) Keep your longings and biases away. AlexNet did exactly this, when almost everyone was going down the exact pitfalls Sutton warns against.

Coming back to physics, AlphaFold best captures this thesis: it cracked protein structure prediction not with baked-in biases, but largely through the sheer power of deep learning. I am particularly interested in using neural nets to predict physical behavior and then symbolicizing the principles the net has learned.
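As a hedged sketch of what "symbolicizing" could look like on a toy problem (the free-fall data, the tiny network, and the polynomial fit are all my illustrative assumptions, nothing like an actual AlphaFold-scale pipeline): train a small net on noisy measurements of d = (1/2) g t^2, then distill what it learned by fitting a low-degree polynomial to the network's own predictions and reading off the coefficient g/2.

import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
g = 9.81
t = torch.linspace(0.0, 2.0, 200).unsqueeze(1)
d = 0.5 * g * t**2 + 0.05 * torch.randn_like(t)     # noisy "measurements" of free fall

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(3000):                               # step 1: let the net soak up the behavior
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(t), d)
    loss.backward()
    opt.step()

# Step 2: "symbolicize" the black box by fitting a low-degree polynomial to its predictions;
# the quadratic coefficient should land near g/2 = 4.905.
t_grid = np.linspace(0.0, 2.0, 200)
with torch.no_grad():
    pred = net(torch.tensor(t_grid, dtype=torch.float32).unsqueeze(1)).squeeze(1).numpy()
coeffs = np.polyfit(t_grid, pred, deg=2)
print("recovered quadratic coefficient:", round(float(coeffs[0]), 3))

On harder problems the distillation step would be genuine symbolic regression rather than a polynomial fit, but the order of operations is the point: let the network capture the complexity first, and only then go looking for the elegant form.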