Recent Progress in the Theory of Neural Networks
It's common wisdom that neural networks are basically "matrix multiplications that nobody understands", impenetrable to theoretical analysis, and that they have achieved great results largely through trial and error. While this may have been true in the past, there has recently been significant progress towards developing a theoretical understanding of neural networks. Most notably, we have obtained an arguably complete understanding of network initialization and training dynamics in a certain infinite-width limit. There has also been some progress towards understanding their generalization behavior. In this post I will review some of this recent progress and discuss its potential relevance to AI alignment.

Infinite Width Nets: Initialization

The most exciting recent developments in the theory of neural networks have focused on the infinite-width limit: we consider neural networks where the number of neurons in all hidden layers is increased to infinity. Typically we consider networks with Gaussian-initialized weights, scaling the standard deviation at initialization as 1/√H, where H is the number of hidden units in the preceding layer (this is needed to keep the inputs to each layer from blowing up, and is also the initialization scheme usually used in real networks). In this limit, we have obtained an essentially complete understanding of both the behavior at initialization and the training dynamics[1]. (Those with limited interest in or knowledge of the math may wish to skip ahead to "Significance and Limitations" below.)

We've actually had a pretty good understanding of the behavior of infinite-width neural networks at initialization for a while, since the work of Radford Neal (1994). He proved that in this limit, fully-connected neural networks with Gaussian-distributed weights and biases converge to what are known as Gaussian processes. Gaussian processes can be thought of as the generalization of Gaussian distributions from finite-dimensional spaces to spaces of functions. Neal's paper provides a very clear derivation of this behavior.
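The 1/√H scaling and Neal's result can be seen in a minimal numpy sketch. The single-hidden-layer architecture, tanh nonlinearity, and specific widths here are illustrative assumptions, not anything from the papers above: we draw many random initializations of a wide net, evaluate each at one fixed input, and observe that the outputs look like samples from a Gaussian (mean near zero, stable finite variance), as the Gaussian-process limit predicts.

```python
import numpy as np

def init_net(d_in, H, rng):
    # Gaussian-initialized weights; each layer's standard deviation
    # is scaled by 1/sqrt(fan-in), so hidden->output weights get 1/sqrt(H).
    W1 = rng.normal(0.0, 1.0, size=(H, d_in)) / np.sqrt(d_in)
    W2 = rng.normal(0.0, 1.0, size=(H,)) / np.sqrt(H)
    return W1, W2

def forward(x, W1, W2):
    # One hidden layer with tanh nonlinearity, scalar output.
    return float(W2 @ np.tanh(W1 @ x))

rng = np.random.default_rng(0)
x = np.ones(3)          # a fixed input point
H = 2000                # "wide" hidden layer standing in for the infinite limit

# Sample the network's output at x across many independent initializations.
outs = np.array([forward(x, *init_net(3, H, rng)) for _ in range(2000)])
print(outs.mean(), outs.std())
```

Because of the 1/√H scaling, the output variance stays O(1) as H grows instead of diverging; rerunning with larger H leaves the empirical mean and standard deviation essentially unchanged, which is the finite-width shadow of the Gaussian process limit.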