Renormalization Roadmap
At PIBBSS, we’ve been thinking about how renormalization can be developed into a rich framework for AI interpretability. This document serves as a roadmap for that research agenda, which we are calling an Opportunity Space[1] for the AI safety community. In what follows, we explore the technical and philosophical significance of renormalization for physics and AI safety, the problem areas in which it could be most useful, and some interesting existing directions (mainly from physics) that we are excited to place in direct contact with AI safety. This roadmap will also provide context for our forthcoming Call for Collaborations, during which we will hire affiliates to work on projects in this area.

Acknowledgements: While Lauren did the writing, this opportunity space was developed with the PIBBSS horizon scanning team, Dmitry Vaintrob and Lucas Teixeira.

Motivation and Context

In physics, renormalization is used to coarse-grain theoretical descriptions of complex interactions, focusing on those that are most relevant for describing physical reality. Here, ‘reality’ is tied to the scale of interest: just as you don’t need to take quantum effects into account to design a safe bridge, the physical descriptions of the same system are different (in a way, emergent) when viewed close up versus far away. Put differently, renormalization plays two main roles:

* to organize a system into “effective” theories at different scales, and
* to decouple physical systems into a hierarchy of theories of local interactions by systematically identifying the so-called “relevant” parameters as the interaction scale varies.

The running of parameters across scales defines a so-called “RG flow”, which dynamically transforms systems in a way that throws away fine-grained details while preserving coarse-grained behavior.

Like field theories in physics, neural networks (NNs) are highly complex systems with many interacting components. There is evidence that they organize information to learn, for ex
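For readers who want the standard formulation behind “relevant” parameters and “RG flow”, the following is an illustrative summary of textbook Wilsonian renormalization (ours, not anything specific to this roadmap):

```latex
% Couplings g_i run with the observation scale \mu according to beta functions:
\[
  \mu \frac{\mathrm{d} g_i}{\mathrm{d} \mu} \;=\; \beta_i(g_1, \dots, g_n).
\]
% Near a fixed point g^* (where every \beta_i vanishes), a coarse-graining
% step that rescales lengths by a factor b acts multiplicatively on the
% deviations \delta g_a = g_a - g_a^* along the eigendirections of the
% linearized flow:
\[
  \delta g_a \;\longmapsto\; b^{\,y_a}\, \delta g_a .
\]
% Directions with y_a > 0 grow under repeated coarse-graining ("relevant"),
% directions with y_a < 0 shrink away ("irrelevant"), and y_a = 0 is
% "marginal". This is the precise sense in which RG flow keeps a few
% parameters and systematically discards the rest.
```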
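And to make the coarse-graining step itself concrete, here is a minimal, self-contained sketch of Kadanoff block-spin renormalization on a 2D Ising-style spin configuration. The lattice size, block size, and majority rule are illustrative choices of ours, not anything prescribed by the roadmap:

```python
import numpy as np

def block_spin(lattice: np.ndarray, b: int = 3) -> np.ndarray:
    """One real-space RG step: replace each b-by-b block of +/-1 spins
    with the sign of its sum (majority rule). Fine-grained detail is
    thrown away; large-scale structure is kept."""
    L = lattice.shape[0]
    assert L % b == 0, "lattice side must be divisible by the block size"
    blocks = lattice.reshape(L // b, b, L // b, b)
    coarse = np.sign(blocks.sum(axis=(1, 3)))
    # For odd b the majority vote is never tied; break ties randomly
    # anyway so the map stays well-defined for even block sizes too.
    ties = coarse == 0
    coarse[ties] = np.random.choice([-1, 1], size=int(ties.sum()))
    return coarse.astype(int)

# Two successive RG steps: 81x81 -> 27x27 -> 9x9 effective spins.
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(81, 81))
once = block_spin(spins)
twice = block_spin(once)
print(spins.shape, once.shape, twice.shape)  # (81, 81) (27, 27) (9, 9)
```

Iterating this map and watching which summary statistics (e.g. the mean magnetization) survive is the simplest concrete instance of the “throw away fine-grained details, preserve coarse-grained behavior” picture above.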