In 1918, Emmy Noether published her famous theorem showing that each symmetry of the laws of physics implies a corresponding conserved quantity. Laws which remain the same even if we move the whole universe left or right a little result in conservation of momentum, laws which remain the same over time result in conservation of energy, and so forth.
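The momentum example can be checked numerically. This is a minimal sketch, not a proof of Noether's theorem. It assumes two particles on a line joined by a spring, and all masses, the spring constant `K`, and the initial conditions are arbitrary illustrative values. The helper names (`momentum_history`, `spring_between`, `spring_between_and_anchor`) are hypothetical. When the potential depends only on the separation x0 - x1, moving both particles left or right leaves it unchanged, and total momentum stays fixed up to floating-point error. Tying one particle to the origin breaks that translation symmetry, and momentum is no longer conserved.

```python
from typing import Callable, List

Forces = Callable[[List[float]], List[float]]

K = 2.0  # spring constant (illustrative value)


def total_momentum(v: List[float], m: List[float]) -> float:
    return sum(mi * vi for mi, vi in zip(m, v))


def momentum_history(forces: Forces, x, v, m, dt=0.01, steps=1000) -> List[float]:
    """Integrate 1D particles with velocity Verlet and record total momentum at each step."""
    x, v = list(x), list(v)
    f = forces(x)
    history = [total_momentum(v, m)]
    for _ in range(steps):
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]                      # drift
        f = forces(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]  # half kick
        history.append(total_momentum(v, m))
    return history


def spring_between(x: List[float]) -> List[float]:
    # V = K/2 (x0 - x1)^2 depends only on the separation, so it is translation-symmetric.
    f0 = -K * (x[0] - x[1])
    return [f0, -f0]  # equal and opposite forces


def spring_between_and_anchor(x: List[float]) -> List[float]:
    # Adds V = K/2 x0^2, tying particle 0 to the origin: translation symmetry is broken.
    f0, f1 = spring_between(x)
    return [f0 - K * x[0], f1]


x_init, v_init, masses = [0.0, 1.5], [0.3, -0.1], [1.0, 2.0]
for name, forces in [("symmetric", spring_between), ("anchored", spring_between_and_anchor)]:
    p = momentum_history(forces, x_init, v_init, masses)
    print(name, "momentum drift:", max(p) - min(p))
```

The symmetric case should report a drift near machine precision. The anchored case drifts by an amount of order the momenta involved, because the external spring exerts a net force.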
At the time, Noether’s Theorem was only proven for the sorts of systems used in classical physics - i.e. a bunch of differential equations derived by minimizing an “action”. Over the next few decades, the foundational paradigm shifted from classical to quantum, and Noether’s original proof did not carry over. But the principle - the idea that symmetries imply conserved quantities - did carry over. Indeed, the principle is arguably simpler and more elegant in quantum mechanics than in classical.
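For readers who want the quantum version spelled out, here is the standard textbook argument in the Heisenberg picture. It is a sketch, not Noether's original proof, and it assumes a time-independent Hamiltonian H and an observable Q with no explicit time dependence. A symmetry is a one-parameter family of unitaries generated by Q that leaves H unchanged. Differentiating that invariance gives [H, Q] = 0, and the Heisenberg equation then makes the expectation of Q constant in time.

```latex
% Heisenberg equation for an observable Q with no explicit time dependence
\frac{d}{dt}\langle Q \rangle = \frac{i}{\hbar}\,\langle [H, Q] \rangle

% Symmetry: U(s) = e^{-i s Q/\hbar} leaves H invariant for every s
U(s)^\dagger H \, U(s) = H
\;\Longrightarrow\;
\frac{d}{ds}\Big|_{s=0}\left( e^{i s Q/\hbar}\, H \, e^{-i s Q/\hbar} \right)
  = \frac{i}{\hbar}\,[Q, H] = 0
\;\Longrightarrow\;
\frac{d}{dt}\langle Q \rangle = 0

% Examples: Q = P (spatial translations) gives conservation of momentum;
% Q = H (time translations) trivially commutes with H, giving conservation of energy.
```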
This is the sort of thing I look for in my day-to-day research: principles which are simple enough, fundamental enough, and general enough that they’re likely to carry over to the next paradigm. I don’t know what the next paradigm will be, yet; the particulars of a proof or formulation of a problem might end up obsolete. But I look for principles which I expect will survive, even if the foundations shift beneath them.
My own day-to-day research focuses on modelling abstraction.
I generally build these models on a framework of probability, information theory, and causal models. I know that this framework will not cover all of abstraction - for example, it doesn’t cover mathematical abstractions like “addition” or “linearity”. Those abstractions are built into the structure of logic, and probability theory takes all of logic as given. There may be some way in which the abstraction of linearity lets me answer some broad class of questions more easily, but standard probability and information theory ignore all that by just assuming that all pure-logic questions are answered for free.
… yet I continue to use this probability/information/causality framework, rather than throwing it away and looking for something more general on which to build the theory. Why? Well, I expect that this framework is general enough to figure out principles which will carry over to the next paradigm. I can use this framework to talk about things like “throwing away information while still accurately answering queries” or “information relevant far away” or “massively redundant information”, I can show that various notions of “abstraction” end up equivalent, I can mathematically derive the surprising facts implied by various assumptions. For instance, I can prove the Telephone Theorem: when transmitted over a sufficiently long distance, all information is either completely lost or arbitrarily perfectly conserved. I expect a version of that principle to carry over to whatever future paradigm comes along, even after the underlying formulations of “information” and “distance” change.
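A toy illustration of the Telephone Theorem's "lost or perfectly conserved" dichotomy, under a deliberately simple assumption that is not the theorem's general setting: a uniform random bit is passed through a chain of n identical binary symmetric channels, each flipping the bit with probability p. Composing n such channels gives a single channel with flip probability (1 - (1 - 2p)^n) / 2, so the mutual information between the first and last bit has a closed form. The function names are hypothetical. Any noisy channel (0 < p < 1) drives the mutual information to zero with distance, while a deterministic channel (p = 0, or p = 1, which always flips) preserves the full bit at any distance. The theorem itself covers far more general chains than this sketch.

```python
import math


def binary_entropy(q: float) -> float:
    """Entropy in bits of a Bernoulli(q) variable."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def mutual_info_after(n: int, flip_prob: float) -> float:
    """I(X_0; X_n) in bits for a uniform bit sent through n binary symmetric
    channels, each flipping the bit independently with probability flip_prob."""
    # n composed channels act like one channel with this flip probability.
    q = (1 - (1 - 2 * flip_prob) ** n) / 2
    # Guard against tiny negative values from floating-point rounding near q = 0.5.
    return max(0.0, 1.0 - binary_entropy(q))


for p in (0.0, 0.01, 0.1, 1.0):
    print(p, [round(mutual_info_after(n, p), 4) for n in (1, 10, 100, 1000)])
```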
One obvious alternative to looking for such principles is to instead focus on the places where my current foundational framework falls short, and try to find the next foundational framework upfront. Jump right to the next paradigm, as quickly as possible.
The main reason not to do that is that I don’t think I have enough information yet to figure out what the next paradigm is.
Noether’s Theorem and principles like it played a causal role in figuring out quantum mechanics. It was the simple, general principles of classical mechanics which provided constraints on our search for quantum mechanical laws. Without those guideposts, the search space of possible physical laws would have been too wide.
Special relativity provides a particularly clear example here. Nobody would have figured it out without the principles of electrodynamics and Lorentz transformations to guide the way. Indeed, Einstein’s contribution was “just” to put an interpretation on math which was basically there already.
More generally, knowing a few places where the current framework fails is not enough to tell us what the next framework should be. I know that my current foundation for thinking about abstraction is too narrow, but the search space of possible replacements is still too wide. I want simple general principles, principles which capture the relevant parts which I do think I understand, in order to guide that search. So, in my day-to-day I use the framework I have - but I look for the sort of principles which I expect to generalize to the next framework, and which can guide the search for that next framework.
This leaves a question: how do we know when it’s time to make the jump to the next paradigm? As a rough model, we’re trying to figure out the constraints which govern the world. Sometimes, the rate-limiting step might be figuring out new constraints, to limit our search. Sometimes, the rate-limiting step might be abandoning (probably implicit) wrong constraints already in our models, like the assumption of Galilean relativity implicitly built into pre-special-relativity physics. When finding new constraints is the rate-limiting step, it should feel like exploring a wide-open space, like we’re looking around and noticing patterns and finding simple ways to describe those patterns. When abandoning wrong constraints is the rate-limiting step, it should feel like the space is too constrained, like different principles or examples come into conflict with each other.
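The rough model above can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than anything proposed in the post: "hypotheses" are integers, "constraints" are named predicates, and `diagnose` and its `max_tractable` threshold are made-up names with an arbitrary cutoff. Too many surviving hypotheses means finding new constraints is the bottleneck. Zero survivors means some constraints conflict, and the sketch flags each constraint whose removal alone makes the system consistent again.

```python
from typing import Callable, Dict, List

Constraint = Callable[[int], bool]


def survivors(hypotheses: List[int], constraints: Dict[str, Constraint]) -> List[int]:
    """Hypotheses consistent with every constraint."""
    return [h for h in hypotheses if all(c(h) for c in constraints.values())]


def diagnose(hypotheses: List[int], constraints: Dict[str, Constraint], max_tractable: int = 10):
    alive = survivors(hypotheses, constraints)
    if not alive:
        # Over-constrained: flag each constraint whose removal restores consistency.
        suspects = [
            name for name in constraints
            if survivors(hypotheses, {k: c for k, c in constraints.items() if k != name})
        ]
        return "abandon a wrong constraint", suspects
    if len(alive) > max_tractable:
        # Under-constrained: the space is still wide open.
        return "find new constraints", alive
    return "search is tractable", alive


hypotheses = list(range(100))
constraints = {"even": lambda h: h % 2 == 0, "multiple of 3": lambda h: h % 3 == 0}
print(diagnose(hypotheses, constraints)[0])  # find new constraints (17 survivors)
constraints["below 10"] = lambda h: h < 10
print(diagnose(hypotheses, constraints))     # ('search is tractable', [0, 6])
constraints["odd"] = lambda h: h % 2 == 1
print(diagnose(hypotheses, constraints))     # ('abandon a wrong constraint', ['even', 'odd'])
```

In the last case, the two conflicting constraints are exactly the ones flagged, which is the toy analogue of "different principles or examples come into conflict with each other."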
On the other end of the spectrum, some people argue for just working within the current paradigm and forgetting about the next one. This is a long-term/short-term tradeoff: in the short term, the current paradigm is usually the best we have; building new frameworks takes time. So if our goals are short term - like, say, a startup which needs to show growth in the next six months - then maybe we should just do what we can with what we have.
There are definitely lots of places where this is the right move. On the other hand, I think “long term” is often much, much shorter than people realize.
I worked in startups for about five years. Usually, the companies I was at needed to show results or shut down within ~2 years. On the other hand, the code we wrote usually turned over within a year - the company would pivot or the UI design would change or the code architecture and tech stack would shift, and old code would either be deprecated or rewritten. In that environment, “building for the next paradigm” meant figuring out principles which would leave us better off a year from now, when the current code had mostly turned over. For instance, knowledge about our users (often from A/B tests) typically had lasting value. Sometimes, a smart library design would last. With a runway of ~2 years and a turnover time of ~1 year, the right move is usually to spend that first year on things which will make us better off a year from now, after everything has turned over.
… not that we always did that, mind you, but it was the things which lasted through turnover which were consistently the most important in hindsight. And after five years of this, one can see the patterns in what kinds of things will last.
AI research (and alignment research) in particular is a place where the “long term” is much, much shorter than many people realize. Not in the sense that AGI is right around the corner, but in the sense that the next paradigm is less than 5 years away, not more than 20. Just within the past 10 years, we saw the initial deep learning boom with image classifiers, then a shift to image generators (with the associated shift to GAN architectures), and then the shift to transformers and language models. Even if you think that transformer-based language models are the most probable path to AGI, there will still likely be major qualitative shifts along the way. If we’re doing work which is narrowly adapted to the current paradigm, it’s likely to be thrown out, and probably not even very far in the future.
The work done by Chris Olah’s team is a good example here. They did some really cool work on generative image nets. Then the shift to transformers came along, and they recently restarted from roughly square zero on transformer nets. Presumably some illegible skills transferred, but they mostly seem to be figuring things out from scratch, as far as I can tell. When the next shift comes, I expect they’ll be back at roughly square zero again. My advice to someone like Chris Olah would be: figure out the principles which seem likely to generalize. At a bare minimum, look for principles or tools which are useful for both image and text models, both CNNs and transformers. Those are the principles which are likely to still be relevant in 5 years.
As an 80/20 solution, I think it’s usually fine to trust your instincts on this one. The important step is just to actually ask yourself whether something will carry over. I can look at my own work and say “hmm, this specific notion of ‘redundant information’ probably won’t carry over, but some general notion of ‘abstractions summarize massively redundant information’ probably will, and the principles I've derived from this model probably will”. Similarly, I expect someone in 1920 could look at Noether’s Theorem and think “wow, even if the foundations of physics are totally overturned, I bet some version of this principle will survive”.
If you want a more legible answer than that, then my advice is to introspect on what information is driving your intuitions about what will or will not carry over. I intend to do that going forward, and will hopefully figure out some patterns. For now, simplicity and generality seem like the main factors.
Great post! I don't think Chris Olah's work is a good example of non-transferable principles though. His team was able to make a lot of progress on transformer interpretability in a relatively short time, and I expect that there was a lot of transfer of skills and principles from the work on image nets that made this possible. For example, the idea of circuits and the "universality of circuits" principle seems to have transferred to transformers pretty well.
I liked this post a lot, and I think its title claim is true and important.
One thing I wanted to understand a bit better is how you're invoking 'paradigms' in this post wrt AI research vs. alignment research. I think we can be certain that AI research and alignment research are not identical programs but that they will conceptually overlap and constrain each other. So when you're talking about 'principles that carry over,' are you talking about principles in alignment research that will remain useful across various breakthroughs in AI research, or are you thinking about principles within one of these two research programs that will remain useful across various breakthroughs within that research program?
Another thing I wanted to understand better was the following:
This leaves a question: how do we know when it’s time to make the jump to the next paradigm? As a rough model, we’re trying to figure out the constraints which govern the world.
Unlike many of the natural sciences (physics, chemistry, biology, etc.) whose explicit goals ostensibly are, as you've said, 'to figure out the constraints which govern the world,' I think that one thing that makes alignment research unique is that its explicit goal is not simply to gain knowledge about reality, but also to prevent a particular future outcome from occurring—namely, AGI-induced X-risks. Surely a necessary component for achieving this goal is 'to figure out the [relevant] constraints which govern the world,' but it seems pretty important to note (if we agree on this field-level goal) that this can't be the only thing that goes into a paradigm for alignment research. That is, alignment research can't only be about modeling reality; it must also include some sort of plan for how to bring about a particular sort of future. And I agree entirely that the best plans of this sort would be those that transcend content-level paradigm shifts. (I daresay that articulating this kind of plan is exactly the sort of thing I try to get at in my Paradigm-building for AGI safety sequence!)
So when you're talking about 'principles that carry over,' are you talking about principles in alignment research that will remain useful across various breakthroughs in AI research, or are you thinking about principles within one of these two research programs that will remain useful across various breakthroughs within that research program?
Good question. Both.
alignment research can't only be about modeling reality; it must also include some sort of plan for how to bring about a particular sort of future
Imagine that we're planning a vacation to Australia. We need to plan flights, hotels, and a rental car. Now someone says "oh, don't forget that we must include some sort of plan for how to get from the airport to the rental car center". And my answer to that would usually be... no, I really don't need to plan out how to get from the airport to the rental car center. That part is usually easy enough that we can deal with it on-the-fly, without having to devote significant attention to it in advance.
Just because a sub-step is necessary for a plan's execution, does not mean that sub-step needs to be significantly involved in the planning process, or even planned in advance at all.
Setting aside for the moment whether or not that's a good analogy for whether "alignment research can't only be about modeling reality", what are the criteria for whether it's a good analogy? In what worlds would it be a good analogy, and in what worlds would it not be a good analogy?
The key question is: what are the "hard parts" of alignment? What are the rate-limiting steps? What are the steps which, once we solve those, we expect the remaining steps to be much easier? The hard parts are like the flights and hotel. The rest is like getting from the airport to the rental car center: that's a problem which we expect will be easy enough that we don't need to put much thought into it in advance (and shouldn't bother to plan it at all until after we've figured out what flight we're taking). If the hard parts of alignment are all about modeling reality, then alignment research can, in principle, be only about modeling reality.
My own main model for the "hard part" of alignment is in the first half of this video. (I'd been putting off bringing this up in the discussion on your Paradigm-Building posts, because I was waiting for the video to be ready.)
I think of this as a fairly central post in the unofficial series on How to specialize in Problems We Don't Understand (which, in turn, is the post that most sums up what I think the art of rationality is for. Or at least the parts I'm most excited about).
I wonder if you think that the Telephone Principle that you mention might be of relevance both as an example and as a possible critique of the methodology you propose in this post. Transformation into a new paradigm might be understood as something like a 'long distance' in epistemic terms. Therefore, we would expect principles active in our current paradigm to be more or less arbitrarily conserved over that epistemic distance.
Do you think the 'distance' in the Telephone Principle can be equally well put into epistemic terms?
That is an interesting and potentially fruitful analogy. One natural notion of "long distance" would be carrying over into many future paradigms, through many paradigm shifts. The Telephone Principle would then say that any principle will either eventually carry over arbitrarily perfectly through each shift (possibly after adjusting a bit in the first few paradigm shifts), or will eventually be completely lost/abandoned.
Curated. Granted, my own reading on the topic is limited, but this feels like a genuinely novel and helpful direction in how to conduct research. It pulls me out of asking "what is the paradigm?" and "do we even have a paradigm?" to instead get me to focus fruitfully on the research itself. I look forward to seeing how this advice works out for me (and others) going forward.