Recursive self-improvement is already here.
This point is far from original. It’s been described before, for instance here, in Drexler’s Reframing Superintelligence, and (as I was working on this post) in Jack Clark’s newsletter and even by Yann LeCun. But sometimes I still hear people talk about preparing for “when recursive self-improvement kicks in,” implying that it hasn’t already.
The kinds of recursive self-improvement mentioned here aren’t the frequently envisioned scenario of a single AI system improving itself unencumbered. They instead rely on humans to make them work, and humans are inevitably slow, which currently inhibits a discontinuous foom scenario.
It may be tempting to dismiss the kind of recursive self-improvement happening today as not “real” recursive self-improvement, and to think of it as some future event that we need to prepare for. Yes, we need to prepare for increasing amounts of it, but it’s not in the future, it’s in the present.
Here are some currently existing examples (years given for the particular example linked):
- (2016) Models play against themselves in order to iteratively improve their performance in games, most notably in AlphaGo and its variants.
- (2016) Some neural architecture search techniques use one neural network to optimize the architectures of different neural networks.
- (2016) AI is being used to optimize data center cooling, helping reduce the cost of further scaling.
- (2021) Code generation tools like GitHub Copilot can be helpful to software engineers, including presumably some AI research engineers (anecdotally, I’ve found it helpful when doing engineering). Engineers may thus be faster at designing AI systems, including Copilot-like systems.
- (2021) Google uses deep reinforcement learning to optimize their AI accelerators.
- (2022) Neural networks, running on NVIDIA GPUs, have been used to design more efficient GPUs which can in turn run more neural networks.
- (2022) Neural networks are being used for compiler optimization in the popular LLVM compiler infrastructure, which PyTorch’s just-in-time compiler is built on.
Inspired by Victoria Krakovna’s specification gaming spreadsheet, I’ve made a spreadsheet here with these examples. Feel free to submit more here. I think the number of examples will continue to grow, making it useful to keep track of them.
If this feels underwhelming compared with the kinds of recursive self-improvement often written about, you’re right. But consider that the start of an exponential often feels underwhelming. As time goes on, I expect that humans will become less and less involved in the development of AI, with AI automating more and more of the process. This could very well feel sudden, but it won’t be unprecedented: it’s already begun.
It's worth noting that the examples shown here are in line with most continuous models of AI progress. In most continuous models, AI-driven improvements start small, with AI contributing a little bit to the development process. Over time, AI will contribute more and more to the process of AI innovation, until it's contributing 60% of the improvements, then 90%, then 98%, then 99.5%, and finally all of the development happens through AI, with humans left out of the process entirely.
I don't know whether most people who believe in hard takeoff would say that these examples violate their model (probably not), but at the very least, these observations are well-predicted by simple continuous models of AI takeoff.
Yes, this is exactly what I had in mind!
I feel like it should somewhat discount the hard takeoff model, but then again I’m not sure what hard takeoff people would have predicted the initial part of the curve to look like.
On behalf of hard takeoff people (and as someone who is like 50% one of them) the hard takeoff model predicts this stuff pretty much just as well as the "continuous models," i.e. is pretty much zero surprised by these data points.
(I put continuous in scare quotes because IMO it's a rhetorical weasel word that invites motte-and-bailey tactics -- the motte being "surely the burden of proof should be on whoever thinks the straight line on a graph will suddenly break or bend" and the bailey being "therefore the burden of proof is on whoever thinks that there won't be a multi-year period in which the world is going crazy due to powerful AGIs transforming the economy while still humans are in control because the AGIs aren't superhuman yet." I prefer the slow vs. fast takeoff terminology, or soft vs. hard.)
I'm a bit confused by your response. First, the meat of the argument:
You are implicitly comparing two models: Mfast and Mslow, which make predictions about the world. Each model makes several claims, including claims about the shape of the function governing AI improvement and about how that shape comes about. So far as I can tell, a typical central claim of people who endorse Mfast is that AIs working on themselves will allow their capabilities to grow hyper-exponentially. Those who endorse Mslow don't seem to dispute that self-improvement will occur, but expect it to be par for the course for a new technology and to continue to be well modeled by exponential growth.
So, it seems to me that the existence of recursive self-improvement without an observed fast takeoff is evidence against Mfast. I presume you disagree, but I don't see how from a model selection framework. Mfast predicts either the data we observe now or a fast takeoff, whereas Mslow predicts only the exponential growth we are currently observing (do you disagree that we're in a time of exponential growth?). By the laws of probability, Mslow places higher probability on the current data than Mfast. Due to Bayes' rule, Mslow is therefore favored by the existing evidence (i.e. the Bayes factor indicates that you should update towards Mslow). Now, you might have a strong enough prior that you still favor Mfast, but if your model placed less probability mass on the current data than another model, you should update towards that other model.
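The model-selection argument above can be made concrete with a toy calculation. All the probabilities below are invented purely for illustration; only the structure matters (Mfast spreads its probability mass over two possible outcomes, Mslow concentrates it on the one we actually observe):

```python
# Toy Bayes-factor calculation for the model-selection argument.
# All probabilities are made up for illustration only.

# Observed data D: "modest recursive self-improvement, no FOOM yet."
# M_fast splits its mass between FOOM-by-now and the modest
# improvements we actually see; M_slow predicts only the latter.
p_D_given_fast = 0.5
p_D_given_slow = 1.0

# Bayes factor in favor of M_slow given the observed data.
bayes_factor = p_D_given_slow / p_D_given_fast  # 2.0

# Even a strong prior toward M_fast gets nudged toward M_slow:
prior_fast = 0.8
posterior_fast = (p_D_given_fast * prior_fast) / (
    p_D_given_fast * prior_fast + p_D_given_slow * (1 - prior_fast)
)
print(bayes_factor)              # 2.0
print(round(posterior_fast, 3))  # 0.667, down from the 0.8 prior
```

The point is just the last step: whatever your prior, a model that placed less mass on the observed data loses some posterior probability to one that placed more.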
Second (and lastly), a quibble:
Yitz's response uses the terms hard/soft takeoff; was that edited? Otherwise your argument against "continuous" (as opposed to slow or soft) comes off as a non sequitur: you're battling for terminological ground that isn't even under contention.
Different people will have different versions of each of these models. Some may even oscillate between them as is convenient for argumentative purposes (a-la motte and bailey).
Note that those who endorse Mslow don't think exponential growth will cut it; it'll be much faster than that (in line with the long-term trends in human history, which are faster than exponential). I'm thinking of e.g. Paul Christiano and Ajeya Cotra here who I'm pretty sure agree growth has been and will continue to be superexponential (the recent trend of apparent exponential growth being an aberration).
My complaining about the term "continuous takeoff" was a response to Matthew Barnett and others' usage of the term, not Yitz's, since as you say Yitz didn't use it.
Anyhow, to the meat: None of the "hard takeoff people" or hard takeoff models predicted or would predict that the sorts of minor productivity advancements we are starting to see would lead to a FOOM by now. Ergo, it's a mistake to conclude from our current lack of FOOM that those models made incorrect predictions.
The hard takeoff models predict that there will be fewer AI-caused productivity advancements before a FOOM than the soft takeoff models do. Therefore any AI-caused productivity advancement without FOOM is relative evidence against the hard takeoff models.
You might say that this evidence is pretty weak; but it feels hard to discount the evidence too much when there are few concrete claims by hard-takeoff proponents about what advances would surprise them. Everything is kinda prosaic in hindsight.
I'm not sure about that, actually. Hard takeoff and soft takeoff disagree about the rate of slope change, not about the absolute height of the line. I guess if you're reasoning from "soft takeoff means shorter timelines," then yes, it also implies higher AI progress prior to takeoff, and in particular predicts more stuff happening now. But people generally agree that despite that effect, the overall correlation between short timelines and fast takeoff is positive.
Anyhow, even if you are right, I definitely think the evidence is pretty weak. Both sides make pretty much the exact same retrodictions and were in fact equally unsurprised by the last few years. I agree that Yudkowsky deserves spanking for not working harder to make concrete predictions/bets with Paul, but he did work somewhat hard, and also it's not like Paul, Ajeya, etc. are going around sticking their necks out much either. Finding concrete stuff to bet on (amongst this group of elite futurists) is hard. I speak from experience here, I've talked with Paul and Ajeya and tried to find things in the next 5 years we disagree on and it's not easy, EVEN THOUGH I HAVE 5-YEAR TIMELINES. We spent about an hour probably. I agree we should do it more.
(Think about you vs. me. We both thought in detail about what our median futures look like. They were pretty similar, especially in the next 5 years!)
Good post. And a list/spreadsheet is a good idea.
It's possible that improvement in self-improvement over time will "feel" exponential. It's also possible it'll feel more dramatic -- maybe at some point we'll cross the threshold where investing one resource-unit quickly gives more than one resource-unit in return and this process is highly scalable, and then FOOM. Perhaps we should call the kind of recursive self-improvement that currently exists "subcritical" or something to distinguish it. (Edit: but note "subcritical" builds in assumptions of threshold-y-ness.)
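The "subcritical" intuition can be sketched numerically: if each resource-unit invested in self-improvement returns r units that get reinvested, r < 1 compounds toward a finite ceiling, while r > 1 diverges. The particular values of r below are arbitrary, chosen only to show the qualitative gap:

```python
# Sketch of the subcritical vs. supercritical distinction.
# r = resource-units returned per unit invested in self-improvement;
# the specific values are arbitrary illustrations, not estimates.

def total_gain(r, generations):
    """Cumulative return when each generation's gain seeds the next."""
    gain, total = 1.0, 0.0
    for _ in range(generations):
        total += gain
        gain *= r  # reinvest this generation's gain
    return total

subcritical = total_gain(0.5, 50)    # converges toward 2.0 and stalls
supercritical = total_gain(1.5, 50)  # grows without bound ("FOOM")
print(subcritical, supercritical)
```

This is just the geometric series: below the threshold the process self-limits no matter how long it runs, above it the same mechanism explodes, which is why crossing r = 1 could feel like a sudden regime change.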
If you want more, you can probably dig some out of https://www.reddit.com/r/reinforcementlearning/search?q=flair%3AMetaRL&restrict_sr=on&include_over_18=on and https://www.gwern.net/Tool-AI
Really excited to see this list, thanks for putting it together! I shared it with the DM safety community and tweeted about it here, so hopefully some more examples will come in. (Would be handy to have a short URL for sharing the spreadsheet btw.)
I can see several ways this list can be useful:
Curious whether you primarily intend this to be an outreach tool or a resource for AI forecasting / governance.
Thanks! Here is a shorter url: rsi.thomaswoodside.com.
I intended it somewhat as an outreach tool, though probably for people who already care about AI risk, since I wouldn't want it to give somebody a reason to start working on this kind of capabilities research because they see that it's possible.
Mostly, I did it for epistemics/forecasting: I think it will be useful for the community to know how this particular kind of work is progressing, and since it's spread across disparate research areas, I don't think it's being tracked by the research community by default.
Makes sense, thanks. I think the current version of the list is not a significant infohazard since the examples are well-known, but I agree it's good to be cautious. (I tweeted about it to try to get more examples, but it didn't get much uptake, happy to delete the tweet if you prefer.) Focusing on outreach to people who care about AI risk seems like a good idea, maybe it could be useful to nudge researchers who don't work on AI safety because of long timelines to start working on it.
No need to delete the tweet. I agree the examples are not infohazards; they're all publicly known. I just probably wouldn't want somebody going to good ML researchers who currently aren't really doing capabilities work (e.g., applying ML to some other area) and telling them, "look at this, AGI soon."