Matthew Barnett

Someone who is interested in learning and doing good.

My Twitter: https://twitter.com/MatthewJBar

My Substack: https://matthewbarnett.substack.com/

Comments

I think many of the points you made are correct. For example I agree that the fact that all the instances of ChatGPT are copies of each other is a significant point against Drexler's model. In fact this is partly what my post was about.

I disagree that you have demonstrated the claim in question: that we're trending in the direction of having a single huge system that acts as a unified entity. It's theoretically possible that we will reach that destination, but GPT-4 doesn't look anything like that right now. It's not an agent that plots and coordinates with other instances of itself to achieve long-term goals. It's just a bounded service, which is exactly what Drexler was talking about.

Yes, GPT-4 is a highly general service that isn't very modular. I agree that's a point against Drexler, but that's also not what I was disputing.

I don't see what about that 2017 Facebook comment from Yudkowsky you find particularly prophetic.

Is it the idea that deep learning models will be opaque? But that was fairly obvious back then too. I agree that Drexler likely exaggerated how transparent a system of AI services would be, so I'm willing to give Yudkowsky a point for that. But the rest of the scenario seems kind of unrealistic as of 2023.

Some specific points:

  • The recursive self-improvement that Yudkowsky talks about in this scenario seems too local. I think AI self-improvement will most likely take the form of AIs assisting AI researchers, with humans gradually becoming an obsolete part of the process, rather than a single neural net modifying parts of itself during training.

  • The whole thing about spinning off subagents during training just doesn't seem realistic in our current paradigm. Maybe this could happen in the future, but it doesn't look "prophetic" to me.

  • The idea that models will have "a little agent inside plotting" that takes over the whole system still seems totally speculative to me, and I haven't seen any significant empirical evidence that this happens during real training runs.

  • I think gradient descent will generally select pretty hard for models that do impressive things, making me think it's unlikely that AIs will naturally conceal their abilities during training. Again, this type of stuff is theoretically possible, but it seems very hard to call this story prophetic.

I don't think algorithms will stay constant. Recursive AI R&D could speed up the rate of algorithmic progress too, but I mostly think that's just another "input" to the "AI production function". Since I already agree with you that AI automation could speed up the pace of AI progress, I'm not sure exactly what you disagree with. My claim was about sharp changes in output in response to small changes in inputs.

Sam Altman explicitly said that he thinks they are in a regime of diminishing returns from scaling compute and that the primary effort being put into the next version of GPT will be on finding algorithmic improvements.

Did he really say this? I thought he was talking about the size of models, not the size of training compute. Under the Chinchilla scaling law, it's expected that it will take a while for models to get much larger, mostly because we were undertraining them for a few years. I suspect that's what he was referring to instead.

ChatGPT-4 is more unified than one would have expected from reading Drexler's writing back in the day

GPT-4 is certainly more general than what existed years ago. But why is it more unified? When I talked about "one giant system" I meant something like a monolithic agent that takes over humanity. If GPT-N takes over the world, I expect it will be because millions of copies band together in a coalition, not because it is a singular AI entity.

Perhaps you think that copies of GPT-N will coordinate so well that it's basically just a single monolithic agent. But while I agree something like that could happen, I don't think it's obvious that we're trending in that direction. This is a complicated question that doesn't seem clear to me given current evidence.

What is the vibe you're interpreting me as stating? I didn't mean that Drexler said that "systems with specialized goals will have only narrow knowledge". What I wrote was that I interpreted the CAIS world as one where we'd train a model from scratch for each task. The update that I'm pointing out is that the costs of automating tasks can be massively parallelized across tasks, not that AIs will have broad knowledge of the world.

I want to distinguish between a discontinuity in inputs, and a discontinuity in response to small changes in inputs. In that quote, I meant that I don't expect the generality of models to shoot up at some point as we scale from 10^25 FLOP to 10^26, 10^27 and so on, at least, holding algorithms constant. I agree that AI automation could increase growth, which would allow us to scale AI more quickly, but that's different from the idea that generality will suddenly appear at some scale, rather than appearing smoothly as we move through the orders of magnitude of compute.
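The distinction above can be sketched numerically. Below is a toy model (all parameters are illustrative, not the author's; the logistic shape is an assumption) of a capability curve that is smooth in log-compute, showing that stepping through orders of magnitude from 10^25 FLOP yields gradual, bounded increments rather than a sudden jump at any particular scale.

```python
import math

def generality(flop, midpoint=1e27, steepness=1.0):
    """Hypothetical smooth capability curve: a logistic in log10(compute).

    Returns a value in (0, 1); midpoint and steepness are illustrative.
    """
    return 1 / (1 + math.exp(-steepness * (math.log10(flop) - math.log10(midpoint))))

# Step through orders of magnitude of compute: 10^25, 10^26, 10^27, 10^28.
levels = [generality(10**e) for e in range(25, 29)]
steps = [b - a for a, b in zip(levels, levels[1:])]

# Each order of magnitude produces a modest, continuous gain -- no
# discontinuity in output from a small (multiplicative) change in input.
assert all(s < 0.25 for s in steps)
```

A discontinuity hypothesis would instead posit a step function somewhere in this range; the disagreement is over the shape of this curve, not over whether inputs themselves can grow quickly.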

My understanding is that the CAIS model is consistent with highly concentrated development, but it's not a necessary implication. The foundation models paradigm makes highly concentrated development nearly certain. Like I said in the post, I think we should see this as an update to the model, rather than a contradiction.

Arguments that might actually address the cruxes of someone in this reference class might include: [...]

The distribution of outcomes from government interventions is so likely to give you less time, or otherwise make it more difficult to solve the technical alignment problem, that there are fewer surviving worlds where the government intervenes as a result of you asking them to, compared to the counterfactual.

The thing I care more about is quality-adjusted effort, rather than time to solve alignment. For example, I'd generally prefer 30 years to solve alignment with 10 million researchers to 3000 years with 10 researchers, all else being equal. Quality of alignment research comes from a few factors:

  • How good current AIs are, with the idea being that we're able to make more progress when testing alignment ideas on AIs that are closer to dangerous-level AGI.
  • The number of talented people working on the problem, with more generally being better.
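The researcher comparison above can be made concrete. This is a deliberately crude sketch (the multiplicative "researcher-years" model and the quality weight are my hypothetical framing, not a formula the author gives):

```python
def quality_adjusted_effort(researchers, years, quality=1.0):
    """Toy model: total alignment effort as researchers x years x quality."""
    return researchers * years * quality

# 30 years with 10 million researchers vs. 3000 years with 10 researchers:
large_team = quality_adjusted_effort(10_000_000, 30)   # 300,000,000 researcher-years
small_team = quality_adjusted_effort(3000 * 0 + 10, 3000)  # 30,000 researcher-years
assert large_team > small_team
```

On this crude accounting the first scenario delivers 10,000 times the effort despite 100 times less calendar time, which is the sense in which quality-adjusted effort, rather than time alone, is the relevant quantity.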

I expect early delays to lead to negligible additional alignment progress during the delay, relative to future efforts. For example, halting semiconductor production in 2003 for a year to delay AI would have given us almost no additional meaningful alignment progress. I think the same is likely true for 2013 and even 2018. The main impact would just be to delay everything by a year. 

In the future I expect to become more optimistic about the merits of delaying AI, but right now I'm not so sure. I think some types of delays might be productive, such as delaying deployment by requiring safety evaluations. But I'm concerned about other types of delays that don't really give us any meaningful additional quality-adjusted effort. 

In particular, the open letter asking for an AI pause appeared to advocate what I consider the worst type of delay: a delay on starting the training of giant models. This type of delay seems least valuable to me for two main reasons. 

The first reason is that it wouldn't significantly slow down algorithmic progress, meaning that after the pause ended, people could likely just go back to training giant models almost like nothing happened. In fact, if people anticipate the pause ending, then they're likely to invest heavily and then start their training runs on the date the pause ends, which could lead to a significant compute overhang, and thus sudden progress. The second reason is that, compared to a delay of AI deployment, delaying the start of a training run reduces the quality-adjusted effort that AI safety researchers have, as a result of preventing them from testing alignment ideas on more capable models.

If you think that there are non-negligible costs to delaying AI from government action for any reason, then I think it makes sense to be careful about how and when you delay AI, since early and poorly targeted delays may provide negligible benefits. However, I agree that this consideration becomes increasingly less important over time.

Note that Hanson currently thinks the chances of AI doom are < 1%

I think this is a common misconception of Hanson's views. If you define "doom" as human extinction, he's put it at about 30% within one year after human-level AI (I don't have a more recent link on hand but I've seen him talk about it on Twitter a few times, and I don't think he's changed his views substantially).

It seems to me that the rational action is to now update toward believing that this short timelines hypothesis is true and 3-7 years from 2022 is 2025-2029 which is substantially earlier than 2047.

I don't really agree, although it might come down to what you mean. When some people talk about their AGI timelines they often mean something much weaker than what I'm imagining, which can lead to significant confusion.

If your bar for AGI is "score very highly on college exams", then my median AGI timeline dropped from something like 2030 to 2025 over the last two years. Whereas if your bar is more like "radically transform the human condition", I went from ~2070 to 2047.

I just see a lot of ways that we could have very impressive software programs and yet it still takes a lot of time to fundamentally transform the human condition, for example because of regulation, or because we experience setbacks due to war. My fundamental model hasn't changed here, although I became substantially more impressed with current tech than I used to be.

(Actually, I think there's a good chance that there will be no major delays at all and the human condition will be radically transformed some time in the 2030s. But because of the long list of possible delays, my overall distribution is skewed right. This means that even though my median is 2047, my mode is like 2034.)
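The median/mode gap above is a general property of right-skewed distributions. The following sketch illustrates it with a hypothetical lognormal timeline distribution (the 2030 offset and lognormal parameters are mine, chosen only to demonstrate the shape, not to match the author's actual beliefs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical right-skewed timeline: 2030 plus a lognormally distributed
# number of years (parameters illustrative only).
years = 2030 + rng.lognormal(mean=2.0, sigma=0.8, size=1_000_000)

median = np.median(years)

# Estimate the mode as the center of the fullest histogram bin.
counts, edges = np.histogram(years, bins=200, range=(2030, 2130))
peak = np.argmax(counts)
mode = (edges[peak] + edges[peak + 1]) / 2

# In a right-skewed distribution the mode falls earlier than the median:
# the single most likely year comes before the 50th-percentile year.
assert mode < median
```

So a mode around 2034 alongside a median of 2047 is exactly what a long right tail of possible delays produces: the most likely single outcome is early, while the long tail of setbacks pulls the median later.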
