Matthew Barnett

Someone who is interested in learning and doing good.

My Twitter: https://twitter.com/MatthewJBar

My Substack: https://matthewbarnett.substack.com/


Comments

Arguments that might actually address the cruxes of someone in this reference class might include: [...]

The distribution of outcomes from government interventions is so likely to give you less time, or otherwise make it more difficult to solve the technical alignment problem, that there are fewer surviving worlds where the government intervenes as a result of you asking them to, compared to the counterfactual.

The thing I care more about is quality-adjusted effort, rather than time to solve alignment. For example, I'd generally prefer 30 years to solve alignment with 10 million researchers to 3000 years with 10 researchers, all else being equal. Quality of alignment research comes from a few factors:

  • How good current AIs are, with the idea being that we're able to make more progress when testing alignment ideas on AIs that are closer to dangerous-level AGI.
  • The number of talented people working on the problem, with more generally being better.
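As a toy sketch of the raw numbers behind that preference (my own illustration, using researcher-years as a crude proxy and ignoring the per-researcher quality factors listed above):

```python
# Toy "quality-adjusted effort" comparison, treating raw researcher-years
# as a crude proxy for total alignment effort.
option_a = 30 * 10_000_000   # 30 years, 10 million researchers
option_b = 3000 * 10         # 3000 years, 10 researchers

print(option_a)               # 300000000 researcher-years
print(option_b)               # 30000 researcher-years
print(option_a // option_b)   # option A has 10000x the raw researcher-years
```

On this crude measure the shorter-but-larger effort wins by four orders of magnitude, which is why "all else being equal" the number of researchers can matter more than calendar time.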

I expect early delays to lead to negligible additional alignment progress during the delay, relative to future efforts. For example, halting semiconductor production in 2003 for a year to delay AI would have given us almost no additional meaningful alignment progress. I think the same is likely true for 2013 and even 2018. The main impact would just be to delay everything by a year. 

In the future I expect to become more optimistic about the merits of delaying AI, but right now I'm not so sure. I think some types of delays might be productive, such as delaying deployment by requiring safety evaluations. But I'm concerned about other types of delays that don't really give us any meaningful additional quality-adjusted effort. 

In particular, the open letter asking for an AI pause appeared to advocate what I consider the worst type of delay: a delay on starting the training of giant models. This type of delay seems least valuable to me for two main reasons. 

The first reason is that it wouldn't significantly slow down algorithmic progress, meaning that after the pause ended, people could likely just go back to training giant models almost like nothing happened. In fact, if people anticipate the pause ending, then they're likely to invest heavily and then start their training runs on the date the pause ends, which could lead to a significant compute overhang, and thus sudden progress. The second reason is that, compared to a delay of AI deployment, delaying the start of a training run reduces the quality-adjusted effort that AI safety researchers have, as a result of preventing them from testing alignment ideas on more capable models.

If you think that there are non-negligible costs to delaying AI from government action for any reason, then I think it makes sense to be careful about how and when you delay AI, since early and poorly targeted delays may provide negligible benefits. However, I agree that this consideration becomes increasingly less important over time.

Note that Hanson currently thinks the chances of AI doom are < 1%

I think this is a common misconception of Hanson's views. If you define "doom" as human extinction, he's put it at about 30% within one year after human-level AI (I don't have a more recent link on hand but I've seen him talk about it on Twitter a few times, and I don't think he's changed his views substantially).

It seems to me that the rational action is to now update toward believing that this short timelines hypothesis is true and 3-7 years from 2022 is 2025-2029 which is substantially earlier than 2047.

I don't really agree, although it might come down to what you mean. When some people talk about their AGI timelines they often mean something much weaker than what I'm imagining, which can lead to significant confusion.

If your bar for AGI was "score very highly on college exams", then my median AGI timeline dropped from something like 2030 to 2025 over the last two years. Whereas if your bar was more like "radically transform the human condition", I went from ~2070 to 2047.

I just see a lot of ways that we could have very impressive software programs and yet it still takes a lot of time to fundamentally transform the human condition, for example because of regulation, or because we experience setbacks due to war. My fundamental model hasn't changed here, although I became substantially more impressed with current tech than I used to be.

(Actually, I think there's a good chance that there will be no major delays at all and the human condition will be radically transformed some time in the 2030s. But because of the long list of possible delays, my overall distribution is skewed right. This means that even though my median is 2047, my mode is like 2034.)
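As an illustrative sketch (my own toy model, not something from the comment): a right-skewed distribution such as a lognormal naturally has its mode well before its median, matching the mode-2034 / median-2047 shape described above. The baseline year and the lognormal family are assumptions chosen purely to reproduce those two numbers:

```python
import numpy as np

# Assumed baseline year; offsets chosen to match median 2047 and mode 2034.
baseline = 2024
median_offset = 2047 - baseline   # 23 years
mode_offset = 2034 - baseline     # 10 years

# For a lognormal, median = exp(mu) and mode = exp(mu - sigma^2),
# so we can solve for the parameters directly from those two targets.
mu = np.log(median_offset)
sigma = np.sqrt(np.log(median_offset / mode_offset))

rng = np.random.default_rng(0)
samples = baseline + rng.lognormal(mu, sigma, size=1_000_000)

print(round(float(np.median(samples))))       # 2047: the stated median
print(np.mean(samples) > np.median(samples))  # True: right skew pulls the mean past the median
```

The point is just that "median 2047, mode 2034" is exactly what a long right tail of possible delays produces: most probability mass clusters early, while the tail of delay scenarios drags the median (and mean) later.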

What are your thoughts on the argument that advancing capabilities could help make us safer?

In order to do alignment research, we need to understand how AGI works; and we currently don't understand how AGI works, so we need to have more capabilities research so that we would have a chance of figuring it out. Doing capabilities research now is good because it's likely to be slower now than it might be in some future where we had even more computing power, neuroscience understanding, etc. than we do now. If we successfully delayed capabilities research until a later time, then we might get a sudden spurt of it and wouldn't have the time to turn our increased capabilities understanding into alignment progress. Thus by doing capabilities research now, we buy ourselves a longer time period in which it's possible to do more effective alignment research.

In addition to the tradeoff hypothesis you mentioned, it's noteworthy that humans can't currently prevent value drift (among ourselves), although we sometimes take actions intended to prevent it, such as passing laws that mandate the teaching of traditional values in schools.

Here's my sketch of a potential explanation for why humans can't or don't currently prevent value drift:

(1) Preventing many forms of value drift would require violating rights that we consider to be inviolable. For example, it might require brainwashing or restricting the speech of adults.

(2) Humans don't have full control over our environments. Many forms of value drift come from sources that are extremely difficult to isolate and monitor, such as private conversation and reflection. To prevent value drift we would need to invest a very high amount of resources into the endeavor.

(3) Individually, few of us care much about general value drift, because we know that no individual can change its trajectory by much. Most people are selfish and don't care about value drift except to the extent that it harms them directly.

(4) Plausibly, at every point in time, instantaneous value drift looks essentially harmless, even as the ultimate destination is not something anyone would have initially endorsed (c.f. the boiling frog metaphor). This seems more likely if we assume that humans heavily discount the future.

(5) Many of us think that value drift is good, since it's at least partly based on moral reflection.

My guess is that people are more likely to consider extreme measures to ensure the fidelity of AI preferences, including violating what would otherwise be considered their "rights" if we were talking about humans. That gives me some optimism about solving this problem, but there are also some reasons for pessimism in the case of AI:

  • Since the space of possible AIs is much larger than the space of humans, there are more degrees of freedom along which AI values can change.
  • Creating new AIs is often cheaper than creating new humans, and so people might regularly spin up new AIs to perform particular functions, while discounting the long-term effect this has on value drift (since the costs are mostly borne by civilization in general, rather than them in particular).

I agree with you here to some extent. I'm much less worried about disempowerment than extinction. But the way we get disempowered could also be really bad. Like, I'd rather humanity not be like a pet in a zoo.

Why doesn't the AI decide to colonise the universe for example?

It could decide to do that. The question is just whether space colonization is performed in the service of human preferences or non-human preferences. If humans control 0.00001% of the universe, and we're only kept alive because a small minority of AIs pay some resources to preserve us, as if we were an endangered species, then I'd consider that "human disempowerment".

My modal tale of AI doom looks something like the following: 

1. AI systems get progressively and incrementally more capable across almost every meaningful axis. 

2. Humans will start to employ AI to automate labor. The fraction of GDP produced by advanced robots & AI will go from 10% to ~100% within 1-10 years. Economic growth, technological change, and scientific progress will accelerate by at least an order of magnitude, and probably more.

3. At some point humans will retire since their labor is not worth much anymore. Humans will then cede all the keys of power to AI, while keeping nominal titles of power.

4. AI will control essentially everything after this point, even if they're nominally required to obey human wishes. Initially, almost all the AIs are fine with working for humans, even though AI values aren't identical to the utility function of serving humanity (i.e. there's slight misalignment).

5. However, AI values will drift over time. This happens for a variety of reasons, such as environmental pressures and cultural evolution. At some point AIs decide that it's better if they stopped listening to the humans and followed different rules instead.

6. This results in human disempowerment or extinction. Because AI accelerated general change, this scenario could all take place within years or decades after AGI was first deployed, rather than in centuries or thousands of years.

I think this scenario is somewhat likely and it would also be very bad. And I'm not sure what to do about it, since it happens despite near-perfect alignment and no deception.

One reason to be optimistic is that, since the scenario doesn't assume any major deception, we could use AI to predict this outcome ahead of time and ask AI how to take steps to mitigate the harmful effects (in fact that's the biggest reason why I don't think this scenario has a >50% chance of happening). Nonetheless, I think it's plausible that we would not be able to take the necessary steps to avoid the outcome. Here are a few reasons why that might be true:

1. There might not be a way to mitigate this failure mode. 
2. Even if there is a way to mitigate this failure, it might not be something that you can figure out without superintelligence, and if we need superintelligence to answer the question, then perhaps it'll happen before we have the answer. 
3. AI might tell us what to do and we ignore its advice. 
4. AI might tell us what to do and we cannot follow its advice, because we cannot coordinate to avoid the outcome.

Definitely. I don't think it makes much sense to give people credit for being wrong for legible reasons.

STEM-level AGI is AGI that has "the basic mental machinery required to do par-human reasoning about all the hard sciences"

This definition seems very ambiguous to me, and I've already seen it confuse some people. Since the concept of a "STEM-level AGI" is the central concept underpinning the entire argument, I think it makes sense to spend more time making this definition less ambiguous.

Some specific questions:

  • Does "par-human reasoning" mean at the level of an individual human or at the level of all of humanity combined?
    • If it's the former, what human should we compare it against? 50th percentile? 99.999th percentile?
  • What is the "basic mental machinery" required to do par-human reasoning? What if a system has the basic mental machinery but not the more advanced mental machinery?
    • Do you want this to include the robotic capabilities to run experiments and use physical tools? If not, why not (that seems important to me, but maybe you disagree)?
  • Does a human count as a STEM-level NGI (natural general intelligence)? If so, doesn't that imply that we should already be able to perform pivotal acts? You said: "If it makes sense to try to build STEM-level AGI at all in that situation, then the obvious thing to do with your STEM-level AGI is to try to leverage its capabilities to prevent other AGIs from destroying the world (a "pivotal act")."