AI Probability Trees - Katja Grace

Nathan Young

I am interviewing AI experts on what they think will happen with AI. Here are Katja Grace’s thoughts. AI risk scares me, but I often feel pretty disconnected from it. This has helped me think about it.

Here are Katja’s thoughts in brief:

Will we get AI tools that can run a civil service department with 100x fewer people by 2040? 50%
In what fraction of worlds are the majority of AI tools controlled by agentic AI? 70%
In what fraction of worlds will agentic AI achieve their goals? 90%
In what fraction of agentic worlds will the majority of AI controlled resources be implemented for bad goals? 50%
Will policy slow AGI down by 10 years? 25%. How much reduction of badness does 10 years buy us? 30%

You can look at an interactive graph here:

or see all the graphs I’ve done so far https://estimaker.app/ai. There is an image at the bottom of this page.

You can watch the full video here:

Longer explanations

Please remember that this is all by 2040. This section was originally a table with the columns below.

Question
- Katja's commentary
- My commentary
Will we get AI tools that can run a civil service department with 100x fewer people by 2040? (50%)
- Katja's commentary: Katja says she’s unsure and gives 50% accordingly.
- My commentary:
  - A common failure mode in my discussions about AGI is vagueness about which capabilities make something “AGI”. Some people envisage tools capable of flawlessly running worldwide conspiracies, while others seem to think of top-human-level coding ability.
  - Katja and I settled on this example because it involves a varied set of tasks at scale and replacing a lot of human labour.
  - The tasks include:
    - Large scale engineering projects
    - Cash transfers
    - Real world disagreements (how to set tax rates, etc)
    - Complex system problems (traffic flows, healthcare takeup)
  - Civil Services act on a scale of millions of people.
In what fraction of those worlds does agentic AI ever control the majority of resources (even via AI tools)? (70%) If you add up the resources controlled by agentic AI and the AI tools they control, is that more than half?
- Katja's commentary:
  - Katja’s intuition is that for extinction, agency is probably necessary.
  - “The reason I think that some of it is probably agentic is that it seems like agents or something roughly agent-like are economically valuable. People want to make things like agents. They are already trying to make things like agents.”
  - “It is a question of how agentic they are, also it seems to be more like a spectrum than being able to clearly say this is agentic or not”
  - Katja: “I don't know, somewhere between 60 and 90%. … I think a world I could imagine is, like there are definitely some really impressive agentic AI systems. But like, they're not really the bulk of the AI thinking that's going on. Like a bunch of it is sort of more like tools that largely humans are using… I feel like the question is kind of who's wielding most of the AI cognitive labour. Is it humans or AIs with good goals or AIs with bad goals?”
- My commentary:
  - Does anyone have a better frame for this? I think Katja’s is the best I’ve heard but is pretty vague.
  - Consider how the top of the US government governs not only the US but also countries allied to the US, and affects norms worldwide. Similarly, AI agents may control not only some resources and AI tools but also what those tools control. The question is roughly: is this a majority? Do they have a monopoly on the market for power?
  - I’d appreciate a better operationalisation of this question.
In what fraction of those worlds will agentic AI achieve their goals? (90%)
- Katja's commentary:
  - “There's like .1% to 10% chance of a thing happening very fast. Then if that doesn't happen, maybe like an 80% chance that in the long run, if there are AI agents around who are much smarter than us, that they take all the power.”
  - Notably, Katja has a model where agentic AIs achieve their goals without takeover: they are more competent and we slowly hand them power.
  - “I guess [in another] case [AI takeover] happens very slowly. I think maybe I'm sort of agnostic on actually how slowly that is. And it's more like the kind of mechanism that I'm pointing at where it could be quite quick, but… no one's like violating anyone's property rights. It's just, they're pretty competent and get paid for what they do, perhaps, or… all decisions are being made by AI because AI is just more competent at making decisions.”
  - But this might not be good. “I think there are also cases where even the people interacting are maybe not happy with it, but competition forces you to. Like, if everyone can use some kind of AI system to manage their company's marketing, let's say, and I'd say it's sort of known by everyone that it's going to be kind of dodgy at this and do some bad things. And they would rather that it didn't… But if you don't use it, then your company is not going to be able to compete. So you sort of have to use it.”
- My commentary:
  - This part of the story seems pretty similar to the doomiest models - if you have agentic AI that controls the majority of resources and no good policy (we’ll get to that later), they probably achieve their goals. At that point you have to just hope their goals are good.
  - I’d be curious if anyone who has a more pessimistic take than Katja disagrees with this bit.
In what fraction of agentic worlds will the majority of AI controlled resources be implemented for bad goals? (50%)
- Katja's commentary:
  - Katja originally went for 25% but felt that made the outputs too low overall so shifted to 50%.
  - In the talk at EAG SF you can see her explain some of the reasoning behind these kinds of numbers. Her understanding of other thinkers is that they think the space of all possible values is very large and so it’s likely we miss the kinds of values we are aiming for when creating AGI.
  - There seem to be several good reasons to have different intuitions here:
    - Katja has an intuition that current AI is surprisingly good at finding humanlike things within wide search spaces, e.g. generating humanlike faces or text
    - Humans are good at this too - finding cars in the space of all cars. In the same way they may be able to repeatedly create AI with humanlike goals among a large search space
    - Financial incentives will lead towards finding useful and valuable things. We want to get AI to do the kinds of things we would want
- My commentary:
  - People have different intuitions on this question. Some people think human preferences are very easy to achieve, either via individual models or natural equilibria. Others think that’s a very small target in the space of all possible goals and that we are unlikely to hit it but that AIs will just enact very alien and harmful preferences.
  - I think this is generally what is pointed at by the orthogonality thesis.
Will policy slow AGI down by 10 years? How much reduction of badness does 10 years buy us? (25%, 30%)
- Katja's commentary:
  - “I just think that policy has slowed things down a lot in the past. Like, I think you can look at other technologies that have just been slowed down quite a lot… Various genetic engineering of humans, all kinds of human reproductive things, I think, are happening pretty slowly relative to the economic value you could get from them… It seems plausible that we could already have put huge amounts of money and figured out some more sort of genetic or cloning type stuff. And we've decided as a world that we don't… [And] this is true across the world. It's not like China is doing this much faster than America or something like that. I think also medicine in general is a good example where you forget… just how slowly things happen, even when they seem like they might be pretty valuable for people”
- My commentary:
  - A chunk of people think that policy is very relevant to slowing down AI and reducing AI risk. This is part of an attempt to have a graph where everyone can put in their true values and get results that seem right to them.

How this falls out by 2040

Here is how these numbers cash out in possible worlds. These worlds are Mutually Exclusive and Collectively Exhaustive (MECE).

AI are good but not godlike (50%)

AI tools are great. Maybe they can code a lot or support a lot. But they can’t reduce the manpower required to run a department of the Civil Service by 100x. They can’t take on large projects alone, for some reason. It’s like GPT4, but a lot better, but not a step change.

ChatGPT20 - competence but no agency (15%)

Imagine a ChatGPT that can produce anything you ask of it but only does a few tasks or can’t call itself recursively. Unlike the above this is genuinely a step change. You or I could run a hedge fund or a chunk of government. But it will involve us doing the vision.

Many godlike agentic AI systems blocking one another (4%)

As in the current world, many intelligent systems (people and companies) are trying to reach their outcomes and blocking one another. Somehow this doesn’t lead to the “mid/utopias” below.

AI Midtopia/Utopia (16%)

These are the really good scenarios where we have agentic AGI that doesn’t want bad things. There is a broad spread of possible worlds, from some kind of human uplift to a kind of superb business as usual where we might still have much to complain about but everyone lives like the richest people do today.

Saved by policy (1.2%)

The share of worlds where things would have gone really badly but policy delayed things. These might end up as any of the other non-doom worlds: perhaps AI has been slowed a lot, or perhaps it has better goals. To keep the graph simple, it doesn’t really deal with what these worlds look like. Please make suggestions.

Doom (15%)

Unambiguously bad outcomes: agentic AGI which wants things we’d consider bad and gets them. My sense is that Katja thinks most bad outcomes come from AGI taking over, with maybe a 10% chance of that happening quickly and a 90% chance of it happening slowly.

If you would like to see more about this, Katja has much longer explanations here: https://wiki.aiimpacts.org/doku.php?id=arguments_for_ai_risk:is_ai_an_existential_threat_to_humanity:start
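
As a rough check on the numbers in this section, here is a minimal Python sketch of how the answers multiply down the tree. The function name and parameter names are made up for illustration. The way the two policy numbers are combined (25% times 30%, taken out of the bad-goals branch) is my assumption rather than something read off the interactive graph. Under that reading the six worlds sum to 100%, with the policy branch at roughly 1.2%.

```python
def world_probabilities(
    p_tools=0.50,           # AI tools can run a civil service department with 100x fewer people
    p_agentic=0.70,         # agentic AI controls the majority of resources, given those tools
    p_achieve=0.90,         # agentic AI achieves its goals, given that control
    p_bad=0.50,             # those goals are bad, given they are achieved
    p_policy=0.25,          # policy slows AGI down by 10 years
    policy_reduction=0.30,  # fraction of badness that a 10 year delay removes
):
    """Return the probability of each world, multiplying down the tree."""
    agentic = p_tools * p_agentic
    achieved = agentic * p_achieve
    bad = achieved * p_bad
    # Assumption: policy rescues p_policy * policy_reduction of the bad branch.
    saved = bad * p_policy * policy_reduction
    return {
        "AI are good but not godlike": 1 - p_tools,
        "ChatGPT20 - competence but no agency": p_tools * (1 - p_agentic),
        "Many godlike agentic AI systems blocking one another": agentic * (1 - p_achieve),
        "AI Midtopia/Utopia": achieved * (1 - p_bad),
        "Saved by policy": saved,
        "Doom": bad - saved,
    }


if __name__ == "__main__":
    worlds = world_probabilities()
    for name, p in worlds.items():
        print(f"{name}: {p:.1%}")
    print(f"Total: {sum(worlds.values()):.1%}")
```

Calling `world_probabilities(p_bad=0.25)` shows what Katja's original 25% answer for bad goals would do: the doom branch roughly halves, which fits her feeling that it made the outputs too low.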

Who would you like to see this done for?

I wanted to see work like this, so I thought I’d do it. If you want to see a specific person’s AI risk model, perhaps ask them to talk to me. It takes about 90 minutes of their time and currently I think the marginal gains of every subsequent one are pretty high.

On a more general level, I am pretty encouraged by positive feedback. Should I try to get funding to do more interviews like this?

How could this be better?

We are still in the early stages, so I appreciate a lot of nitpicky feedback.

Thanks

Thanks to Katja Grace for the interview, to Rebecca Hawkins for feedback (in particular for suggesting the table layout), and to Arden Koehler for good comments (you should read her thread on writing good comments). Thanks also to the person who suggested I write this sequence.