Sam Clarke


Survey on AI existential risk scenarios

Thanks for the reply - a couple of responses:

it doesn't seem useful to get a feeling for "how far off of ideal are we likely to be" when that is composed of: 1. What is the possible range of AI functionality (as constrained by physics)? - ie what can we do?

No, these cases aren't included. The definition is: "an existential catastrophe that could have been avoided had humanity's development, deployment or governance of AI been otherwise". Physics cannot be changed by humanity's development/deployment/governance decisions. (I agree that cases 2 and 3 are included).

Knowing that experts think we have a (say) 10% chance of hitting the ideal window says nothing about what an interested party should do to improve those chances.

That's correct. The survey wasn't intended to understand respondents' views on interventions. It was only intended to understand: if something goes wrong, what do respondents think that was? Someone could run another survey that asks about interventions (in fact, this other recent survey does that). For the reasons given in the Motivation section of this post, we chose to limit our scope to threat models, rather than interventions.

Survey on AI existential risk scenarios

Thanks for pointing this out. We did intend for cases like this to be included, but I agree that it's unclear if respondents interpreted it that way. We should have clarified this in the survey instructions.

Survey on AI existential risk scenarios

Is one question combining the risk of "too much" AI use and "too little" AI use?

Yes, it is. Combining these cases seems reasonable to me, though we definitely should have clarified this in the survey instructions. They're both cases where humanity could avoided an existential catastrophe by making different decisions with respect to AI.

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Thanks a lot for this post, I found it extremely helpful and expect I will refer to it a lot in thinking through different threat models.

I'd be curious to hear how you think the Production Web stories differ from part 1 of Paul's "What failure looks like".

To me, the underlying threat model seems to be basically the same: we deploy AI systems with objectives that look good in the short-run, but when those systems become equally or more capable than humans, their objectives don't generalise "well" (i.e. in ways desirable by human standards), because they're optimising for proxies (namely, a cluster of objectives that could loosely be described as "maximse production" within their industry sector) that eventually come apart from what we actually want ("maximising production" eventually means using up resources critical to human survival but non-critical to machines).

From reading some of the comment threads between you and Paul, it seems like you disagree about where, on the margin, resources should be spent (improving the cooperative capabilities of AI systems and humans vs improving single-single intent alignment) - but you agree on this particular underlying threat model?

It also seems like you emphasise different aspects of these threat models: you emphasise the role of competitive pressures more (but they're also implicit in Paul's story), and Paul emphases failures of intent alignment more (but they're also present in your story) - though this is consistent with having the same underlying threat model?

(Of couse, both you and Paul also have other threat models, e.g. you have Flash War, Paul has part 2 of "What failure looks like", and also Another (outer) alignment failure story, which seems to be basically a more nuanced version of part 1 of "What failure looks like". Here, I'm curious specifically about the two theat models I've picked out.)

(I could have lots of this totally wrong, and would appreciate being corrected if so)

What are some real life Inadequate Equilibria?

I'm a bit confused about the edges of the inadequate equilbrium concept you're interested in.

In particular, do simple cases of negative externalities count? E.g. the econ 101 example of "factory pollutes river" - seems like an instance of (1) and (2) in Eliezer's taxonomy - depending on whether you're thinking of the "decision-maker" as (1) the factory owner (who would lose out personally) or (2) the government (who can't learn the information they need because the pollution is intentionally hidden). But this isn't what I'd typically think of as a bad Nash equilibrium, because (let's suppose) the factory owners wouldn't actually be better off by "cooperating"

What will 2040 probably look like assuming no singularity?

Just an outside view that over the last decades, a number of groups who previously had to suppress their identities/were vilified are now more accepted (e.g., LGBTQ+, feminists, vegans), and I expect this trend to continue.

I'm curious if you expect this trend to change, or maybe we're talking about slightly different things here?

What will 2040 probably look like assuming no singularity?

I had something like "everybody who has to strongly hide part of their identity when living in cities" in mind

Less Realistic Tales of Doom

Thanks for writing this! Here's another, that I'm posting specifically because it's confusing to me.

Value erosion

Takeoff was slow and lots of actors developed AGI around the same time. Intent alignment turned out relatively easy and so lots of actors with different values had access to AGIs that were trying to help them. Our ability to solve coordination problems remained at ~its current level. Nation states, or something like them, still exist, and there is still lots of economic competition between and within them. Sometimes there is military conflict, which destroys some nation states, but it never destroys the world.

The need to compete in these ways limits the extent to which each actor is able to spend their resources on things they actually want (because they have to spend a cut on competing, economically or militarily). Moreover, this cut is ever-increasing, since the actors who don't increase their competitiveness get wiped out. Different groups start spreading to the stars. Human descendants eventually colonise the galaxy, but have to spend ever closer to 100% of their energy on their militaries and producing economically valuable stuff. Those who don't get outcompeted (i.e. destroyed in conflict or dominated in the market) and so lose their most of their ability to get what they want.

Moral: even if we solve intent alignment, avoid catastrophic war or misuse of AI by bad actors, and other acute x-risks, the future could (would probably?) still be much worse than it could be, if we don't also coordinate to stop the value race to the bottom.

What will 2040 probably look like assuming no singularity?

Epistemic effort: I thought about this for 20 minutes and dumped my ideas, before reading others' answers

  • The latest language models are assisting or doing a number of tasks across society in rich countries, e.g.
    • Helping lawyers search and summarise cases, suggest inferences, etc. but human lawyers still make calls at the end of the day
    • Similar for policymaking, consultancy, business strategising etc.
    • Lots of non-truth seeking journalism. All good investigative journalism is still done by humans.
    • Telemarketing and some customer service jobs
  • The latest deep RL models are assisting or doing a number of tasks in across society in rich countries, e.g.
    • Lots of manufacturing
    • Almost all warehouse management
    • Most content filtering on social media
    • Financing decisions made by banks
  • Other predictions
    • it's much easier to communicate with anyone, anywhere, at higher bandwidth (probably thanks to really good VR and internet)
    • the way we consume information has changed a lot (probably also related to VR, and content selection algorithms getting really good)
    • the way we shop has changed a lot (probably again due to content selection algorithms. I'm imagining there being very little effort between having a material desire and spending money to have it fulfilled)
    • education hasn't really changed
    • international travel hasn't really changed
    • discrimination against groups that are marginalised in 2021 has reduced somewhat
    • nuclear energy is even more widespread and much safer
    • getting some psychotherapy or similar is really common (>80% of people)
What will 2040 probably look like assuming no singularity?

Thanks for this, really interesting!

Meta question: when you wrote this list, what did your thought process/strategies look like, and what do you think are the best ways of getting better at this kind of futurism?

More context:

  • One obvious answer to my second question is to get feedback - but the main bottleneck there is that these things won't happen for many years. Getting feedback from others (hence this post, I presume) is a partial remedy, but isn't clearly that helpful (e.g. if everyone's futurism capabilities are limited in the same ways). Maybe you've practised futurism over shorter time horizons a lot? Or you expect that people giving you feedback have?
  • After reading the first few entries, I spent 20 mins writing my own list before reading yours. Some questions/confusions that occurred:
    • All of my ideas ended up with epistemic status "OK, that might happen, but I'd need to spend at least a day researching this to be able to say anything like "probably that'll happen by 2040" "
      • So I'm wondering if you did this/already had the background knowledge, or if I'm wrong that this is necessary
    • My strategies were (1) consider important domains (e.g. military, financial markets, policymaking), and what better LMs/deep RL/DL in general/other emerging tech will do to those domains; (2) consider obvious AI/emerging tech applications (e.g. customer service); (3) look back to 2000 and 1980 and extrapolate apparent trends.
      • How good are these strategies? what other strategies are there? how should they be weighed?
    • How much is my bottleneck to being better at this (a) better models for extrapolating trends in AI capabilities/other emerging tech vs (b) better models of particular domains vs (c) better models of the-world-in-general vs (d) something else?
Load More