https://manifold.markets/tailcalled/will-ai-xrisk-seem-to-be-handled-se?r=dGFpbGNhbGxlZA

Lately it seems like a bunch of people have been taking big AI risks seriously, from OpenAI's "Governance of superintelligence" to DeepMind's "Model evaluation for extreme risks".

We're not quite there yet by my standards for safety, and even less so by e.g. Eliezer Yudkowsky's. However, I wonder if this marks a turning point.

Obviously this is a very subjective question, so I am warning you ahead of time that it will resolve in opinionated ways. To anchor the discussion, I expect the following to be necessary for a YES resolution:

  • Major leading AI companies openly acknowledge that existential risk is a possibility, not just in a marginal sense (e.g. internal discussion of it by employees, or rare cases where leaders begrudgingly admit it) but also in a central sense (e.g. openly having research teams working on it, and devoting long sections of documents aimed at politicians to it).
  • Something is done to handle unilateral actors, e.g. active progress is made toward an international organization that can prevent unilateral actors from creating unfriendly AI, or maybe somehow the only plausible creators all take AI xrisk seriously.
  • Yann LeCun changes his mind and takes AI xrisk seriously, or no longer holds much sway over the issue at Facebook.
  • The lessons of Worlds Where Iterative Design Fails are taken seriously by the above systems.

Please ask me more questions in the comments to help cement the resolution criteria. If my opinion on the inherent danger of AI xrisk changes during the resolution period, I will try to respond based on the level of risk implied by my criteria, not based on my later evaluation of things.

However, if it turns out that there is a qualitatively different but similarly powerful way of handling AI xrisk, and it gets implemented in practice, I will also resolve this question YES.

Comments

Would "we get strong evidence that we're not in one of the worlds where iterative design is guaranteed to fail, and it looks like the group's doing the iterative design are proceeding with sufficient caution" qualify as a YES?

No, per the rule that "If my opinion on the inherent danger of AI xrisk changes during the resolution period, I will try to respond based on the level of risk implied by my criteria, not based on my later evaluation of things," but maybe in such a case I would change the title to reflect the relevant criteria.