Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

TL;DR: We wrote a post on possible success stories of a transition to TAI to better understand which factors causally reduce AI risk. We also explain these catalysts for success separately and in more detail, so this post can be thought of as a high-level overview of different AI governance strategies.


Thinking through scenarios where TAI goes well informs our goals regarding AI safety and leads to concrete action plans. Thus, in this post,

  • We sketch stories where the development and deployment of transformative AI go well. We broadly cluster them as follows:
    1. Alignment won’t be a problem, …
      • Because alignment is easy: Scenario 1
      • We get lucky with the first AI: Scenario 4
    2. Alignment is hard, but …
      • We can solve it together, because …
        • We can effectively deploy governance and technical strategies in combination: Scenario 2
        • Humanity will wake up due to an accident: Scenario 3
        • The US and China will realize their shared interests: Scenario 5
      • One player can win the race, by …
        • Launching an Apollo Project for AI: Scenario 6
  • We categorize central points of influence that seem relevant for causing the success of our sketches. The categories with some examples are:
    1. Governance: domestic laws, international treaties, safety regulations, whistleblower protection, auditing firms, compute governance and contingency plans
    2. Technical: red teaming, benchmarks, fire alarms, forecasting and information security
    3. Societal: Norms in AI, publicity and field-building
  • We lay out some central causal variables for our stories in the third chapter. They include the level of cooperation, AI timelines, take-off speeds, the size of the alignment tax, and the type and number of actors.



4 comments

What's a TAI? There is no definition of this acronym anywhere in this post or in the link, and Google brings up 3 different but apparently unrelated hits: threats in AI, IEEE Transactions on AI, and... Tentacular AI. I hope it's that last one.

I think usually Transformative AI.

Thanks :) 

My mainline best-case or median-optimistic scenario is basically a partial version of number 1, where aligning AI is somewhat easier than today, plus an acceleration of transhumanism and a multipolar world that together dissolve boundaries between species and the human-AI divide. Thus, by the end of the Singularity, things are extremely weird and deaths are in the millions or tens of millions due to wars.
