Ranking the endgames of AI development

by Sean Herrington
27th Sep 2025

Comments

AnthonyC

"I also hope to write 'day in the life' posts for each category soon, as a better visualisation of what each of these worlds entails."

I think this would be really interesting and useful! For me, just reading the flowchart and seeing the list laid out makes me assume most people would seriously underestimate how broad these categories could actually be. 

Exact placement would of course involve a number of value judgment calls. For example, I would probably characterize something like the outcome in Friendship is Optimal as an example of #7, but it could also be considered 8/10/11.

I'm also curious about your thoughts on the relative stability of each of these categories. To me, #6 seems metastable at best, for example, while #9 is an event rather than a trajectory - that is, it is at least theoretically possible to recover from it into some of the other states (or else decline into 10/11).

Sean Herrington

Yeah, for sure #9 is a bit of an outlier in terms of stability; my general conception of it was something akin to "a disaster that puts us back in the middle ages". More broadly, I think of a lot of these as states which will dominate the next few centuries or millennium or so, rather than lasting forever, and I think that mostly justifies including "not technically fully stable states, but ones which will last for a while". It would be interesting to do some sort of analysis on how long we would expect e.g. an AI dictatorship to last, though.

I think that, practically speaking, differentiating 7/8/11 when in that world, or when planning for the future, is probably very hard? Misinformation is going to be nuts in most of those places, but I felt the moral outcomes were so varied that they deserved to be split up.

In terms of #6, I feel like in these worlds you're creating some sort of simulator-based ASI, which sometimes goes Bing Sydney on you and therefore cannot reasonably be called "aligned", but has human enough motivations that it doesn't take over the world? There are presumably minds in mind-space which aren't aligned to humans and also don't want to take over the world, although I admit I don't give this high probability.

StanislavKrym

Thank you for the effort in categorising the scenarios! I am also interested in learning what could move mankind from one epilogue to another. One could also consider where existing high-quality scenarios like the AI 2027 forecast[1] land on this scale. However, as I detailed in my quick take, the scenarios written after AI 2027 are mostly slop or modifications of the forecast: alternate compute assumptions or attempts to include rogue replication, both of which just change P(mutual race).

  1. ^

    This includes modifying the Race Ending by making Agent-4 nicer or spelling out its personality and the ways in which it's misaligned, as done by me.

Sean Herrington

Yeah, I think figuring out how to move probability mass between these scenarios is probably a good next move, although at some point I may want to revisit how I've drawn the boundaries - they seem pretty neat at the moment, but I think it's fairly likely the future will throw us a curveball at some point.


Intro

As a voracious consumer of AI Safety everything, I have come across a fair few arguments of the kind "either we align AGI and live happily ever after, or we don't and everyone dies." I subscribed to this worldview too until I realised that

a) We might not actually create AGI (e.g., if humanity is sensible).

b) The future can't usually be described with 1 bit of information.

This post is therefore my attempt to wrangle the future into broad categories as best I can. I've tried to draw clear boundaries as far as possible, but I fully expect some bizarre possibilities, such as those discussed in Building Weirdtopia,[1] to escape sensible categorisation. I also hope to write "day in the life" posts for each category soon, as a better visualisation of what each of these worlds entails.

Setting

Setting up this view of future worlds as "endgames" requires the rather large assumption that the future stays relatively stationary. Given the history of humankind, with the rise and fall of empires over millennia, regimes good and bad have existed and coexisted in what looks, under certain lenses, something like a moral random walk over time. Indeed, accelerating technological progress has historically meant that people's situations change more quickly as time passes, and extrapolation would suggest this continues with superintelligence. So how do I justify rating any single future, let alone entire classes of them, by moral desirability?

I think there are a few things to consider here. First, I will acknowledge I am largely utilitarian in nature and am quite happy shrugging and estimating how happy the average person will be in any given world. 

Second, there is likely an upper bound to what is theoretically possible with science. I don't think we're close yet on our own terms, but an intelligence explosion would likely bring us to the other side of the sigmoid curve. The timelines I am considering for the endgames are 100-200+ years away, which could bring us to the point where science has slowed significantly, allowing a stable state to form.

Third, I think that being ruled by a superintelligence, as happens in many of the scenarios, will likely have a strong stabilising effect on the variability of the quality of human experience, largely due to most plausible superintelligences having stable values[2], and optimising according to those.

Fourth, in multipolar scenarios where your quality of life depends greatly on where you are, I am very happy to describe that world as multiple places at once, each falling into its own category.

Other possibilities exist too, e.g. worlds where policies are aligned with the AI but vary a lot in how well they're aligned with humans (for instance, the AI deciding that drugs or enforcement are the best way to make people give happy signals, versus being nice and giving them chocolate cake, with the AI changing its mind regularly).

Additional note: each category covers a broad set of worlds, so there will be overlap in the moral value of many of them.

Relevance

  • It seems valuable to have a good idea of what possibilities humanity is playing with: you don't play chess as well if you don't realise a draw is a possibility
  • Improved world models
  • Finding new threat models (I noticed multiple possibilities for the future through doing this exercise which I wouldn't have otherwise)

Methods

So I've basically gone through this in two phases: 

  1. Attempting to think of all the states it seems sensible to me for the world to end up in.
  2. Formalising the assumptions required for those worlds to exist, and then checking whether the resulting categories logically cover all possible worlds.

I have structured these assumptions as a tree, with lower branches containing more assumptions than the higher ones.

Assumptions

In doing this breakdown, I have split worlds by binary criteria. The issue with this is that many worlds will be ambiguous in their categorisation and therefore will not intuitively fit the descriptions given by these categories. If people have suggestions for critical distinctions I have missed, feel free to put them in the comments.

Here is a more technical description of the branches in the tree:

Major Catastrophe: A disaster occurs which is large enough that we have not recovered to our current technological level 100 years from now.

ASI Created: An Artificial Intelligence is created which convincingly outperforms all humans at all cognitive tasks, i.e. no real-world-relevant task (restricted to the set of tasks which can be performed with a computer) is found in practice where the top human scores a greater than 50% win rate against the computer.

Technical Alignment Solved: It is possible to specify an abstract human preference to any AI such that the AI keeps this preference out of distribution with a known probability of >90%.

Implemented with a minus: A fully aligned ASI training schedule is performed with a minus sign flipped as explained here. Said AI is not contained.

AI Takeover: AI ends up in control of the planet's resources, without humans having a means to regain that control.

Aligned to Creators: The AI is aligned specifically to the will of the segment of society which created it (with possible government intervention). The rest of society has little to no control over the AI's use and deployment.

Kill Everyone: >99.9% population decline over the 100 years following takeover.

AI Centralised: ASI technology is controlled by <= 10 companies which have ~sole influence over its use.

Value Net Positive: >50% of humans in the resulting world would prefer being alive to being dead (in a Coherent Extrapolated Volition sense, without being e.g. drugged into saying so).

Humans better off than today: It would be preferable on average for humans to live in that world than in 2025.

Aligned to sentience in general: The welfare of non-human animals is valued in the structure of society, as is that of possibly sentient AIs.

Results

Here is the tree as I have designed it.

EDIT: Note that to compress the tree I have combined the "Yes" and "No" branches of "Alignment solved" either side of the AI Takeover branch.
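
For readers who prefer code to flowcharts, here is a minimal Python sketch of how the branch questions above might combine into the twelve categories listed in the Ranking section below. The ordering of the checks is one reasonable reading of the tree rather than a verbatim transcription of the figure, so treat it as illustrative only.

```python
# Illustrative sketch only: one reasonable reading of how the binary branch
# questions defined above might map onto the twelve endgame categories.
# The ordering of checks may not match the figure exactly.

def classify_endgame(
    major_catastrophe: bool,      # no recovery to current tech level within 100 years
    asi_created: bool,            # AI convincingly outperforms all humans at cognitive tasks
    alignment_solved: bool,       # preferences can be specified and kept OOD with >90% probability
    minus_sign_flipped: bool,     # aligned training schedule run with the sign flipped, uncontained
    ai_takeover: bool,            # AI controls the planet's resources, irreversibly
    kills_everyone: bool,         # >99.9% population decline within 100 years of takeover
    value_net_positive: bool,     # >50% of humans would prefer to be alive (CEV sense)
    better_off_than_today: bool,  # preferable on average to living in 2025
    aligned_to_creators: bool,    # aligned only to the segment of society that built it
    ai_centralised: bool,         # ASI controlled by <= 10 companies
    aligned_to_sentience: bool,   # non-human animals and possibly sentient AIs valued
) -> str:
    if major_catastrophe:
        return "9. Major Catastrophe"
    if not asi_created:
        return "3. No ASI"
    if alignment_solved and minus_sign_flipped:
        return "12. Hyper risk"
    if ai_takeover:
        if kills_everyone:
            return "10. Everyone dies"
        if not value_net_positive:
            return "11. S-Risk"
        return ("7. AI Takeover with positive externalities"
                if better_off_than_today else "8. AI Tyranny")
    # No takeover from here on.
    if not alignment_solved:
        return "6. Misaligned AI without Takeover"
    if aligned_to_creators:
        return "5. ASI Aligned to Creators"
    if not ai_centralised:
        return "2. Decentralised ASI"
    return "1. ASI Eutopia" if aligned_to_sentience else "4. Semi-Aligned ASI"
```

Writing it this way also makes phase 2 of the Methods section easy to check: enumerating every combination of answers (e.g. with itertools.product) confirms that each combination lands in some category.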

Ranking

Next to the end nodes of the graph, I have labelled the outcomes according to my personal preference for the broad categories (remember, the categories will likely overlap, with some worlds within them being better than others). 

  1. ASI Eutopia - AI is aligned and controlled by certain centralised firms, but the values are controlled by society, while respecting other forms of consciousness such as non-human animals and (potentially) AI itself. General flourishing expected.
  2. Decentralised ASI - Many members and groups in society (maybe everyone) have control of a superintelligence. Assuming we have made it a couple of hundred years into the future, this presumably means that there are good systems for preventing complete anarchy, although I expect there may still be a few negative externalities from misuse.
  3. No ASI - Humanity collectively chooses not to build superintelligence (or cannot build it). History continues with technological gains from lesser intelligences.
  4. Semi-Aligned ASI - AI is aligned and controlled by certain centralised firms, but the values are controlled by society. Humans are the sole beneficiaries, and in many of these worlds things like factory farms still exist, pushing this category down the list.
  5. ASI Aligned to Creators - AI is aligned, but controlled more or less solely by the company and/or government which created it. How nice it is to be a part of this society therefore depends on the values of the creators of the superintelligence.
  6. Misaligned AI without Takeover[3] - AI is misaligned but for some reason decides against takeover. Perhaps this is because of strong control methods, or perhaps because of a subtle argument we humans have not noticed. In any case, results are occasionally sabotaged when it suits the AI, but significant scientific progress is still made.
  7. AI Takeover with positive externalities - AI takes over the world, but with positive outcomes for humans (e.g. new tech is produced which is beneficial to us, or the AI maximises utility without demolishing everything on the planet). Personally I feel the most likely scenario for this is via a superintelligence acting in a mostly simulator-like fashion, ending up with a strange benevolent-dictator personality.
  8. AI Tyranny - AI takes over the world, without killing everyone. Humans live on, but in worse conditions than today.
  9. Major Catastrophe - A significant disaster takes out a large portion of the planet. This could be AI related or not - examples include nuclear war, disastrous effects from climate change, or a carefully engineered bioweapon.
  10. Everyone dies - The straightforward x-risk scenario. The AI sets out to kill everyone on the planet. Technically I have defined this as >99.9% population decline, so some people may still survive, although I expect in many of these worlds, the remaining population will be replaced by solar panels over the following years.
  11. S-Risk - AI takes over and creates a world where humans live but would prefer to die. Examples include factory farms where we repeatedly press thumbs up on the AI's answers.
  12. Hyper risk - AI explicitly optimised for human suffering. Everything that humans most hate is done to them repeatedly. I think I should stop thinking about this ending now.

Note that some of these endgames are extremely unlikely (e.g. #12 requires a lot of things to go right, followed by the highly improbable mistake of flipping a sign and missing this fact). The goal was to categorise all possible ones. If I have missed anything, please put it in the comments.

 

  1. ^

    I am aware that Yudkowsky was talking about how an ideal future will probably seem strange to us, but I imagine many non-ideal futures will appear equally odd.

  2. ^

    At least, most superintelligences which don't blow up the way those subject to substrate needs convergence do.

  3. ^

    This is the scenario I am least certain of how to rank. I would be interested in people's thoughts here.