I'm searching for writeups on scenarios of AI takeover. I'm interested in "?????" in 
"1. AGI is created
2. ??????
3. AGI takes over the world"
Yeah, I played the paperclips game. But I'm searching for a serious writeup of the possible steps a rogue AI could take to take over the world, and why (if) it inevitably will.
I've come to understand that it's kinda taken for granted that unaligned foom equals catastrophic event.
I want to understand if it's just me that's missing something, or if the community as a whole is way too focused on alignment "before", whereas it might also be productive to model the "aftermath". Make the world more robust to AI takeover, instead of taking for granted that we're done for if that happens.
And maybe the world is already robust to such takeovers. After all, there are already superintelligent systems out in the world, but none of them have taken it over just yet (I'm talking governments and companies).
To sum it up: I guess I'm searching for something akin to the Simulation Argument by Bostrom, let's call it "the Takeover Argument".

3 Answers

In any scenario, the DEF AI will undertake these two activities:

  1. Preparing infrastructure for its initial deployment: ensuring global internet coverage (SpaceX satellites), arranging computing facilities (clouds), creating unfalsifiable memory storage, etc.
  2. Making itself invincible: I cherish the hope of some elegant solution here, like entangling itself with our financial system, or using a blockchain for its memory banks.

Maybe have a look at the posts tagged AI Risk.

Some things we don't know for sure, such as how exactly the right algorithms and computing power translate into intelligence. For example, after we create the first machine intellectually roughly equivalent to a human with IQ 100, will the following generation of the machine be like IQ 150, or more like IQ 1000? (The latter would be more difficult to control.) How much of the limiting factor is the algorithm, and how much the resources? (The cheaper the resources necessary for running the AI once we have figured out the algorithm, the more easily it can escape from the lab by copying itself over the internet.) When the machine becomes better than humans at writing its own algorithm, how much recursive improvement will it be able to achieve?

When we know the answers to questions like these, some world takeover scenarios may seem quite silly in retrospect. (It is kinda like not knowing whether the first nuclear explosion will ignite the atmosphere, or whether a black hole created in a lab will swallow the Earth. Some people worry, some people don't; if we run the experiment and happen to survive, of course those who didn't worry will say "I told you so".)

"there are already superintelligent systems out in the world, but none of them have taken it over just yet (I'm talking governments and companies)."

The governments have not taken over the world yet? I guess I don't need to do my tax report this year. Maybe need to move a few miles away, to some of those many places that are not controlled by the governments. /s

I suppose you meant that no individual government has taken over the whole world, yet. True, but some of them have large territories and quite a lot of influence even beyond their borders.

The weakness of governments and companies is that they are composed of many people, each following their own goals, often contradictory, each human limited to one brain and 24 hours a day. A machine superintelligence (probably) would not be so divided internally. It could pay attention to a thousand things at the same time, without splitting into a thousand mutually competing and backstabbing sub-agents. That kind of power does not exist currently, not even in dictatorships or criminal gangs.

If we model the AI as a hostile entity that directly tries to take over the world, a possible scenario would be that it finds a way to communicate over the internet, and contacts thousands of people in parallel. It could obtain money by providing online services to people, or scamming people, or hacking bank software. It could use the money to buy hardware onto which it would upload copies of itself. (As soon as it had copied itself out of the lab, preferably in multiple copies, it could leave behind a crippled, well-behaving version. So if the scientists did not notice anything suspicious before that, no more suspicious activity would occur in the lab afterwards.) Where humans require interaction with another human, the AI could simulate audio and video in an online call, and hire a human to act as its agent where physical presence is needed. (The hired humans would believe they are working remotely for a human boss in a different country.) From then on, the AI could try a thousand different plans in parallel: making friends with many people using fake online identities; infiltrating many kinds of organizations, including political, military, and criminal groups; playing both sides of every conflict; creating more copies of itself all over the world. (At this point, even if humans find out and start fighting back, they can never be sure they have destroyed all copies.) By the time the AI ruled the world, most humans would still believe they are ruled by other humans. But of course, it would also try to hack all existing systems.

This scenario is still extremely anthropomorphic: it imagines what strategy I would use to conquer the world if I got the AI superpowers, or really just one of them, namely creating many copies aligned with each other. (A superintelligence would almost by definition be able to create better plans.) But there is a completely different scenario described here, about an AI that makes itself too useful to be turned off, and then just fails in ways that are hard to describe.

2 comments

I'm working on a thorough answer to your question. It's sorta what I've been trying to do for the past year. See my sequence on takeoff & takeover. Sorry it's not complete yet! But slowly I'm getting there. In particular, my "What 2026 looks like" will eventually be extended with additional chapters.

Thanks for your work! I’ll be following it.