The Alignment problem appears extremely concerning, so here's my attempt to explain the most likely way we could live in an aligned future.

The Perfect Tool AI

Imagine a Tool AI, something like DALL-E 2 or GPT-3. Relatively speaking, it's a simple AI, and it does one of three things exceptionally well:

-A Perfect Creator. It manufactures and controls nanotechnology to rearrange atoms in a given space. If you tell this tool to disassemble an area, it will do so with self-replicating nanotechnology, turning that area into basically whatever you want. You would have a perfect, unstoppable factory that can produce anything from anything. In that case, you could tell the AI to disassemble the hardware of every AI lab on Earth, create life-extending technologies, and live as an immortal god over humanity while you slowly put together a team of AI Alignment researchers to build an aligned AI over the centuries.

Likelihood: Very low. Nanotech is a hard problem to solve, and controlling that nanotech is just as difficult.

-A Perfect (Computer) Virus. Imagine a worm like Stuxnet, except that it carries a simple script connecting each machine it takes over to a higher-level AI controller, and it spins up new controllers once it has sufficient compute. With each computer it infects, it gets smarter, more powerful, and, most importantly, better at infecting computers. Human security teams cannot keep up. The Internet is basically an ecosystem, much like Earth before the explosion of cyanobacteria, and it is ripe for the taking by an advanced AI. Whoever is behind this virus can now control every computer on Earth. If they're focused on AI Alignment, they can easily destroy any computers connected to an AI lab, slowly assemble a team of alignment engineers to solve AGI Alignment, and direct spare compute toward other tasks like life-extension technology.

Likelihood: Medium. I am honestly surprised we haven't seen AI-powered security breaches yet. It seems feasible.

-A Perfect Manipulator. GPT-3 is already surprisingly convincing, but what if you trained an AI to be perfectly convincing? I have seen incredibly charismatic people change another person's political beliefs over the span of a single conversation. There is likely some string of text that could convince you of just about anything, and if an AI gets good enough at producing it, you might be able to convince the whole world of the importance of AI Alignment.

Likelihood: Medium. It should be fairly simple, even if you only trained it to write highly upvoted posts and then pointed it at Alignment as the topic. I'm sure GPT-3 could already produce a thousand variations of this post and spread them across the internet, and layering another AI on top to search for increasingly persuasive arguments should also be feasible.


If Eliezer is correct and AGI Alignment is nearly unsolvable in our timeframe, then we're basically going to have to move the alignment problem onto a human with access to godlike technology and hope that they cripple any chance of other godlike Tool AIs being created.

Either way, awareness is the most critical piece of the puzzle at this juncture. Start posting about AI alignment. Only a tiny fraction of the human population is thinking about this problem, and we need more minds.

I'd like feedback on whether a single human operator with access to godlike AI could work to delay the advent of AGI until Alignment becomes possible. Could this be a solution, even if it is far-fetched? After thinking about it extensively, it's the only solution I can come up with. At the very least, I do not see what would make it unworkable. Even if absolute power corrupts (the human operator), I do not believe it will corrupt absolutely (to the point of destroying the human race).

Comments

These three proposals are basically examples of the "minimal pivotal act" specialized AI that Yudkowsky talks about and thinks is ~impossible for us to build safely. I personally think value alignment is way easier to solve than most pessimists assume, and that these attempts to circumvent the core of value alignment are (1) not going to work (as in, you'll not be able to build a specialized system with only the capabilities required for your pivotal act before someone else builds a generalist ASI too strong for your pivotal act specialist to stop), and (2) just straight up more dangerous than trying to build a value-aligned AGI.

The reason for the latter is that you need value alignment anyways in order to prevent your pivotal act specialist from killing you (or turning into a thing that would kill you), and you're more likely to solve value alignment if you actually set out to solve value alignment as the main thing you're doing.

Good points. Your point about value alignment being better to solve than just trying to orchestrate a pivotal act is true, but if we don't have alignment solved by the time AGI rolls around, then from a pure survival perspective, it might be better to attempt a narrow-ASI pivotal act instead of hoping that AGI turns out to be aligned already. The solution above doesn't solve alignment in the traditional sense; it just pushes the AGI timeline back, hopefully far enough to solve alignment.

My specific idea is that you have something like GPT-3 (unintelligent in all other domains; it doesn't expand outside of its system or optimize outside of itself) that becomes an incredibly effective Tool AI. GPT-3 isn't really aligned in the Yudkowsky sense, but I'm sure you could already get it to write a mildly persuasive piece. (It sort of already has: https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3 ).

Scale this up to superhuman levels in its narrow domain, the way AlphaGo is superhuman at Go, and I think you could orchestrate a pivotal act pretty rapidly. It doesn't solve the alignment problem, but it pushes it back.

The problems are that the user needs to be aligned and that this type of narrow ASI has to be developed before AGI. But given the current state of narrow ASI, I think it might be one of our best shots, and I do think a narrow ASI could reach this level before AGI, much the same way AlphaGo preceded MuZero.

What I am ultimately saying is that if we get a narrow AI that has the power to make a pivotal act, we should probably use it.

In all three cases, the AI you're asking for is a superintelligent AGI. Each has to navigate a broad array of physically instantiated problems requiring coherent, goal-oriented optimisation. No stateless, unembedded, and temporally incoherent system like GPT-3 is going to be able to create nanotechnology, beat all human computer-security experts, or convince everyone of your position.

Values arise to guide the actions that intelligent systems perform. Evolution did not arrange for us to form values because it liked human values. It did so because forming values is an effective strategy for getting more performance out of an agentic system, and SGD can figure this fact out just as easily as evolution did.

If you optimise a system to be coherent and take actions in the real world, it will end up with values oriented around doing so effectively. Nature abhors a vacuum. If you don’t populate your superintelligent AGI with human-compatible values, some other values will arise and consume the free energy you’ve left around.

Interesting! I appreciate the details here; they give me a better sense of why narrow ASI is probably not something that can exist. Is there somewhere we could talk about AGI alignment over audio, rather than over text here on LessWrong? I'd like to get a better idea of the field, especially as I move into work like creating an AI Alignment sandbox.

My Discord is Soareverix#7614 and my email is maarocket@gmail.com. I'd really appreciate the chance to talk with you over audio before I begin working on sharing alignment info and coming up with my own methods for solving the problem.