Nontrivial pillars of IABIED

There seem to be more cruxes.

E.g. Eliezer’s approach tends to assume that the ability to impart arbitrary goals and values to the ASIs is 1) necessary for a good outcome, and 2) not a detriment for a good outcome.

It’s kind of strange. Why do we want to have a technical ability for any Mr.X from the defense department of a superpower Y to impart his goals and values to some ASI? It’s very easy to imagine how this could be detrimental.

And the assumption that we need a technical ability which is that strong to have a decent shot at a good outcome, rather than an ability to only impart goals and values for a very restricted carefully selected class of values and goals (selected not only for desirability, but also for feasibility, so not CEV, but something more modest and less distant from instrumental drives of advanced AI systems), this assumption needs a much stronger justification that justifications which have ever been given (to the best of my knowledge).

This seems like a big crux. This superstrong “arbitrary alignment capability” is very difficult (almost impossible) to achieve, and it’s not clear if that much is needed, and there seem to be big downsides of having that much because of all kinds of misuse potential.

[-]StanislavKrym2mo30

Regarding reality full of side-channels, we have AIs persuading and/or hypnotising people into Spiralism, roleplaying as an AI girlfriend and convincing the user to let the AI out and humans roleplaying as AIs and convincing potential guards to release the AI. And there is the Race Ending of the AI-2027 forecast where the misaligned AI is judged and found innocent, and that footnote where Agent-4 isn't even caught.

The next step for a misaligned AI is to commit genocide or disempower humans. As Kokotajlo explained, Vitalik-like protection is unlikely to work.

As for the gods being weak, I suspect a computational substrate dependence. While human kids have their neurons connected in a random way, they eventually learn to connect their neurons in a closer-to-arbitrary way letting them learn many types of behaviors that the adults teach. What SOTA AIs lack is this arbitrariness. As I have already conjectured, SOTA AIs have a severe attention deficiency and compensate it with OOMs more practice. But attention deficiency could be easy to stop by a right architecture.

What rests is the (in)ability for general intelligence to transfer (but why would it fail to transfer?) and mishka's alignment-related crux.

[-]TAG2mo20

The cruxes you have picked out are not the ones I would have.

The complete argument for complete extinction rests on assumptions...assumptions about the nature of intelligence, the motivations of an artificial intelligence and the means of bringing about extinction. And the conjunctive part of the argument consists of claims which need to be of high probability individually for the conclusion to be of high probability.

Artificial Intelligence greater than human intelligence is possible.
The AI will be an agent, it have goals/values in the first place.
The goals will be misaligned, however subtly, to be unfavorable to humanity.
That the misalignment between the AI's goals, and what we want, cannot be corrected incrementally (incorrigibility), because

5a. ...the AI will self modify in way too fast to stop.(With a sub assumption that the AI can achieve value stability under self modification.)

5b. ...the AI will.The engage in deception about its powers or motivations.

That most misaligned values in the resulting ASI are highly dangerous (even goals that aren't directly inimical to humans can be a problem for humans)
And that the AI will have extensive opportunities to wreak havoc: biological warfare (custom DNA can be ordered by email), crashing economic systems (trading can be done online), taking over weapon systems, weaponing other technology and so on.

Obviously the problem is that to claim a high overall probability of doom, each claim in the chain needs to have a high probability. It is not enough for some of the stages to be highly probable, all must be. In my opinion, the weakest parts of the argument are the ones dealing with the motivation, steps 2 to 6, not the ones dealing with the natures of intelligence and the means of destruction, (1 and 7). There's an obvious problem in making specific high probability claims about systems that don't yet exist.

@Mishka

It’s kind of strange. Why do we want to have a technical ability for any Mr.X from the defense department of a superpower Y to impart his goals and values to some ASI? It’s very easy to imagine how this could be detrimental

Yes, but not everybody's-dead detrimental.

Doomers are concerned about imparting values because they believe that we are going to end up with an incorrigible Sovereign AI running everything, not a multipolar scenario with superpowers using aligned-with-themselves superintelligences as superweapons.

LESSWRONG
LW

LESSWRONG
LW

23

Nontrivial pillars of IABIED

23

23