Aligned AGI is a large-scale engineering task

Humans have never completed a large-scale engineering task without at least one mistake

An AGI that has at least one mistake in its alignment model will be unaligned 

Given enough time, an unaligned AGI will perform an action that will negatively impact human survival

Humans wish to survive

Therefore, humans ought not to make an AGI until one of the above premises changes.


This is another concise argument around AI x-risk. It is not perfect. What flaw in this argument do you consider the most important?

1a3orn · 9mo

So, if I were to point out what seems like the most dubious premise of the argument, it would be "An AGI that has at least one mistake in its alignment model will be unaligned."

Deep Learning is notorious for continuing to work while there are mistakes in it; you can accidentally leave all kinds of things out and it still works just fine. There are of course arguments that value is fragile and that if we get it 0.1% wrong then we lose 99.9% of all value in the universe, just as there are arguments that the aforementioned arguments are quite wrong. But "one mistake" = "failure" is not a firm principle in other areas of engineering, so it's unlikely to be the case.

But, apart from whether individual premises are true or false -- in general, it might help if, instead of asking yourself, "What kind of arguments can I make for or against AI doom?" you instead asked "hey, what would AI doom predict about the world, and are those predictions coming true?"

It's a feature of human cognition that we can make what feel like good arguments for anything. The history of human thought before the scientific method is people making lots of good arguments for or against God's existence, for or against the divine right of kings, and so on, and never finding out anything new about the world. But things changed when some people gave up on that project and realized they needed to look for predictions. This changed the world; I really, really recommend the prior article.

*Nullius in verba* is a good motto.

Thank you for your reply, 1a3orn; I will have a read over some of the links you posted.

> Deep Learning is notorious for continuing to work while there are mistakes in it; you can accidentally leave all kinds of things out and it still works just fine. There are of course arguments that value is fragile and that if we get it 0.1% wrong then we lose 99.9% of all value in the universe, just as there are arguments that the aforementioned arguments are quite wrong. But "one mistake" = "failure" is not a firm principle in other areas of engineering, so it's unlikely to be the case.

OK, the "An AGI that has at least one mistake in its alignment model will be unaligned" premise seems like the weakest one. Is there any agreement in the AI community about how much alignment is "enough"? I suppose it depends on the AI's capabilities and how long you want to have it running for. Are there any estimates?

For example, if humanity wanted a 50% chance of surviving another 10 years in the presence of a meta-stably-aligned ASI, the ASI would need a daily non-failure rate x satisfying x^(365.25*10) = 0.5, i.e. x ≈ 0.99981 or better.
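
A minimal sketch of that back-of-the-envelope calculation (assuming independent, identical daily failure chances, which is itself a simplification):

```python
# Required daily non-failure rate x such that x**(365.25 * 10) == 0.5,
# i.e. a 50% chance of no alignment failure over ten years of daily "trials".
# Assumes each day's failure chance is independent and identical (a simplification).
days = 365.25 * 10
x = 0.5 ** (1 / days)
print(f"Required daily non-failure rate: {x:.5f}")  # ~0.99981
```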

> It's a feature of human cognition that we can make what feel like good arguments for anything.

I would tend to agree; confirmation bias is profound in humans.

> you instead asked "hey, what would AI doom predict about the world, and are those predictions coming true?"

Are you suggesting a process such as: assume a future -> predict what that would require -> compare the prediction with reality?
Rather than: observe reality -> draw conclusions -> predict the future?

> Humans have never completed a large-scale engineering task without at least one mistake

This line does not seem sufficiently well-defined.

Both "launch a satellite" and "launch the specific satellite" are large-scale engineering tasks (or were, in the previous century); the first one had some mistakes, and in most cases the second one had no mistakes.

Transferring the argument to AI: the mistake may happen and be fixed while the task is not "create aligned AGI" but some prerequisite one, so it doesn't ensure that the final AGI is unaligned.

OK, I take your point. In your opinion, would this be an improvement: "Humans have never completed a large-scale engineering task without at least one mistake on the first attempt"?

For the argument with AI: will the process used to make current AI scale to AGI level? From what I understand, that is not the case. Is that predicted to change?

Thank you for giving feedback. 

This is in fact Eliezer's view. An aligned AGI is a very small target to hit in the space of possible minds. If we get it wrong, we all die. So we only get one go at it. At present we do not even know how to do it at all, never mind get it right on the first try.

Yeah, many of my opinions on AI are from Eliezer. I have read much of his work and compared it to the other experts I have read about and talked with, and I have to say, he seems to understand the problem very well.

I agree, aligned AGI seems like a very small island in the sea of possibility. If we have multiple tries at getting it right (for example with AGI in a perfectly secure simulation), I think we have a chance. But with effectively only one try, the probability of success seems astronomically low.

Ilio · 9mo

It's a [precautionary principle](https://en.m.wikipedia.org/wiki/Precautionary_principle#Criticisms), so the main flaw is: it fails to balance risks with benefits.

For example, from Wikipedia: forbidding nuclear power plants based on concerns about low-probability, high-impact risks means continuing to rely on power plants that burn fossil fuels. In the same vein, future AGIs would most likely help with many existential risks, like detecting rogue asteroids and improving the economy enough that we don't let a few million human children die from starvation each year.

Okay, how much risk is worth the benefit? Would you advocate for a comparison of expected gains and expected losses?
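
For instance, something like this rough sketch (every number here is a made-up placeholder, not an actual estimate):

```python
# Purely illustrative expected-value comparison between building and not building AGI.
# Every number below is a hypothetical placeholder, not an estimate from this thread.
p_doom = 0.10          # hypothetical chance that building AGI causes an existential catastrophe
agi_benefit = 0.05     # hypothetical extra value (e.g. reduced other x-risks) if AGI goes well
baseline_value = 1.00  # normalised value of the future without AGI

ev_build = (1 - p_doom) * (baseline_value + agi_benefit)
ev_dont_build = baseline_value

print(f"EV(build) = {ev_build:.3f}, EV(don't build) = {ev_dont_build:.3f}")
```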

Ilio · 9mo

You mean: how to balance a low-probability x-risk against a high probability of saving a large number (if small fraction) of human children? Good point, it's hard, but we don't actually need this apples-to-oranges comparison: the point is that AGI may well decrease overall x-risks.

(I mentioned starving children because some count large impacts as x-risk, but on second thought it was probably a mistake)

Which x-risks do you think AI will reduce? I have heard arguments that it would improve our ability to respond to potential asteroid impacts. However, this reduction in x-risk seems very small in comparison to the x-risk that unaligned AGI poses. What makes you estimate that AI may reduce x-risk?

Ilio · 9mo

> Which x-risks do you think AI will reduce?

First, the most important ones: those we don't know about yet, but would have a better chance of fighting using either increased wisdom (from living hundreds of years or more), practically unlimited skilled labor, guaranteed reproducible decisions, or any combination of that, plus all the fruits from the scientific revolutions that will follow.

Second, the usual boring ones: runaway global warming, pathogens with kuru-like properties, collapse of governance shifting threats from endurable to existential, etc.

Third, the long-term mandatory need of conquering the stars, which sounds much easier using robots, followed by photons for uploading our minds.

Finally, and if such concepts are actually valid (I'm not sure), reproducible AGI will help us become AGI+, which might be necessary to align ourselves as AGI++, and so on.

> What makes you estimate that AI may reduce x-risk?

I don’t get the logic here. Once you agree there’s at least one x-risk AGI may reduce, isn’t that enough to answer both the OP and your last question? Maybe you meant: « What makes you estimate that AI may reduce x-risk more than EY’s estimate of how much it’d increase it? ». In which case I don’t, but that’s just a property of EY’s estimate being maximally high.