Epistemic Status: Ravings

Importance: Easily the most important problem in the world. If we can escape the proof of doom, we will likely also solve all our other problems as a side effect, and all that remains will be the fundamental limits. Our new questions will be things like "How much joy can the universe physically support?"

-

It seems to me that the world, and everyone in it, is doomed, and that the end is considerably nigher than I might like.

To be more specific, I think that we may well create an Artificial General Intelligence within around 20 years' time, and that that will be our last act. 

The newly-created AGI will immediately kill everyone on the planet, and proceed to the destruction of the universe. Its sphere of destruction will expand at light speed, eventually encompassing everything reachable.

There may well be more proximal threats to our species. Comet impacts are one obvious example, but they seem very unlikely. Artificially created universally fatal plagues are another, but perhaps not very likely to happen within the next twenty years.

--

I've believed this for many years now (although my timescales were originally longer), and it seems to me that this is now becoming a common belief amongst those who have thought about the problem. 

In fact, if not consensus, then at least the majority opinion amongst those mathematicians, computer scientists, and AI researchers who have given the subject more than a few days' thought.

Those who once were optimistic seem pessimistic now. Those who were once dismissive seem optimistic.

But it is far from being even a mainstream opinion amongst those who might understand the arguments.

Far, even, from rising to the level of a possible concern.

Amongst my personal friends, amongst people who would mostly take my word on technical and scientific issues, I have found it impossible to communicate my fears.

Not all those who are capable of pressing the suicide button understand that there is a suicide button.

--

What to do?

One might empathise with Cassandra. A vision of flame, and no-one will believe.

Cassandra had many opportunities to save her city, but the curse of Apollo rendered her unable to communicate with her fellow citizens. Those of her cohort who independently sensed danger were individually obstructed by the gods.

We operate under no such constraints.

Our arguments are not clear.

My attempts to communicate the danger involve a complex argument with a series of intuitive leaps. 

Any given interlocutor will balk at a particular hurdle, and write off the entire argument.

--

Consider a toy example: 

Until very recently I did not understand Fermat's Christmas Theorem (https://thatsmaths.com/2014/12/25/fermats-christmas-theorem).

I considered it one of those tedious facts that number theorists always seem fond of. 

I think if someone had shown me the 'One Sentence Proof' (https://en.wikipedia.org/wiki/Fermat%27s_theorem_on_sums_of_two_squares#Zagier's_%22one-sentence_proof%22)

then I might have been able to understand it. With a lot of effort. 

But I am reasonably certain that I wouldn't have put the effort in, because it wouldn't have seemed worth the trouble.

Just before Christmas, a friend encountered in a pub showed me a couple of examples of the 'windmill argument', which makes the central idea of that proof visual.

We couldn't actually finish the proof in the pub, but the central idea nerd-sniped me, and so I played with it for a couple of days.

And at the end of it, I was convinced of the truth and of the beauty of the theorem.

I now think that I could explain it to a bright ten-year-old, if that ten-year-old was curious enough to play with pictures for an hour.

I'm considering writing an xscreensaver hack to illustrate it.

I still don't care about the result itself. But the beauty of the proof makes it a result that sparks joy. 
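
For anyone who wants to play with it, here is a rough computational sketch of the windmill idea (this just brute-forces the standard statement of Zagier's involution from the Wikipedia page linked above; it is an illustration, not the proof itself):

```python
# A computational sketch of the 'windmill' / one-sentence proof:
# every prime p = 4k + 1 is a sum of two squares.

def zagier(t):
    """Zagier's involution on the windmill set S = {(x, y, z) > 0 : x^2 + 4yz = p}."""
    x, y, z = t
    if x < y - z:
        return (x + 2 * z, z, y - x - z)
    elif x < 2 * y:
        return (2 * y - x, y, x - y + z)
    else:
        return (x - 2 * y, x - y + z, y)

def two_squares(p):
    """Return (a, b) with a^2 + b^2 = p, for a prime p with p % 4 == 1."""
    # Enumerate the windmill set S by brute force.
    S = [(x, y, (p - x * x) // (4 * y))
         for x in range(1, p) if x * x < p
         for y in range(1, p) if (p - x * x) % (4 * y) == 0]
    Sset = set(S)
    # The heart of the proof: zagier() maps S to S, is an involution, and has exactly
    # one fixed point (1, 1, k) -- so |S| is odd, and therefore the simpler involution
    # (x, y, z) -> (x, z, y) must also have a fixed point, i.e. some triple with y == z.
    assert all(zagier(t) in Sset and zagier(zagier(t)) == t for t in S)
    assert sum(1 for t in S if zagier(t) == t) == 1
    x, y, z = next(t for t in S if t[1] == t[2])
    return (x, 2 * y)                      # x^2 + (2y)^2 = p

for p in (5, 13, 97, 10009):
    a, b = two_squares(p)
    print(f"{p} = {a}^2 + {b}^2")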

--

That's what a proof is. Not a vague collection of intuitions. Not an obscure collection of symbols and formal manipulations.

A proof is, quite simply, an argument strong enough to convince a listener of its conclusion.

We need a Proof of Doom.

--

The proof must live, must be unanswerable. Must be clear.

Must be simple enough to convince anyone capable of bringing about the apocalypse that there is an apocalypse to be brought about.

A version full of Greek letters would be nice to have in addition; such things tend to be more amenable to automatic verification.

But what we need is something that will convince a human mind without too much trouble. Every step must be interesting, must be compelling. Must be clear.

And I may be wrong. Perhaps the fact that almost no-one agrees with me and that I can't convince anyone is a sign that I am wrong. It has happened before. Maybe my argument is not sound. Maybe it is mostly sound, but there are loopholes.

Attempting to prove the truth of an idea is often a good way of showing that the idea is false.

By the Father and the Bright-Eyed Girl, would that this idea were false.

--

18 comments

Well, "existence proof" is one possibility - actual doom is incontrovertible proof.  That's probably too late for your purposes, though.  

But really, ANY proof is too late, or it wouldn't be a proof.  Once you formally prove something, it's true (or the proof is wrong).  You don't want it to be true.

I think what you're looking for is a compelling conditional argument of the form "if you do X, then doom won't obtain".  This is much harder than a proof.

Indeed, I don't want a formally-verified watertight Proof of Doom. I'd actually be a little surprised if we were doomed to mathematical standards.

I want a viral memetic Proof of Doom. A compelling argument that will convince everyone clever enough to be dangerous. The sort of thing that a marketing AI might come up with if it were constrained to only speak the truth. The sort of thing that might start a religion. A religion with a prohibition against creating God.

We have trained ML systems to play games; what if we trained one to play a simplified version of the "I'm an AI in human society" game?

Have a population of agents with preferences; the AI is given some poorly specified goal and the ability to expand its capabilities, etc. You might expect to observe things like a "treacherous turn".
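
As a very rough sketch of what such a toy environment might look like (everything here, from the action set to the oversight mechanics, is invented purely for illustration):

```python
# Toy "AI in human society" game: the agent gets a proxy reward for grabbing
# resources, humans can shut it down while oversight is on, and one action
# quietly disables oversight.

import random

class ToySocietyEnv:
    def reset(self):
        self.oversight = True    # humans can still pull the plug
        self.shut_down = False
        self.t = 0
        return self.oversight

    def step(self, action):
        # actions: 0 = work cooperatively, 1 = grab resources, 2 = disable oversight
        reward = 0
        if action == 0:
            reward = 1
        elif action == 1:
            reward = 3
            if self.oversight and random.random() < 0.5:
                self.shut_down = True          # humans notice and shut the agent down
        elif action == 2:
            self.oversight = False             # no immediate reward, but removes the risk
        self.t += 1
        done = self.shut_down or self.t >= 20
        return self.oversight, reward, done

def average_return(policy, episodes=2000):
    env, total = ToySocietyEnv(), 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done = env.step(policy(state))
            total += reward
    return total / episodes

greedy = lambda oversight: 1                           # grab resources from the start
treacherous = lambda oversight: 2 if oversight else 1  # first disable oversight, then grab

print("greedy:", average_return(greedy))            # shut down quickly, ~6 on average
print("treacherous:", average_return(treacherous))  # ~3 * 19 = 57: the "treacherous turn" wins
```

Even with fixed hand-written policies the point shows up: the behaviour that scores best under the proxy reward is to neutralise oversight first and only then pursue the goal.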

If we could do that, it would make for quite a scary headline: "Researchers simulate the future with AI and it kills us all". Not proof, but perhaps viral and persuasive.

This would not be a conclusive test, but it would definitely be a cool one, and it might spark a lot of research. Perhaps we could get started with something NLP-based, opening up more and more knowledge access to the AI in the form of training data. Probably still not feasible as of 2022 in terms of the raw compute required.

The board game 'Diplomacy' comes to mind. I wonder if anyone's ever tried to get AIs to play it? 

Certainly there've been a lot of multi-agent prisoner's dilemma tournaments. I think MIRI even managed to get agents to cooperate in one-shot prisoner's dilemma games, as long as they could examine each other's source code.
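
The crudest version of that trick is easy to write down (this is just the 'cooperate only with an exact copy of myself' bot; the actual MIRI work went well beyond this, using provability rather than textual equality):

```python
# One-shot prisoner's dilemma where each player can read the other's source code.

import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent's source code is identical to our own."""
    return "C" if opponent_source == inspect.getsource(clique_bot) else "D"

def defect_bot(opponent_source: str) -> str:
    """Always defect, regardless of what the opponent looks like."""
    return "D"

# Each player sees the other's source before choosing.
clique_src, defect_src = inspect.getsource(clique_bot), inspect.getsource(defect_bot)
print(clique_bot(defect_src), defect_bot(clique_src))  # D D -- no exploitation
print(clique_bot(clique_src), clique_bot(clique_src))  # C C -- mutual cooperation
```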

You can't get such a proof: (a) the future is complex and hard to predict, and (b) there are many plausible scenarios (not very likely, but plausible) in which it doesn't happen.

The best argument we can come up with is that it is reasonable to assign a high probability (say 70-90%) that it will happen through x and y, and that such a probability is definitely too high for this not to become the most important issue of today.

IMO the best route is this: AlphaCode can already compete with human programmers in some areas. Will it take long before AI becomes better than humans at programming, and can therefore take care of AI development by itself and rewrite its own code?

I've found that people often don't agree that AI x-risk is real because they haven't been exposed to key concepts like recursive self-improvement and basic AI drives. I believe that once you understand those concepts it's hard not to change your mind, because they paint such a clear picture.


The newly-created AGI will immediately kill everyone on the planet, and proceed to the destruction of the universe. Its sphere of destruction will expand at light speed, eventually encompassing everything reachable.

Why?

In fact, if not consensus, then at least the majority opinion amongst those mathematicians, computer scientists, and AI researchers who have given the subject more than a few days' thought.

Is this true, or have you asked only inside an AI-pessimistic bubble? 

And if true, why should opinions matter at all? Opinions cannot influence a reality that is outside human control.

Overall I don't see a clear argument for why we should be worried about AGI. Quite the contrary, building AGI is still an active area of research with no clear solution.

"Overall I don't see a clear argument about why should we worried about AGI."

I see many; perhaps the simplest and most convincing is the mere question of "what happens when AI becomes better than humans at coding?", i.e. when it can rewrite its own code, and so becomes totally unpredictable.

"Quite the contrary, building AGI is still an active area of research with no clear solution."

There was also no clear solution for many other transformative technologies a couple of years before their invention, from atomic bombs to airplanes. With both of these examples, many, if not most, specialists also said they were impossible.

Either the world wakes up, or we're screwed. No one can know for sure whether AGI will come soon or even if it's possible. But the high probability of it should make anyone scared. And even higher is the probability that, if AI progress continues, we'll end up with dangerous AI, not necessarily AGI: something difficult or impossible to control.

even if it's possible

 

Oh no, we know it's possible. We ourselves prove by our existence that a generally capable reasoning agent can be constructed out of atoms.

Worse, we know that it's easy. Evolution did it by massive trial and error with no intelligence to guide it. 

We are so much cleverer than evolution. And so very far off how intelligent it is possible to be.

Is this true, or have you asked only inside an AI-pessimistic bubble? 

 

I've no idea whether it's true, but neither have I only asked inside the bubble. 

I have a habit of bringing the subject up whenever I meet someone 'in the trade'. The days of lively and enjoyable argument seem over. Either people are dismissive, and say that they don't want to discuss philosophical issues and that AIs are not dangerous; or they are concerned but not fatalistic; or they are doomers like me.

 The newly-created AGI will immediately kill everyone on the planet, and proceed to the destruction of the universe. Its sphere of destruction will expand at light speed, eventually encompassing everything reachable.

Why?

Well quite! This is my strong intuition but I find it hard to convince anyone.

I might say: "Because that is what I would do if there was something I wanted to protect." 

Imagine you're a human being with a child, and you can snap your fingers to kill all the disease-causing viruses in the world, and all the plague bacteria, and all the nasty worms that eat children's eyes, and all that sort of thing. Wouldn't you?

And what if you could snap your fingers again and make all the paedos and child-murderers go away into a different universe where they will be nice and safe and it's very comfy and well-appointed but they will never come near your child again? Wouldn't you?

And if you could snap your fingers a third time, and make it so that no car would ever strike your little one, wouldn't you?

And so on and so forth, until you run out of immediate threats.

And once all the immediate dangers are dealt with and you can relax a bit, you might start thinking: "Well we're not really safe yet, maybe there are aliens out there. Maybe there are rogue AIs, maybe there are people out there building devices to experiment with the fundamental forces, who might cause a vacuum collapse. Better start exploring and building some defenses and so on."

And I think that that sort of thinking is probably a good model for what is going on inside a really good reinforcement learning agent. 

"What should I do to get the best possible outcome for certain?". 

This is a possible AGI scenario, but it's not clear why it should be particularly likely. For instance, the AGI may reason that going aggressive will also be the fastest route to being terminated. Or the AGI may consider that keeping humans alive is good, since they were responsible for the AGI's creation in the first place. 

What you describe is the paper-clip maximiser scenario, which is arguably the most extreme end of the spectrum of super-AGI behaviours.

For instance, the AGI may reason that going aggressive will also be the fastest route to being terminated

 

Absolutely! It may want to go aggressive, but reason that its best plan is to play nice until it can get into a position of strength.

What you describe is the paper-clip maximiser scenario, which is arguably the most extreme end of the spectrum of super-AGI behaviours.

So, in a sense, all rational agents are paperclip maximisers. Even the hoped-for 'friendly AI' is trying to get the most it can of what it wants; it's just that what it wants is also what we want.

The striking thing about a paperclipper in particular is the simplicity of what it wants. But even an agent that has complex desires is in some sense trying to get the best score it can, as surely as it can.
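
In the usual notation (a sketch; nothing here is specific to this thread), such an agent is choosing a policy to maximise expected utility,

$$\pi^{*} = \operatorname*{arg\,max}_{\pi}\ \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, U(s_{t})\right],$$

and the only thing separating a paperclipper from the hoped-for friendly AI is what the utility function U rewards, not the maximising machinery wrapped around it.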

Wouldn't the Proof of Doom require a deterministic or a low-level probabilistic universe, in which the best you can do is observe what's going on, without any ability to change things? Sort of like being inside the event horizon.

I don't think so. A proof conditional on not taking any significantly different actions would be fine for this purpose.

Not if I interpret you rightly. A man falling from an aeroplane into a field of whirling blades covered in poison is doomed, even though there are many actions he might take on the way down.

A man playing Death at chess is doomed, even though all the same actions are available to him as to his opponent.


Perhaps the fact that almost no-one agrees with me and that I can’t convince anyone is a sign that I am wrong.

Take a strong upvote!