I wouldn't call my confidence in doom near-absolute, so much as "very high"! I would have been just as much a doomer in 1950, last time AI looked imminent, before it was realized that "the hard things are easy and the easy things are hard". 

I wouldn't be that surprised if it turned out that we're still a few fundamental discoveries away from AGI. My intuition is telling me that we're not. 

But the feeling that we might get away with it is only coming from a sense that I can easily be wrong about stuff. I would feel the same if I'd been transported back to 1600, made myself a telescope, and observed a comet heading for Earth, but no-one would listen.

"Within my model", as it were, yes, near-absolute is a fair description. 

The long-term problem is that an agent is going to have a goal. And most goals kill us. We get to make exactly one wish, and that wish will come true whether we want it or not. Even if the world were sane, this would be a very, very dangerous situation. I would want to see very strong mathematical proof that such a thing was safe before trying it, and I'd still expect it to kill everyone.

The short-term problem is that we're not even trying. People all over the place are actively building more and more general agents that make plans, with just any old goals, without apparently worrying about it, and they don't believe there's a problem.

What on earth do you think might stop the apocalypse? I can imagine something like "take over the world, destroy all computers" might work, but that doesn't look feasible without superintelligent help, and that puts us in the situation where we have a rough idea what we want, but we still need to find out how to express that formally without it leading to the destruction of all things. 

As a very wise man once said: "The only genie to which it is safe to make a wish is one to which you don't need to make a wish, because it already knows what you want and it is on your side." 

Hi, so I don't understand why you're not worried except that "some clever people don't seem worried".

But actually I think all those guys are in fact quite worried. If they aren't full on doomers then I don't understand what they're hoping to do.

So I'll repeat my argument:

(1) We're about to create a superintelligence. This is close and there's no way to stop it.

(2) If we create a superintelligence, then whatever it wants is what is going to happen.

(3) If that's not what we want, that's very bad.

(4) We have no idea what we want, not even roughly, let alone in the sense of formal specification.

That's pretty much it. Which bit do you disagree with?

That was kind of a long-term source of hopelessness; why I thought Eliezer's plan wouldn't work out without having a very long time and lots of people working on it, but my current source of short-term hopelessness is that it looks like we're right on the verge of achieving AGI, and no-one seems to be taking the danger remotely seriously. 

It's like being in a petrol warehouse with a load of monkeys striking matches. We just die by default now, unless something really drastic and surprising happens.

Do you believe that you are grasping something that seems objective to you on an intellectual and/or conceptual level that others (“others” being people doing research that is at least remotely relevant to alignment or knowledgeable people in the EA/LW/Rat-and-rat-adjacent communities who are more optimistic than you) are failing to grasp, and therefore not availing them the “truth” that alignment is so inhumanly difficult?


Yes, I think so. It seems to me that 'saying what the good is' has been a two-thousand-year philosophical project on which we've made very little progress. Getting that defined formally, within the next few years, to the point where I might be able to write a computer program to tell me which possible outcomes are good, just looks like an impossible task.

For example, we all think that whether a being is conscious makes some moral difference. But we aren't even close to being able to tell whether a being is conscious in that sense. I've never heard anyone give a sensible description of what the 'hard problem' even is. That's one of the hard things about it.

And our formal definition of 'the good' needs to be correct. A few weird edge cases failing under heavy optimization pressure just leads to a paperclipper with weird paperclips that are some parody of what we might actually have wanted. 

For all I know, a universe full of computronium having one vast orgasm really is the highest good. But that seems to be an outcome that we don't want. Who can say why?

Eliezer himself explained how hopelessly complex and incoherent human values are.

Probably we'd need superhuman help to work out some sort of Coherent Extrapolated Volition (even assuming that makes any sense at all). But creating superhuman help seems to kill us all.

MIRI spent the last ten years or so pursuing the sorts of mathematically rigorous approaches that might, eventually, after a few decades of top-class mathematical effort, solve the easy bit of the problem: 'given a utility function, make it so'. And as far as I know they discovered that it was all quite a lot harder than it looked. And current AI methods don't seem to be the sort of thing that is amenable to mathematically rigorous attacks anyway.

No one's attacking 'what should that utility function look like?'.

My main worry for the future is that people trying to build aligned AIs will succeed just well enough to create something that's worse than just destroying everything. But I do think that even that is quite beyond us.

Whereas building a superintelligence out of random bits of crap that will just set off and do random things really well seems to be well within our current powers. An awful lot of people are hell-bent on doing just that, and it will be here soon.

So the situation seems to me a bit like 'some homeless lunatic in Hiroshima trying to build a bomb-proof umbrella vs. the Manhattan project'.

Seriously that's all I've got. On the side of doom, a buggerload of brilliant, motivated people working on a very tractable looking problem. On the side of continued human existence, some guys, no plan, no progress, and the problem looks impossible.

I name the political movement that I cannot see any reason to start: "Ineffective Doomerism". If there's a positive singularity (and quantum suicide makes me think I might see one!), y'all have my permission to laugh at me for the rest of time.

Well, I think we're all dead soon, so no point in cryonics, retirement planning, etc. Live for today, sod around in the sunshine while you still can.

Not caring about long-term political issues is quite relaxing! 

One particularly difficult case is when the thing you're trying to verify has a subtle flaw. 

Consider Kempe's proof of the four colour theorem, which was generally accepted for eleven years before being refuted. (Its argument does, in fact, suffice to prove the five-colour theorem.)

And of course, subtle flaws are much more likely in things that someone has designed to deceive you. 

Against an intelligent adversary, verification might be much harder than generation. I'd cite Marx and Freud as world-sweeping obviously correct theories that eventually turned out to be completely worthless. I can remember a time when both were taken very seriously in academic circles.

It's vastly easier to understand a maths proof (almost any maths proof) than it is to invent one.

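It's a lot easier to verify a solution to a problem in NP than it is to generate one (being quick to verify is the definition of NP; that generating is harder is the widely believed but unproven P ≠ NP conjecture, and a lot of problems turn out to be NP-complete).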

It's a lot easier to check that someone caught a cricket ball than it is to catch one.

It's a lot easier to check that someone can drive than to teach them.

It's a lot easier to tell whether a program can tell the difference between cats and dogs than to write a program that can.
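
To make the NP example above concrete, here is a minimal sketch using subset sum (an NP-complete problem, picked purely for illustration; the function names `verify` and `generate` are made up for this sketch). Checking a proposed answer is a single pass over it, whereas the obvious way of finding one tries up to 2^n subsets, and no polynomial-time method is known for the general case.

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Check a proposed solution: cheap, one pass over the certificate."""
    pool = list(numbers)
    for x in certificate:
        if x not in pool:
            return False
        pool.remove(x)  # each input number may be used at most once
    return sum(certificate) == target

def generate(numbers, target):
    """Brute-force search: may try up to 2**len(numbers) subsets."""
    for r in range(len(numbers) + 1):
        for subset in combinations(numbers, r):
            if sum(subset) == target:
                return list(subset)
    return None  # no subset sums to target

numbers = [3, 34, 4, 12, 5, 2]
answer = generate(numbers, 9)
print(answer, verify(numbers, 9, answer))
```

With six numbers both run instantly; the gap only shows as the list grows, since `generate` doubles its worst-case work with every extra number while `verify` stays proportional to the size of the answer it is handed.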


It can be easier to write a correct computer program than to verify it, and easier to fix the bugs than to find them.

It can be easier to find an algorithm than to prove that it works.
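
A small illustration of that last point, using an example of my own choosing rather than anything from the discussion: the Collatz iteration. The function below (the name `collatz_steps` is just for this sketch) takes a minute to write and has returned for every positive integer anyone has tried, but whether it halts for all positive integers is a famous open problem.

```python
def collatz_steps(n):
    """Count the steps n takes to reach 1 under the Collatz rule.

    Nobody has proved that the loop terminates for every positive n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Halve even numbers; map odd n to 3n + 1.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(6))
```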
