All power to crypto channel Bankless for their interview with Eliezer:

They actually take his arguments seriously. If I wanted to blow my life savings on some wretched crypto scam I'd certainly listen to these guys about what the best scam to fall for was.


Eliezer is visibly broken. Visibly ill.

This is what it looks like when the great hero of humanity, who has always been remarkably genre-savvy, realises that the movie he's in is 'Lovecraft-style Existential Cosmic Horror', rather than 'Rationalist Harry Potter Fanfic'.

No happy ending. Just madness and despair.

All power to Eliezer for having had a go. What sort of fool gives up before he's actually lost?

There are configurations of matter where something that remembers being me knows that the world only survived because Eliezer tried.

I do hope that my future self has the good grace to build at least one one hundred-foot gold statue of Eliezer.

But that's not the way to bet.


There's almost nothing new here.

The only bit that surprised me was when he says that if you give ChatGPT the right prompt, it can do long multiplication. I went and looked that up. It seems to be true.

Oh great, the fucking thing can execute arbitrary code, can it?

Don't worry, it's just a next-token predictor.


But Eliezer, despite the fact that he is close to tears while explaining, for the eight hundredth time, why we are all going to die, is still ridiculously optimistic about how difficult the alignment problem is.

This is a direct quote:

>  If we got unlimited free retries and 50 years to solve everything, it'd be okay. 
>  We could figure out how to align AI in 50 years given unlimited retries.


No, we couldn't. This is just a sentient version of the Outcome Pump, ably described by Eliezer himself sixteen years ago:

What happens in this Groundhog Day scenario?

It depends very much on what the reset condition is.

If the reset is unconditional, then we just replay the next fifty years forever.

But the replays are not exact: non-determinacy and chaos mean that things go differently every time.

Almost always we build the superintelligence, and then we all die.

Every so often we fail to even do that for some reason. (Nuclear war is probably the most likely reason.)

Very very rarely we somehow manage to create something slightly aligned or slightly misaligned, and the universe either becomes insanely great in a really disappointing way, or an unspeakable horror that really doesn't bear thinking about.

But whatever happens, it lasts fifty years and then it ends and resets.

Except, of course, if the AI notices the reset mechanism and disables it somehow.


OK, I can't imagine that that's what Eliezer meant either, but what could he have meant?

Let's suppose that the reset mechanism is somehow outside our universe. It resets this universe, but this universe can't alter it.

(It can't be truly epiphenomenal, because it has to see this universe in order to work out whether to reset it. So there's a two-way causal connection. Let's just handwave that bit away.)

And further suppose that the reset mechanism can condition on absolutely anything you want.

Then this is just the alignment problem all over again. What's your wish?


Suppose the reset happens if more than a billion people die against their will on any given day.

Then surviving universes probably mostly have a deadly plague which took a few years to kill everyone.


Suppose the reset happens if Eliezer is not happy about something that happens.

Then the surviving universes look like ones where Eliezer got killed too fast to register his unhappiness.
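
Here is a toy sketch of the billion-deaths example as rejection sampling. It is not anything from Eliezer's essay: the outcome classes, their probabilities and the `surviving_universes` helper are all invented for illustration. The only point it makes is that filtering on the stated condition keeps whatever slips under it, not what the wisher had in mind.

```python
import random
from collections import Counter

POPULATION = 8_000_000_000
RESET_THRESHOLD = 1_000_000_000  # "more than a billion deaths in one day"

FAST = "fast takeover, everyone dead in a day"
SLOW = "slow plague, everyone dead in ~3 years"
GOOD = "things actually go well"

# Hypothetical outcome classes: (name, prior probability, days to kill everyone).
# days=None means nobody is killed. Every number here is made up.
OUTCOMES = [
    (FAST, 0.90, 1),
    (SLOW, 0.099, 1000),
    (GOOD, 0.001, None),
]


def peak_daily_deaths(days):
    """Deaths per day if the whole population dies evenly over `days`."""
    if days is None:
        return 0
    return POPULATION // days


def triggers_reset(days):
    """The wish as stated: reset if any single day exceeds the threshold."""
    return peak_daily_deaths(days) > RESET_THRESHOLD


def surviving_universes(n_runs, seed=0):
    """Sample universes and count the ones the reset condition lets through."""
    rng = random.Random(seed)
    names = [name for name, _, _ in OUTCOMES]
    weights = [prob for _, prob, _ in OUTCOMES]
    days_by_name = {name: days for name, _, days in OUTCOMES}
    survivors = Counter()
    for _ in range(n_runs):
        name = rng.choices(names, weights=weights)[0]
        if not triggers_reset(days_by_name[name]):
            survivors[name] += 1
    return survivors
```

With these invented numbers the fast takeover never gets past the filter, and the slow plague makes up nearly everything that does.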


Etc, etc. I'm not going to labour the point, because Eliezer himself made it so clearly in his original essay.

If you can say what the reset condition should be, you've already solved the hard part of the alignment problem. All that's left is the comparatively easy part of the task where you have to formalize the reset condition, work out how to put that formal goal into an AI, and then build an AI with goals that stay stable under recursive self-improvement.

Which, I agree, probably is the sort of thing that a very bright team of properly cautious people in a sane world *might* have a plastic cat in hell's chance of working out in fifty years.


I think what he meant was, if we could have the same researchers for 50 years repeatedly attempt to build aligned superintelligent AI, see that it went wrong, figure out why, and then try again -- then we could align AI. No Groundhog Day. And this is normally how science and tech works - you build version 0.1 of a thing, it fails in hilariously spectacular ways, you analyze what went wrong, and you try again.


But his whole point is that with superintelligent AI you cannot do that, because when it fails the first time it kills us. So I don't think you and he are disagreeing.

We're not disagreeing much! 

But suppose the reset leaves everyone with their memories intact, so we've got a chance to 'learn from our mistakes'.

Then cool, we converge much more quickly to whatever condition satisfies the reset. 

Under those conditions, even a fool like me could probably get something to work. Assuming that the answer will actually fit in my mind.

But what? Chances are I'm happy because I don't realise that everyone else is dead and I'm living in a simulation which exists solely in order to avoid the reset. 

What is the reset condition that avoids this sort of thing?

Asking about what reset conditions would avoid this is a bucket error -- the rhetorical point is that no such reset is possible; he's drawing a contrast between normal science and AGI science. I don't understand why this post and your reply comment are attempting to get into the details of the counterfactual that you and EY both agree is a counterfactual. The whole point is that it can't happen!

My point is that even if we solve the technical problem of 'how do we get goals into an AI', the 'what values to put in the AI' problem is also very hard. 

So hard that even in the 'Groundhog Day' universe, it's hard.

And yet people just handwave it.

Almost all the survival probability is in 'we don't build the AI', or 'we're just wrong about something fundamental'.

> ‘what values to put in the AI’ problem is also very hard.

Yes, it is hard. But Eliezer isn't just handwaving it. Here for example is a 37-page document he wrote on the subject 19 years ago:

Sure, I read that a few years after he wrote it, and it's still probably the best idea, but even if it's feasible it needs superintelligent help! So we have to solve the alignment problem to do it.

Every time humanity creates an AI capable of massive harm, friendly aliens show up, box it, and replace it with a simulation of what would have happened if it was let loose. Or something like that.

Yes, that's the sort of thing that would work, but notice that you've moved the problem of friendliness into the aliens. If we've already got super-powered friendly aliens that can defeat our unaligned superintelligences, we've already won and we're just waiting for them to show up.

But that's exactly how I interpret Eliezer's "50 years" comment: if we had those alien friends (or some other reliable guardrails), how long would it take humanity to solve alignment to the extent that we could stop relying on them? Eliezer suggested 50 years or so in the presence of hypothetical guardrails; without them, we die horribly on the first attempt. No need to go into a deep philosophical discussion on the nature of hypothetical guardrails when the whole point is that we do not have any.

So, if we already had friendly AI, we'd take 50 years to solve friendly AI?

I am totally nitpicking here. I think everyone sane agrees that we're doomed and soon. I'm just trying to destroy the last tiny shreds of hope.

Even if we can sort out the technical problem of giving AIs goals and keeping them stable under self-improvement, we are still doomed.

We have two separate impossible problems to solve, and no clue how to solve either of them.

Maybe if we can do 'strawberry alignment', we can kick the can down the road far enough for someone to have a bright idea. 

Maybe strawberry alignment is enough to get CEV. 

But strawberry alignment is *hard*, both the 'technical problem' and the 'what wish' problem.

I don't think Groundhog Day is enough. We just end up weirdly doomed rather than straightforwardly doomed.

And actually the more I think about it, the more I prefer straightforward doom. Which is lucky, because that's what we're going to get.

I think this "shred of hope" is the root of the disagreement. You are interpreting Eliezer's 50-year comment as "in some weird hypothetical world, ..." and trying to point out that the weird world is so weird that the tiny likelihood we are in it does not matter. But Eliezer's comment was about a counterfactual world that we know we are not in, so the specific structure of that counterfactual world does not matter (in fact, it is counterfactual exactly because it is not logically consistent). Basically, Eliezer's argument is roughly "in a world where unaligned AI is not a thing that kills us all [not because of some weird structure of a hypothetical world, but just as a logical counterfactual on the known fact that unaligned AGI results in humanity dying], ..." where the whole point is that we know that's not the world we are in. Does that help? I tried to make the counterfactual world a little more intuitive to think about by introducing friendly aliens and such, but that's not what was originally meant there, I think.


> I'm just trying to destroy the last tiny shreds of hope.

In what version of reality do you think anyone has hope for an AI alignment Groundhog Day?


> I think everyone sane agrees that we're doomed and soon.

Even as a doomer among doomers, you, with respect, come off as a rambling madman.

The problem is that the claim you’re making, namely that alignment is so doomed that Eliezer Yudkowsky, one of the most pessimistic voices among alignment people if not the most, is still somehow overoptimistic about humanity’s prospects, is unsubstantiated.

It’s a claim, I think, that deserves some substantiation. Maybe you believe you’ve already provided as much. I disagree.

I’m guessing you’re operating on strong intuition here; and you know what, great, share your model of the world! But you apparently made this post with the intention to persuade, and I’m telling you you’ve done a poor job.

EDIT: To be clear, even if I were somehow granted vivid knowledge of the future through precognition, you’d still seem crazy to me at this point.

> To be clear, even if I were somehow granted vivid knowledge of the future through precognition, you’d still seem crazy to me at this point.


(I assume you mean vivid knowledge of the future in which we are destroyed, obviously in the case where everything goes well I've got some problem with my reasoning)

That's a good distinction to make, a man can be right for the wrong reasons. 

> Even as a doomer among doomers, you, with respect, come off as a rambling madman.

Certainly mad enough to take "madman" as a compliment, thank you!

I'd be interested if you know a general method I could use to tell if I'm mad. The only time I actually know it happened (thyroid overdose caused a manic episode) I noticed pretty quickly and sought help. What test should I try today?

Obviously "everyone disagrees with me and I can't convince most people" is a bad sign. But after long and patient effort I have convinced a number of unfortunates in my circle of friends. Some of whom have always seemed pretty sharp to me. 

And you must admit, the field as a whole seems to be coming round to my point of view!

Rambling I do not take as a compliment. But nevertheless I thank you for the feedback. 

I thought I'd written the original post pretty clearly and succinctly. If not, advice on how to write more clearly is always welcome. If you get my argument, can you steelman it?

> I’m guessing you’re operating on strong intuition here

Your guess is correct, I literally haven't shifted my position on all this since 2010. Except to notice that everything's happening much faster than I expected it to. Thirteen years ago I expected this to kill our children. Now I worry that it's going to kill my parents. AlphaZero was the fire alarm for me. General Game Playing was one of the more important sub-problems.  

I agree that if you haven't changed your mind for thirteen years in a field that's moving fast, you're probably stuck.

I think my basic intuitions are: 

"It's a terrible idea to create a really strong mind that doesn't like you."

"Really strong minds are physically possible, humans are nowhere near."

"Human-level AI is easy because evolution did it to us, quickly, and evolution is stupid."

"Recursive self-improvement is possible."

Which of these four things do you disagree with? Or do you think the four together are insufficient?


> If you get my argument, can you steelman it?

I get that your argument is essentially as follows:

1.) Solving the problem of what values to put into an AI, even given the other technical issues being solved, is impossibly difficult in real life.

2.) To prove the problem’s impossible difficulty, here’s a much kinder version of reality where the problem still remains impossible.

I don’t think you did 2, and it requires me to already accept 1 is true, which I think it probably isn’t, and I think that most would agree with me on this point, at least in principle.

> Which of these four things do you disagree with?

I don’t disagree with any of them. I doubt there’s a convincing argument that could get me to disagree with any of those as presented.

What I am not convinced of, is that given all those assumptions being true, certain doom necessarily follows, or that there is no possible humanly tractable scheme which avoids doom in whatever time we have left.

I’m not clever enough to figure out what the solution is mind you, nor am I especially confident that someone else is necessarily going to. Please don’t confuse me for someone who doesn’t often worry about these things.

> What I am not convinced of, is that given all those assumptions being true, certain doom necessarily follows, or that there is no possible humanly tractable scheme which avoids doom in whatever time we have left.


OK, cool, I mean "just not building the AI" is a good way to avoid doom, and that still seems at least possible, so we're maybe on the same page there.

And I think you got what I was trying to say, solving 1 and/or 2 can't be done iteratively or by patching together a huge list of desiderata. We have to solve philosophy somehow, without superintelligent help. As I say, that looks like the harder part to me.

> Please don’t confuse me for someone who doesn’t often worry about these things.

I promise I'll try not to! 

None. But if a problem's not solvable in an easy case, it's not solvable in a harder case. 

Same argument as for thinking about Solomonoff Induction or Halting Oracles. If you can't even do it with magic powers, that tells you something about what you can really do.

Restarting an earlier thread in a clean slate.

Let's define the scientific difficulty D(P) of a scientific problem P as "the approximate number of years of trial-and-error effort that humanity would need to solve P, if P were considered an important problem to solve". He estimates D(alignment) at about 50 years, but his whole point is that for alignment this particular metric is meaningless, because trial-and-error is not an option. This is just meant as a counterargument to somebody saying that alignment does not seem to be much harder than X, and we solved X. His counterargument is: yes, D(X) was shown to be about 50 years in the past, and by scientific difficulty alone D(alignment) might have the same order of magnitude, but unlike X, alignment cannot be solved via trial-and-error, so the comparison with X is not actually informative.

This is the opposite of considering a trial-and-error solution scenario for alignment as an actual possibility.

Does this make sense?

That makes perfect sense, thank you. And maybe, if we've already got the necessary utility function, stability under self-improvement might be solvable as if it were just a really difficult maths problem. It doesn't look that difficult to me, a priori, to change your cognitive abilities whilst keeping your goals.

AlphaZero got its giant inscrutable matrices by working from a straightforward start of 'checkmate is good'. I can imagine something like AlphaZero designing a better AlphaZero (AlphaOne?) and handing over the clean definition of 'checkmate is good' and trusting its successor to work out the details better than it could itself.

I get cleverer if I use pencil and paper, it doesn't seem to redefine what's good when I do. And no-one stopped liking diamonds when we worked out that carbon atoms weren't fundamental objects.


My point is that the necessary utility function is the hard bit. It doesn't look anything like a maths problem to me, *and* we can't sneak up on it iteratively with a great mass of patches until it's good enough. 

We've been paying a reasonable amount of attention to 'what is good?' for at least two thousand years, and in all that time no-one came up with anything remotely sensible sounding.

I would doubt that the question meant anything, if it were not that I can often say which of two possible scenarios I prefer. And I notice that other people often have the same preference.

I do think that Eliezer thinks that given the Groundhog Day version of the problem, restart every time you do something that doesn't work out, we'd be able to pull it off.

I doubt that even that's true. 'Doesn't work out' is too nebulous.

But at this point I guess we're talking only about Eliezer's internal thoughts, and I have no insight there. I was attacking a direct quote from the podcast, but maybe I'm misinterpreting something that wasn't meant to bear much weight.

Putting RamblinDash's point another way: when Eliezer says "unlimited retries", he's not talking about a Groundhog Day style reset. He's just talking about the mundane thing where, when you're trying to fix a car engine or something, you try one fix, and if it doesn't start, you try another fix, and if it still doesn't start, you try another fix, and so on. So the scenario Eliezer is imagining is this: we have 50 years. Year 1, we build an AI, and it kills 1 million people. We shut it off. Year 2, we fix the AI. We turn it back on, it kills another million people. We shut it off, fix it, turn it back on. Etc, until it stops killing people when we turn it on. Eliezer is saying, if we had 50 years to do that, we could align an AI. The problem is, in reality, the first time we turn it on, it doesn't kill 1 million people, it kills everyone. We only get one try.

I like your phrasing better, but I think it just hides some magic.

In this situation I think we get an AI that repeatedly kills 999,999 people. It's just the nearest unblocked path problem.
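
A minimal sketch of that, assuming the AI scores plans by something that happens to go up with the death toll, and that the only patch is 'shut it off at a million deaths'. The scoring function and the plan space are hypothetical stand-ins, not a model of any real system.

```python
SHUTOFF_THRESHOLD = 1_000_000  # we switch the AI off if a plan kills this many


def misaligned_score(deaths):
    # Hypothetical stand-in: whatever the AI actually wants, it gets
    # more of it the more people die along the way.
    return deaths


def is_blocked(deaths):
    # The patch: any plan at or above the threshold gets the AI shut off.
    return deaths >= SHUTOFF_THRESHOLD


def best_unblocked_plan(max_deaths):
    # Plans are identified by their death toll, 0..max_deaths inclusive.
    candidates = (d for d in range(max_deaths + 1) if not is_blocked(d))
    return max(candidates, key=misaligned_score)
```

`best_unblocked_plan(2_000_000)` returns `999999`: the patch moved the optimum to just under the line and left the objective exactly as it was.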

The exact reset/restart/turn it off and try again condition matters, and nothing works unless the reset condition is 'that isn't going to do something we approve of'.

The only sense I can make of the idea is 'If we already had a friendly AI to protect us while we played, we could work out how to build a friendly AI'. 

I don't think we could iterate to a good outcome, even if we had magic powers of iteration.

Your version makes it strictly harder than the 'Groundhog Day with Memories Intact' version. And I don't think we could solve that version.

I don't think that this part is the hardest. I think with enough limiting conditions (such as "people are still in charge", "people are still people", "the world is close enough to our current world and our reasonably optimistic expectations of its future", "those rules should be met continuously at each moment between now and then", etc.) we can find something that can work.
Other parts (how to teach those rules to AI and how to prevent everyone from launching AGI that is not taught them) look harder to me.

For sure they are both hard! But I'd like to hear your wish, even in the most informal terms.

What do I wish from AI? I gave a rough list in this thread, and also here
Overall, I think both AI and those giving goals for it should be as conservative and restrained as possible. They should value, above all, the preservation of the status quo of people, world and AI, with VERY steady improvements of each. Move ahead, but move slowly, don't break things.

So, to quote that link:


> My feeling is that what we people (edit: or most of us) really want is the normal human life, but reasonably better.
>
> Reasonably long life. Reasonably less suffering. Reasonably more happiness. People that we care about. People that care about us. People that need us. People that we need. People we fight with. Goals to achieve. Causes to follow. Hardships to overcome.


I notice that the word "reasonably" is doing most of the work there. (much like in English Common Law, where it works reasonably well, because it's interpreted by reasonable human beings.)

That's the whole problem! 

As a very wise man once said:

> There are three kinds of genies:  Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.

On the other hand, you say above (my italics):

> I think both AI and those giving goals for it should be as conservative and restrained as possible.

That's much more like the sort of thing you can give to an optimizer. And it results in the world frozen solid.

>That's much more like the sort of thing you can give to an optimizer. And it results in the world frozen solid.

That's why I made sure to specify the gradual improvement. Also, development and improvement are the natural state of humanity and people, so taking that away from them means breaking the status quo too.

>I notice that the word "reasonably" is doing most of the work there. (much like in English Common Law, where it works reasonably well, because it's interpreted by reasonable human beings.)

Mathematically speaking, polynomials are reasonable functions. Step functions or factorials are not. Exponentials are reasonable if the growth rate has stayed roughly constant since some time before the year 2000. Metrics of a reasonable world should be described by reasonable functions.

>There are three kinds of genies:  Genies to whom you can safely say "I wish for you to do what I should wish for"; genies for which no wish is safe; and genies that aren't very powerful or intelligent.

I'll take the third, please. It just should be powerful enough that it can prevent the other two types from being created in the foreseeable future.

Also, it seems that you imagine AI as not just the second type of genie, but of a genie that is explicitly hostile and would misinterpret your wish on purpose. Of course, making any wish to such a genie would end badly.

Genies that are not very powerful or intelligent are not powerful enough to prevent the other two types from being created. They need to be more capable than you are, or you could just do the stuff yourself.

> not just the second type of genie, but of a genie that is explicitly hostile and would misinterpret your wish on purpose

The second type of genie is hostile and would misinterpret your wish! Not deliberately, not malevolently. Just because that's what optimizers are like unless they're optimising for the right thing.

Creating something malevolent, that would deliberately misinterpret an otherwise sound wish also requires solving the alignment problem. You'd need to somehow get human values into it so that it can deliberately pervert them.

Honestly, Eliezer's original essay from aeons ago explains all this. You should read it.

AI can be useful without being ASI. Including in things such as identifying and preventing situations that could lead to creation of unaligned ASI.

Of course, a conservative and human-friendly AI would probably lose to an existing AI with comparable power that is not limited by those "handicaps". That's why it's important to prevent the possibility of their creation, instead of fighting them "fairly".

And yes, computronium maximising is a likely behaviour, but there are ideas how to avoid it, such as or

Of course, all those ideas and possibilities may actually be duds, and we may be doomed no matter what. But then what's the point of seeking a solution that does not exist?

I'm not proposing solutions here. I think we face an insurmountable opportunity.

But for some reason I don't understand, I am driven to stare the problem in the face in its full difficulty. 

I think it may be caused by
I suffer from that too. 
That's a very counterproductive state of mind if the task is unsolvable in its full difficulty. It makes you lose hope and stop trying solutions that would work if the situation is not as bad as you imagined.

A good guess, and thank you for the reference, but (although I admit that the prospect of global imminent doom is somewhat anxious-making), anxiety isn't a state of mind I'm terribly familiar with personally.  I'm very emotionally stable usually, and I lost all hope years ago. It doesn't bother me much.

It's more that I have the 'taking ideas seriously' thing in full measure, once I get an *idee fixe* I can't let it go until I've solved it. AI Doom is currently third on the list after chess and the seed oil nonsense, but the whole Bing/Sydney thing started me thinking about it again and someone emailed me Eliezer's podcast, you know how it goes.

Although I do have a couple of friends who suffer greatly from Anxiety Disorder, and you have my sympathies, especially if you're interested in all this stuff! Honestly run away, there's nothing to be done and you have a life to live. 

Totally off topic but have you tried lavender pills?  I started recommending them to friends after Scott Alexander said they might work, and out of three people I've got one total failure, one refusal to take for good reasons, and one complete fix! Obviously do your own research as to side effects, just cause it's 'natural' doesn't mean it's safe. The main one is if you're a girl it will interfere with your hormones and might cause miscarriages.

Thanks for the advice. It looks like my mind works similarly to yours, i.e. it can't give up a task it has latched on to. But my brain draws way more from the rest of my body than is healthy.

It's not as bad now as it was in the first couple of weeks, but I still have problems sleeping regularly, because my mind can't switch off overdrive mode. So I become sleepy AND agitated at the same time, which is quite an unpleasant and unproductive state.

There are no Lavender Pills around here, but I take other anxiety medications, and they help, to an extent.

These seemed good, they taste of lavender, but the person trying them got no effect:

Lindens Lavender Essential Oil 80mg Capsules

The person who had it work for them tried something purchased from a shop, Herbal Calms maybe?, anyway, lavender oil in vegetable oil in little capsules. She reports that she can get to sleep now, and can face doing things that she couldn't previously do due to anxiety if she pops a capsule first.