MIRI announces new "Death With Dignity" strategy

    tl;dr:  It's obvious at this point that humanity isn't going to solve the alignment problem, or even try very hard, or even go out with much of a fight.  Since survival is unattainable, we should shift the focus of our efforts to helping humanity die with slightly more dignity.

    Well, let's be frank here.  MIRI didn't solve AGI alignment and at least knows that it didn't.  Paul Christiano's incredibly complicated schemes have no chance of working in real life before DeepMind destroys the world.  Chris Olah's transparency work, at current rates of progress, will at best let somebody at DeepMind give a highly speculative warning about how the current set of enormous inscrutable tensors, inside a system that was recompiled three weeks ago and has now been training by gradient descent for 20 days, might possibly be planning to start trying to deceive its operators.

    Management will then ask what they're supposed to do about that.

    Whoever detected the warning sign will say that there isn't anything known they can do about that.  Just because you can see the system might be planning to kill you, doesn't mean that there's any known way to build a system that won't do that.  Management will then decide not to shut down the project - because it's not certain that the intention was really there or that the AGI will really follow through, because other AGI projects are hard on their heels, because if all those gloomy prophecies are true then there's nothing anybody can do about it anyways.  Pretty soon that troublesome error signal will vanish.

    When Earth's prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.

    That's why I would suggest reframing the problem - especially on an emotional level - to helping humanity die with dignity, or rather, since even this goal is realistically unattainable at this point, die with slightly more dignity than would otherwise be counterfactually obtained.

    Consider the world if Chris Olah had never existed.  It's then much more likely that nobody will even try and fail to adapt Olah's methodologies to try and read complicated facts about internal intentions and future plans, out of whatever enormous inscrutable tensors are being integrated a million times per second, inside of whatever recently designed system finished training 48 hours ago, in a vast GPU farm that's already helpfully connected to the Internet.

    It is more dignified for humanity - a better look on our tombstone - if we die after the management of the AGI project was heroically warned of the dangers but came up with totally reasonable reasons to go ahead anyways.

    Or, failing that, if people made a heroic effort to do something that could maybe possibly have worked to generate a warning like that but couldn't actually in real life because the latest tensors were in a slightly different format and there was no time to readapt the methodology.  Compared to the much less dignified-looking situation if there's no warning and nobody even tried to figure out how to generate one.

    Or take MIRI.  Are we sad that it looks like this Earth is going to fail?  Yes.  Are we sad that we tried to do anything about that?  No, because it would be so much sadder, when it all ended, to face our ends wondering if maybe solving alignment would have just been as easy as buckling down and making a serious effort on it - not knowing if that would've just worked, if we'd only tried, because nobody had ever even tried at all.  It wasn't subjectively overdetermined that the (real) problems would be too hard for us, before we made the only attempt at solving them that would ever be made.  Somebody needed to try at all, in case that was all it took.

    It's sad that our Earth couldn't be one of the more dignified planets that makes a real effort, correctly pinpointing the actual real difficult problems and then allocating thousands of the sort of brilliant kids that our Earth steers into wasting their lives on theoretical physics.  But better MIRI's effort than nothing.  What were we supposed to do instead, pick easy irrelevant fake problems that we could make an illusion of progress on, and have nobody out of the human species even try to solve the hard scary real problems, until everybody just fell over dead?

    This way, at least, some people are walking around knowing why it is that if you train with an outer loss function that enforces the appearance of friendliness, you will not get an AI internally motivated to be friendly in a way that persists after its capabilities start to generalize far out of the training distribution...

    To be clear, nobody's going to listen to those people, in the end.  There will be more comforting voices that sound less politically incongruent with whatever agenda needs to be pushed forward that week.  Or even if that ends up not so, this isn't primarily a social-political problem, of just getting people to listen.  Even if DeepMind listened, and Anthropic knew, and they both backed off from destroying the world, that would just mean Facebook AI Research destroyed the world a year(?) later.

    But compared to being part of a species that walks forward completely oblivious into the whirling propeller blades, with nobody having seen it at all or made any effort to stop it, it is dying with a little more dignity, if anyone knew at all.  You can feel a little incrementally prouder to have died as part of a species like that, if maybe not proud in absolute terms.

    If there is a stronger warning, because we did more transparency research?  If there's deeper understanding of the real dangers and those come closer to beating out comfortable nonrealities, such that DeepMind and Anthropic really actually back off from destroying the world and let Facebook AI Research do it instead?  If they try some hopeless alignment scheme whose subjective success probability looks, to the last sane people, more like 0.1% than 0?  Then we have died with even more dignity!  It may not get our survival probabilities much above 0%, but it would be so much more dignified than the present course looks to be!

    Now, of course, the real subtext here is that if you can otherwise set up the world so that it looks like you'll die with enough dignity - die of the social and technical problems that are really unavoidable, after making a huge effort at coordination and technical solutions and failing, rather than storming directly into the whirling helicopter blades as is the present unwritten plan -

    - heck, if there was even a plan at all -

    - then maybe possibly, if we're wrong about something fundamental, somehow, somewhere -

    - in a way that makes things easier rather than harder, because obviously we're going to be wrong about all sorts of things, it's a whole new world inside of AGI -

    - although, when you're fundamentally wrong about rocketry, this does not usually mean your rocket prototype goes exactly where you wanted on the first try while consuming half as much fuel as expected; it means the rocket explodes earlier yet, and not in a way you saw coming, being as wrong as you were -

    - but if we get some miracle of unexpected hope, in those unpredicted inevitable places where our model is wrong -

    - then our ability to take advantage of that one last hope will greatly depend on how much dignity we were set to die with, before then.

    If we can get on course to die with enough dignity, maybe we won't die at all...?

    In principle, yes.  Let's be very clear, though:  Realistically speaking, that is not how real life works.

    It's possible for a model error to make your life easier.  But you do not get more surprises that make your life easy, than surprises that make your life even more difficult.  And people do not suddenly become more reasonable, and make vastly more careful and precise decisions, as soon as they're scared.  No, not even if it seems to you like their current awful decisions are weird and not-in-the-should-universe, and surely some sharp shock will cause them to snap out of that weird state into a normal state and start outputting the decisions you think they should make.

    So don't get your heart set on that "not die at all" business.  Don't invest all your emotion in a reward you probably won't get.  Focus on dying with dignity - that is something you can actually obtain, even in this situation.  After all, if you help humanity die with even one more dignity point, you yourself die with one hundred dignity points!  Even if your species dies an incredibly undignified death, for you to have helped humanity go down with even slightly more of a real fight, is to die an extremely dignified death.

    "Wait, dignity points?" you ask.  "What are those?  In what units are they measured, exactly?"

    And to this I reply:  Obviously, the measuring units of dignity are over humanity's log odds of survival - the graph on which the logistic success curve is a straight line.  A project that doubles humanity's chance of survival from 0% to 0% is helping humanity die with one additional information-theoretic bit of dignity.
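    To make the unit concrete, here is a minimal Python sketch, assuming "dignity" is exactly the change in base-2 log odds of survival as just defined. The helper names `log_odds_bits` and `dignity_gained` are hypothetical and purely illustrative. The sketch shows why doubling a tiny probability is worth almost exactly one bit, while doubling a probability nearer the middle of the curve is worth somewhat more.

```python
import math

def log_odds_bits(p: float) -> float:
    """Base-2 log odds of a survival probability p (hypothetical helper)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log2(p / (1.0 - p))

def dignity_gained(p_before: float, p_after: float) -> float:
    """Bits of 'dignity' gained: the change in base-2 log odds (hypothetical helper)."""
    return log_odds_bits(p_after) - log_odds_bits(p_before)

if __name__ == "__main__":
    # Deep in the tail, doubling the probability adds very nearly one bit.
    print(round(dignity_gained(1e-6, 2e-6), 4))  # 1.0
    # Nearer the middle of the curve, doubling 25% -> 50% adds about 1.585 bits.
    print(round(dignity_gained(0.25, 0.5), 3))   # 1.585
```

    Working in log odds is what makes the logistic success curve a straight line: far down in the tail, p/(1-p) is almost the same as p, so "double the chance" and "one more bit" coincide.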

    But if enough people can contribute enough bits of dignity like that, wouldn't that mean we didn't die at all?  Yes, but again, don't get your hopes up.  Don't focus your emotions on a goal you're probably not going to obtain.  Realistically, we find a handful of projects that contribute a few more bits of counterfactual dignity; get a bunch more not-specifically-expected bad news that makes the first-order object-level situation look even worse (where to second order, of course, the good Bayesians already knew that was how it would go); and then we all die.

    With a technical definition in hand of what exactly constitutes dignity, we may now consider some specific questions about what does and doesn't constitute dying with dignity.

    Q1:  Does 'dying with dignity' in this context mean accepting the certainty of your death, and not childishly regretting that or trying to fight a hopeless battle?

    A:  Don't be ridiculous.  How would that increase the log odds of Earth's survival?

    My utility function isn't up for grabs, either.  If I regret my planet's death then I regret it, and it's beneath my dignity to pretend otherwise.

    That said, I fought hardest while it looked like we were in the more sloped region of the logistic success curve, when our survival probability seemed more around the 50% range; I borrowed against my future to do that, and burned myself out to some degree.  That was a deliberate choice, which I don't regret now; it was worth trying, I would not have wanted to die having not tried, I would not have wanted Earth to die without anyone having tried.  But yeah, I am taking some time partways off, and trying a little less hard, now.  I've earned a lot of dignity already; and if the world is ending anyways and I can't stop it, I can afford to be a little kind to myself about that.

    When I tried hard and burned myself out some, it was with the understanding, within myself, that I would not keep trying to do that forever.  We cannot fight at maximum all the time, and some times are more important than others.  (Namely, when the logistic success curve seems relatively more sloped; those times are relatively more important.)

    All that said:  If you fight marginally longer, you die with marginally more dignity.  Just don't undignifiedly delude yourself about the probable outcome.

    Q2:  I have a clever scheme for saving the world!  I should act as if I believe it will work and save everyone, right, even if there's arguments that it's almost certainly misguided and doomed?  Because if those arguments are correct and my scheme can't work, we're all dead anyways, right?

    A:  No!  That's not dying with dignity!  That's stepping sideways out of a mentally uncomfortable world and finding an escape route from unpleasant thoughts!  If you condition your probability models on a false fact, something that isn't true on the mainline, it means you've mentally stepped out of reality and are now living somewhere else instead.

    There are more elaborate arguments against the rationality of this strategy, but consider this quick heuristic for arriving at the correct answer:  That's not a dignified way to die.  Death with dignity means going on mentally living in the world you think is reality, even if it's a sad reality, until the end; not abandoning your arts of seeking truth; dying with your commitment to reason intact.

    You should try to make things better in the real world, where your efforts aren't enough and you're going to die anyways; not inside a fake world you can save more easily.

    Q2:  But what's wrong with the argument from expected utility, saying that all of humanity's expected utility lies within possible worlds where my scheme turns out to be feasible after all?

    A:  Most fundamentally?  That's not what the surviving worlds look like.  The surviving worlds look like people who lived inside their awful reality and tried to shape up their impossible chances; until somehow, somewhere, a miracle appeared - the model broke in a positive direction, for once, as does not usually occur when you are trying to do something very difficult and hard to understand, but might still be so - and they were positioned with the resources and the sanity to take advantage of that positive miracle, because they went on living inside uncomfortable reality.  Positive model violations do ever happen, but it's much less likely that somebody's specific desired miracle that "we're all dead anyways if not..." will happen; these people have just walked out of the reality where any actual positive miracles might occur.

    Also and in practice?  People don't just pick one comfortable improbability to condition on.  They go on encountering unpleasant facts true on the mainline, and each time saying, "Well, if that's true, I'm doomed, so I may as well assume it's not true," and they say more and more things like this.  If you do this it very rapidly drives down the probability mass of the 'possible' world you're mentally inhabiting.  Pretty soon you're living in a place that's nowhere near reality.  If there were an expected utility argument for risking everything on an improbable assumption, you'd get to make exactly one of them, ever.  People using this kind of thinking usually aren't even keeping track of when they say it, let alone counting the occasions.
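    A rough arithmetic illustration of how fast this compounds, assuming (hypothetically) that each assumed-away fact independently has a generous 20% chance of actually being false. The 20% figure, the independence assumption, and the helper `remaining_mass` are all made up for illustration; real assumptions are rarely independent.

```python
import math

def remaining_mass(assumption_probs):
    """Probability of the imagined world after assuming away several mainline
    facts, treating each assumption as independent (a simplification).

    Each entry is the probability that one unpleasant fact is actually false.
    """
    for p in assumption_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return math.prod(assumption_probs)

if __name__ == "__main__":
    # Five convenient assumptions, each with a 20% chance of holding.
    mass = remaining_mass([0.2] * 5)
    print(f"{mass:.5f}")                 # 0.00032
    print(f"{-math.log2(mass):.1f} bits")  # 11.6 bits
```

    Five such moves already put the imagined world more than eleven bits away from the mainline, which is the sense in which "pretty soon you're living in a place that's nowhere near reality."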

    Also also, in practice?  In domains like this one, things that seem to first-order like they "might" work... have essentially no chance of working in real life, to second-order after taking into account downward adjustments against optimism.  AGI is a scientifically unprecedented experiment and a domain with lots of optimization pressures some of which work against you and unforeseeable intelligently selected execution pathways and with a small target to hit and all sorts of extreme forces that break things and that you couldn't fully test before facing them.  AGI alignment seems like it's blatantly going to be an enormously Murphy-cursed domain, like rocket prototyping or computer security but worse.

    In a domain like that, if you have a clever scheme for winning anyways that, to first-order theoretical theory, totally definitely seems like it should work, even to Eliezer Yudkowsky rather than somebody who just goes around saying that casually, then maybe there's like a 50% chance of it working in practical real life after all the unexpected disasters and things turning out to be harder than expected.

    If to first-order it seems to you like something in a complicated unknown untested domain has a 40% chance of working, it has a 0% chance of working in real life.

    Also also also in practice?  Harebrained schemes of this kind are usually actively harmful.  Because they're invented by the sort of people who'll come up with an unworkable scheme, and then try to get rid of counterarguments with some sort of dismissal like "Well if not then we're all doomed anyways."

    If nothing else, this kind of harebrained desperation drains off resources from those reality-abiding efforts that might try to do something on the subjectively apparent doomed mainline, and so position themselves better to take advantage of unexpected hope, which is what the surviving possible worlds mostly look like.

    The surviving worlds don't look like somebody came up with a harebrained scheme, dismissed all the obvious reasons it wouldn't work with "But we have to bet on it working," and then it worked.

    That's the elaborate argument about what's rational in terms of expected utility, once reasonable second-order commonsense adjustments are taken into account.  Note, however, that if you have grasped the intended emotional connotations of "die with dignity", it's a heuristic that yields the same answer much faster.  It's not dignified to pretend we're less doomed than we are, or step out of reality to live somewhere else.

    Q3:  Should I scream and run around and go through the streets wailing of doom?

    A:  No, that's not very dignified.  Have a private breakdown in your bedroom, or a breakdown with a trusted friend, if you must.

    Q3:  Why is that bad from a coldly calculating expected utility perspective, though?

    A:  Because it associates belief in reality with people who act like idiots and can't control their emotions, which worsens our strategic position in possible worlds where we get an unexpected hope.

    Q4:  Should I lie and pretend everything is fine, then?  Keep everyone's spirits up, so they go out with a smile, unknowing?

    A:  That also does not seem to me to be dignified.  If we're all going to die anyways, I may as well speak plainly before then.  If into the dark we must go, let's go there speaking the truth, to others and to ourselves, until the end.

    Q4:  Okay, but from a coldly calculating expected utility perspective, why isn't it good to lie to keep everyone calm?  That way, if there's an unexpected hope, everybody else will be calm and oblivious and not interfering with us out of panic, and my faction will have lots of resources that they got from lying to their supporters about how much hope there was!  Didn't you just say that people screaming and running around while the world was ending would be unhelpful?

    A:  You should never try to reason using expected utilities again.  It is an art not meant for you.  Stick to intuitive feelings henceforth.

    There are, I think, people whose minds readily look for and find even the slightly-less-than-totally-obvious considerations of expected utility, what some might call "second-order" considerations.  Ask them to rob a bank and give the money to the poor, and they'll think spontaneously and unprompted about insurance costs of banking and the chance of getting caught and reputational repercussions and low-trust societies and what if everybody else did that when they thought it was a good cause; and all of these considerations will be obviously-to-them consequences under consequentialism.

    These people are well-suited to being 'consequentialists' or 'utilitarians', because their mind naturally sees all the consequences and utilities, including those considerations that others might be tempted to call by names like "second-order" or "categorical" and so on.

    If you ask them why consequentialism doesn't say to rob banks, they reply, "Because that actually realistically in real life would not have good consequences.  Whatever it is you're about to tell me as a supposedly non-consequentialist reason why we all mustn't do that, seems to you like a strong argument, exactly because you recognize implicitly that people robbing banks would not actually lead to happy formerly-poor people and everybody living cheerfully ever after."

    Others, if you suggest to them that they should rob a bank and give the money to the poor, will be able to see the helped poor as a "consequence" and a "utility", but they will not spontaneously and unprompted see all those other considerations in the formal form of "consequences" and "utilities".

    If you just asked them informally whether it was a good or bad idea, they might ask "What if everyone did that?" or "Isn't it good that we can live in a society where people can store and transmit money?" or "How would it make effective altruism look, if people went around doing that in the name of effective altruism?"  But if you ask them about consequences, they don't spontaneously, readily, intuitively classify all these other things as "consequences"; they think that their mind is being steered onto a kind of formal track, a defensible track, a track of stating only things that are very direct or blatant or obvious.  They think that the rule of consequentialism is, "If you show me a good consequence, I have to do that thing."

    If you present them with bad things that happen if people rob banks, they don't see those as also being 'consequences'.  They see them as arguments against consequentialism; since, after all, consequentialism says to rob banks, which obviously leads to bad stuff, and so bad things would end up happening if people were consequentialists.  They do not do a double-take and say "What?"  That consequentialism leads people to do bad things with bad outcomes is just a reasonable conclusion, so far as they can tell.

    People like this should not be 'consequentialists' or 'utilitarians' as they understand those terms.  They should back off from this form of reasoning that their mind is not naturally well-suited for processing in a native format, and stick to intuitively informally asking themselves what's good or bad behavior, without any special focus on what they think are 'outcomes'.

    If they try to be consequentialists, they'll end up as Hollywood villains describing some grand scheme that violates a lot of ethics and deontology but sure will end up having grandiose benefits, yup, even while everybody in the audience knows perfectly well that it won't work.  You can only safely be a consequentialist if you're genre-savvy about that class of arguments - if you're not the blind villain on screen, but the person in the audience watching who sees why that won't work.

    Q4:  I know EAs shouldn't rob banks, so this obviously isn't directed at me, right?

    A:  The people of whom I speak will look for and find the reasons not to do it, even if they're in a social environment that doesn't have strong established injunctions against bank-robbing specifically exactly.  They'll figure it out even if you present them with a new problem isomorphic to bank-robbing but with the details changed.

    Which is basically what you just did, in my opinion.

    Q4:  But from the standpoint of cold-blooded calculation -

    A:  Calculations are not cold-blooded.  What blood we have in us, warm or cold, is something we can learn to see more clearly with the light of calculation.

    If you think calculations are cold-blooded, that they only shed light on cold things or make them cold, then you shouldn't do them.  Stay by the warmth in a mental format where warmth goes on making sense to you.

    Q4:  Yes yes fine fine but what's the actual downside from an expected-utility standpoint?

    A:  If good people were liars, that would render the words of good people meaningless as information-theoretic signals, and destroy the ability for good people to coordinate with others or among themselves.

    If the world can be saved, it will be saved by people who didn't lie to themselves, and went on living inside reality until some unexpected hope appeared there.

    If those people went around lying to others and paternalistically deceiving them - well, mostly, I don't think they'll have really been the types to live inside reality themselves.  But even imagining the contrary, good luck suddenly unwinding all those deceptions and getting other people to live inside reality with you, to coordinate on whatever suddenly needs to be done when hope appears, after you drove them outside reality before that point.  Why should they believe anything you say?

    Q4:  But wouldn't it be more clever to -

    A:  Stop.  Just stop.  This is why I advised you to reframe your emotional stance as dying with dignity.

    Maybe there'd be an argument about whether or not to violate your ethics if the world was actually going to be saved at the end.  But why break your deontology if it's not even going to save the world?  Even if you have a price, should you be that cheap?

    Q4:  But we could maybe save the world by lying to everyone about how much hope there was, to gain resources, until -

    A:  You're not getting it.  Why violate your deontology if it's not going to really actually save the world in real life, as opposed to a pretend theoretical thought experiment where your actions have only beneficial consequences and none of the obvious second-order detriments?

    It's relatively safe to be around an Eliezer Yudkowsky while the world is ending, because he's not going to do anything extreme and unethical unless it would really actually save the world in real life, and there are no extreme unethical actions that would really actually save the world the way these things play out in real life, and he knows that.  He knows that the next stupid sacrifice-of-ethics proposed won't work to save the world either, actually in real life.  He is a 'pessimist' - that is, a realist, a Bayesian who doesn't update in a predictable direction, a genre-savvy person who knows what the viewer would say if there were a villain on screen making that argument for violating ethics.  He will not, like a Hollywood villain onscreen, be deluded into thinking that some clever-sounding deontology-violation is bound to work out great, when everybody in the audience watching knows perfectly well that it won't.

    My ethics aren't for sale at the price point of failure.  So if it looks like everything is going to fail, I'm a relatively safe person to be around.

    I'm a genre-savvy person about this genre of arguments and a Bayesian who doesn't update in a predictable direction.  So if you ask, "But Eliezer, what happens when the end of the world is approaching, and in desperation you cling to whatever harebrained scheme has Goodharted past your filters and presented you with a false shred of hope; what then will you do?" - I answer, "Die with dignity."  Where "dignity" in this case means knowing perfectly well that's what would happen to some less genre-savvy person; and my choosing to do something else which is not that.  But "dignity" yields the same correct answer and faster.

    Q5:  "Relatively" safe?

    A:  It'd be disingenuous to pretend that it wouldn't be even safer to hang around somebody who had no clue what was coming, didn't know any mental motions for taking a worldview seriously, thought it was somebody else's problem to ever do anything, and would just cheerfully party with you until the end.

    Within the class of people who know the world is ending and consider it to be their job to do something about that, Eliezer Yudkowsky is a relatively safe person to be standing next to.  At least, before you both die anyways, as is the whole problem there.

    Q5:  Some of your self-proclaimed fans don't strike me as relatively safe people to be around, in that scenario?

    A:  I failed to teach them whatever it is I know.  Had I known then what I know now, I would have warned them not to try.

    If you insist on putting it into terms of fandom, though, feel free to notice that Eliezer Yudkowsky is much closer to being a typical liberaltarian science-fiction fan, as was his own culture that actually birthed him, than he is a typical member of any subculture that might have grown up later.  Liberaltarian science-fiction fans do not usually throw away all their ethics at the first sign of trouble.  They grew up reading books where those people were the villains.

    Please don't take this as a promise from me to play nice, as you define niceness; the world is ending, and also people have varying definitions of what is nice.  But I presently mostly expect to end up playing nice, because there won't be any options worth playing otherwise.

    It is a matter of some concern to me that all this seems to be an alien logic to some strange people who - this fact is still hard for me to grasp on an emotional level - don't spontaneously generate all of this reasoning internally, as soon as confronted with the prompt.  Alas.

    Q5:  Then isn't it unwise to speak plainly of these matters, when fools may be driven to desperation by them?  What if people believe you about the hopeless situation, but refuse to accept that conducting themselves with dignity is the appropriate response?

    A:  I feel like I've now tried to live my life that way for a while, by the dictum of not panicking people; and, like everything else I've tried, that hasn't particularly worked?  There are no plans left to avoid disrupting, now, with other people's hypothetical panic.

    I think we die with slightly more dignity - come closer to surviving, as we die - if we are allowed to talk about these matters plainly.  Even given that people may then do unhelpful things, after being driven mad by overhearing sane conversations.  I think we die with more dignity that way, than if we go down silent and frozen and never talking about our impending death for fear of being overheard by people less sane than ourselves.

    I think that in the last surviving possible worlds with any significant shred of subjective probability, people survived in part because they talked about it; even if that meant other people, the story's antagonists, might possibly hypothetically panic.

    But still, one should present the story-antagonists with an easy line of retreat.  So -

    Q6:  Hey, this was posted on April 1st.  All of this is just an April Fool's joke, right?

    A:  Why, of course!  Or rather, it's a preview of what might be needful to say later, if matters really do get that desperate.  You don't want to drop that on people suddenly and with no warning.

    Q6:  Oh.  Really?  That would be such a relief!

    A:  Only you can decide whether to live in one mental world or the other.

    Q6:  Wait, now I'm confused.  How do I decide which mental world to live in?

    A:  By figuring out what is true, and by allowing no other considerations than that to enter; that's dignity.

    Q6:  But that doesn't directly answer the question of which world I'm supposed to mentally live in!  Can't somebody just tell me that?

    A:  Well, conditional on you wanting somebody to tell you that, I'd remind you that many EAs hold that it is very epistemically unvirtuous to just believe what one person tells you, and not weight their opinion and mix it with the weighted opinions of others?

    Lots of very serious people will tell you that AGI is thirty years away, and that's plenty of time to turn things around, and nobody really knows anything about this subject matter anyways, and there's all kinds of plans for alignment that haven't been solidly refuted so far as they can tell.

    I expect the sort of people who are very moved by that argument, to be happier, more productive, and less disruptive, living mentally in that world.

    Q6:  Thanks for answering my question!  But aren't I supposed to assign some small probability to your worldview being correct?

    A:  Conditional on you being the sort of person who thinks you're obligated to do that and that's the reason you should do it, I'd frankly rather you didn't.  Or rather, seal up that small probability in a safe corner of your mind which only tells you to stay out of the way of those gloomy people, and not get in the way of any hopeless plans they seem to have.

    Q6:  Got it.  Thanks again!

    A:  You're welcome!  Goodbye and have fun!

    That's great and all, but with all due respect:

    Fuck. That. Noise.

    Regardless of the odds of success and what the optimal course of action actually is, I would be very hard pressed to say that I'm trying to "help humanity die with dignity". Regardless of what the optimal action should be given that goal, on an emotional level, it's tantamount to giving up.

    Before even getting into the cost/benefit of that attitude, in the worlds where we do make it out alive, I don't want to look back and see a version of me where that became my goal. I also don't think that if that was my goal, I would fight nearly as hard to achieve it. I want a catgirl volcano lair, not "dignity". So when I try to negotiate with my monkey brain to expend precious calories, the plan had better involve the former, not the latter. I suspect that something similar applies to others.

    I don't want to hear about genre-savviness from the de facto founder of the community that gave us HPMOR!Harry and the Comet King after he wrote this post, because it's so antithetical to the attitude present in those characters and in posts like this one.

    I also don't want to hear about second-order effects when, as best as I can tell, the att...

    I think there's an important point about locus of control and scope. You can imagine someone who, early in life, decides that their life's work will be to build a time machine, because the value of doing so is immense (turning an otherwise finite universe into an infinite one, for example). As time goes on, they notice being more and more pessimistic about their prospects of doing that, but have some block against giving up on an emotional level. The stakes are too high for doomerism to be entertained!

    But I think they overestimated their locus of control when making their plans, and they should have updated as evidence came in. If they reduced the scope of their ambitions, they might switch from plans that are crazy because they have to condition on time travel being possible to plans that are sane (because they can condition on actual reality). Maybe they just invent flying cars instead of time travel, or whatever.

    I see this post as saying: "look, people interested in futurism: if you want to live in reality, this is where the battle line actually is. Fight your battles there, don't send bombing runs behind miles of anti-air defenses and wonder why you don't seem to be getting any...

    Aside: Regardless of whether the quoted claim is true, it does not seem like a prototypical lie. My read of your meaning is: "If you [the hypothetical person claiming alignment is easy] were an honest reasoner and worked out the consequences of what you know, you would not believe that alignment is easy; thusly has an inner deception blossomed into an outer deception; thus I call your claim a 'lie.'"  And under that understanding of what you mean, Vaniver, I think yours is not a wholly inappropriate usage, but rather unconventional. In its unconventionality, I think it implies untruths about the intentions of the claimants. (Namely, that they semi-consciously seek to benefit by spreading a claim they know to be false on some level.) In your shoes, I think I would have just called it an "untruth" or "false claim."  Edit: I now think you might have been talking about EY's hypothetical questioners who thought it valuable to purposefully deceive about the problem's difficulty, and not about the typical present-day person who believes alignment is easy?
    That is what I was responding to.

    “To win any battle, you must fight as if you are already dead.” — Miyamoto Musashi.

    I don't in fact personally know we won't make it. This may be because I'm more ignorant than Eliezer, or may be because he (or his April first identity, I guess) is overconfident on a model, relative to me; it's hard to tell.

    Regardless, the bit about "don't get psychologically stuck having-to-(believe/pretend)-it's-gonna-work" seems really sane and healthy to me. Like falling out of an illusion and noticing your feet on the ground. The ground is a more fun and joyful place to live, even when things are higher probability of death than one is used to acting-as-though, in my limited experience. More access to creativity near the ground, I think.

    But, yes, I can picture things under the heading "ineffective doomerism" that seem to me like they suck. Like, still trying to live in an ego-constructed illusion of deferral, and this time with "and we die" pasted on it, instead of "and we definitely live via such-and-such a plan."

    I think I have more access to all of my emotional range nearer the ground, but this sentence doesn't ring true to me.

    The ground is a more fun and joyful place to live, even when things are higher probability of death than one is used to acting-as-though, in my limited experience. 

    Hm. It rings true to me, but there have been periods of my life where it has been false. 

    As cheesy as it is, this is the correct response. I'm a little disappointed that Eliezer would resort to doomposting like this, but at the same time it's to be expected from him after some point. The people with remaining energy need to understand his words are also serving a personal therapeutic purpose and press on.

    Some people can think there's next to no chance and yet go out swinging. I plan to, if I reach the point of feeling hopeless.

    Yeah -- I love AI_WAIFU's comment, but I love the OP too.

    To some extent I think these are just different strategies that will work better for different people; both have failure modes, and Eliezer is trying to guard against the failure modes of 'Fuck That Noise' (e.g., losing sight of reality), while AI_WAIFU is trying to guard against the failure modes of 'Try To Die With More Dignity' (e.g., losing motivation).

    My general recommendation to people would be to try different framings / attitudes out and use the ones that empirically work for them personally, rather than trying to have the same lens as everyone else. I'm generally a skeptic of advice, because I think people vary a lot; so I endorse the meta-advice that you should be very picky about which advice you accept, and keep in mind that you're the world's leading expert on yourself. (Or at least, you're in the best position to be that thing.)

    Cf. 'Detach the Grim-o-Meter' versus 'Try to Feel the Emotions that Match Reality'. Both are good advice in some contexts, for some people; but I think there's some risk from taking either strategy too far, especially if you aren't aware of the other strategy as a viable option.

    Please correct me if I am wrong, but a huge difference between Eliezer's post and AI_WAIFU's comment is that Eliezer's post is informed by conversations with dozens of people about the problem.

    I interpreted AI_WAIFU as pushing back against a psychological claim ('X is the best attitude for mental clarity, motivation, etc.'), not as pushing back against an AI-related claim like P(doom). Are you interpreting them as disagreeing about P(doom)? (If not, then I don't understand your comment.)

    If (counterfactually) they had been arguing about P(doom), I'd say: I don't know AI_WAIFU's level of background. I have a very high opinion of Eliezer's thinking about AI (though keep in mind that I'm a co-worker of his), but EY is still some guy who can be wrong about things, and I'm interested to hear counter-arguments against things like P(doom). AGI forecasting and alignment are messy, pre-paradigmatic fields, so I think it's easier for field founders and authorities to get stuff wrong than it would be in a normal scientific field.

    The specific claim that Eliezer's P(doom) is "informed by conversations with dozens of people about the problem" (if that's what you were claiming) seems off to me. Like, it may be technically true under some interpretation, but (a) I think of Eliezer's views as primarily based on his own models, (b) I'd tentatively guess those models are much more based on things like 'reading textbooks' and 'thinking things through himself' than on 'insights gleaned during back-and-forth discussions with other people', and (c) most people working full-time on AI alignment have far lower P(doom) than Eliezer.

    Sorry for the lack of clarity. I share Eliezer's pessimism about the global situation (caused by rapid progress in AI). All I meant is that I see signs in his writings that over the last 15 years Eliezer has spent many hours trying to help at least a dozen different people become effective at trying to improve the horrible situation we are currently in. That work experience makes me pay much greater attention to him on the subject at hand than someone I know nothing about.

    Ah, I see. I think Eliezer has lots of relevant experience and good insights, but I still wouldn't currently recommend the 'Death with Dignity' framing to everyone doing good longtermist work, because I just expect different people's minds to work very differently.

    Assuming this is correct (certainly it is of Eliezer, though I don’t know AI_WAIFU’s background and perhaps they have had similar conversations), does it matter? WAIFU’s point is that we should continue trying as a matter of our terminal values; that’s not something that can be wrong due to the problem being difficult.

    I agree, but do not perceive Eliezer as having stopped trying or as advising others to stop trying, er, except of course for the last section of this post ("Q6: . . . All of this is just an April Fool’s joke, right?"), but that is IMHO addressed to a small fraction of his audience.

    I don't want to speak for him (especially when he's free to clarify himself far better than we could do for him!), but dying with dignity conveys an attitude that might be incompatible with actually winning. Maybe not; sometimes abandoning the constraint that you have to see a path to victory makes it easier to do the best you can. But it feels concerning on an instinctive level.

    In my experience, most people cannot.

    I think both emotions are helpful at motivating me.

    I think I'm more motivated by the thought that I am going to die soon, any children I might have in the future will die soon, my family, my friends, and their children are going to die soon, and any QALYs I think I'm buying are around 40% as valuable as I thought, more than undoing the income tax deduction I get for them. It seems like wrangling my ADHD brain into looking for ways to prevent catastrophe could be more worthwhile than working a high-paid job I can currently hyper-focus on (and probably more virtuous, too), unless I find that the probability of success is literally 0% despite what I think I know about Bayesian reasoning, in which case I'll probably go into art or something.

    Agreed. Also here’s the poem that goes with that comment:

    Do not go gentle into that good night,
    Old age should burn and rave at close of day;
    Rage, rage against the dying of the light.

    Though wise men at their end know dark is right,
    Because their words had forked no lightning they
    Do not go gentle into that good night.

    Good men, the last wave by, crying how bright
    Their frail deeds might have danced in a green bay,
    Rage, rage against the dying of the light.

    Wild men who caught and sang the sun in flight,
    And learn, too late, they grieved it on its way,
    Do not go gentle into that good night.

    Grave men, near death, who see with blinding sight
    Blind eyes could blaze like meteors and be gay,
    Rage, rage against the dying of the light.

    And you, my father, there on the sad height,
    Curse, bless, me now with your fierce tears, I pray.
    Do not go gentle into that good night.
    Rage, rage against the dying of the light.

    I totally empathize with Eliezer, and I’m afraid that I might be similarly burned out if I had been trying this for as long.

    But that’s not who I want to be. I want to be Harry who builds a rocket to escape Azkaban, the little girl that faces the meteor with a baseball bat, and the...

    David James
    If we know a meteor is about to hit earth, having only D days to prepare, what is rational for person P? Depending on P and D, any of the following might be rational: throw an end of the world party, prep to live underground, shoot ICBMs at the meteor, etc.

    Makes me think of the following quote. I'm not sure how much I agree with or endorse it, but it's something to think about.

    The good fight has its own values. That it must end in irrevocable defeat is irrelevant.

    — Isaac Asimov, It's Been A Good Life

    Exquisitely based

    Said Achmiz
    Harry Potter and the Comet King have access to magic; we don’t. … is the obvious response, but the correct response is actually: Harry Potter and the Comet King don’t exist, so what attitude is present in those characters is irrelevant to the question of what attitude we, in reality, ought to have.

    Most fictional characters are optimised to make for entertaining stories, hence why "generalizing from fictional evidence" is usually a failure-mode. The HPMOR Harry and the Comet King were optimized by two rationalists as examples of rationalist heroes — and are active in allegorical situations engineered to say something that rationalists would find to be “of worth” about real world problems. 

    They are appealing precisely because they encode assumptions about what a real-world, rationalist “hero” ought to be like. Or at least, that's the hope. So, they can be pointed to as “theses” about the real world by Yudkowsky and Alexander, no different from blog posts that happen to be written as allegorical stories, and if people found the ideas encoded in those characters more convincing than the ideas encoded in the present April Fools' Day post, that's fair enough. 

    Not necessarily correct on the object-level, but, if it's wrong, it's a different kind of error from garden-variety “generalizing from fictional evidence”.

    Rob Bensinger

    As fictional characters popular among humans, what attitude is present in them is evidence for what sort of attitude humans like to see or inhabit. As the author of those characters, Yudkowsky should be aware of this mechanism. And empirically, people with accurate beliefs and positive attitudes outperform people with accurate beliefs and negative attitudes. It seems plausible Yudkowsky is aware of this as well.

    "Death with dignity" reads as an unnecessarily negative attitude to accompany the near-certainty of doom. Heroism, maximum probability of catgirls, or even just raw log-odds-of-survival seem like they would be more motivating than dignity without sacrificing accuracy.

    Like, just substitute all instances of 'dignity' in the OP with 'heroism' and naively I would expect this post to have a better impact (/be more dignified/be more heroic), except insofar as it might give a less accurate impression of Yudkowsky's mood. But few people have actually engaged with him on that front.


    Seeing this post get so strongly upvoted makes me feel like I'm going crazy.

    This is not the kind of content I want on LessWrong. I did not enjoy it, I do not think it will lead me to be happier or more productive toward reducing x-risk, I don't see how it would help others, and it honestly doesn't even seem like a particularly well done version of itself.

    Can people help me understand why they upvoted?


    For whatever it is worth, this post, along with reading the unworkable alignment strategy in the ELK report, has made me realize that we actually have no idea what to do and has finally convinced me to try to solve alignment. I encourage everyone else to do the same. For some people, knowing that the world is doomed by default and that we can't just expect the experts to save it is motivating. If that was his goal, he achieved it.


    Certainly for some people (including you!), yes. For others, I expect this post to be strongly demotivating. That doesn’t mean it shouldn’t have been written (I value honestly conveying personal beliefs and expressing diversity of opinion enough to outweigh the downsides), but we should realistically expect this post to cause psychological harm for some people, and it could also potentially make interaction and PR with those who don’t share Yudkowsky’s views harder. Despite some claims to the contrary, I believe (through personal experience in PR) that radical honesty is not strongly valued outside the rationalist community, and that interaction with non-rationalists can be extremely important, even to potentially world-saving levels. Yudkowsky, for all of his incredible talent, is frankly terrible at PR (at least historically), and may not be giving proper weight to its value as a world-saving tool. I’m still thinking through the details of Yudkowsky’s claims, but expect me to write a post here in the near future giving my perspective in more detail.

    I don't think "Eliezer is terrible at PR" is a very accurate representation of historical fact. It might be a good representation of something else. But it seems to me that deleting Eliezer from the timeline would probably result in a world where far far fewer people were convinced of the problem. Admittedly, such questions are difficult to judge.

    I think "Eliezer is bad at PR" rings true in the sense that he belongs in the cluster of "bad at PR"; you'll make more correct inferences about Eliezer if you cluster him that way. But on historical grounds, he seems good at PR.


    Eliezer is "bad at PR" in the sense that there are lots of people who don't like him. But that's mostly irrelevant. The people who do like him like him enough to donate to his foundation and all of the foundations he inspired.

    It’s the people who don’t like him (and are also intelligent and in positions of power), which I’m concerned with in this context. We’re dealing with problems where even a small adversarial group can do a potentially world-ending amount of harm, and that’s pretty important to be able to handle!

    My personal experience is that the people who actively dislike Eliezer are specifically the people who were already set on their path; they dislike Eliezer mostly because he's telling them to get off that path.

    I could be wrong, however; my personal experience is undoubtedly very biased.


    I’ll tell you that one of my brothers (who I greatly respect) has decided not to be concerned about AGI risks specifically because he views EY as being a very respected “alarmist” in the field (which is basically correct), and also views EY as giving off extremely “culty” and “obviously wrong” vibes (with Roko’s Basilisk and EY’s privacy around the AI boxing results being the main examples given), leading him to conclude that it’s simply not worth engaging with the community (and their arguments) in the first place. I wouldn’t personally engage with what I believe to be a doomsday cult (even if they claim that the risk of ignoring them is astronomically high), so I really can’t blame him.

    I’m also aware of an individual who has enormous cultural influence, and was interested in rationalism, but heard from an unnamed researcher at Google that the rationalist movement is associated with the alt-right, so they didn’t bother looking further. (Yes, that’s an incorrect statement, but it came from the widespread [possibly correct?] belief that Peter Thiel is both alt-right and has/had close ties with many prominent rationalists.) This indicates a general lack of control of the narrative surrounding the movement, and likely has directly led to needlessly antagonistic relationships.


    That's putting it mildly.

    The problems are well known. The mystery is why the community doesn't implement obvious solutions. Hiring PR people is an obvious solution. There's a posting somewhere in which Anna Salamon argues that there is some sort of moral hazard involved in professional PR, but never explains why, and everyone agrees with her anyway.

    If the community really and literally is about saving the world, then having a constant stream of people who are put off, or even becoming enemies is incrementally making the world more likely to be destroyed. So surely it's an important problem to solve? Yet the community doesn't even like discussing it. It's as if maintaining some sort of purity, or some sort of impression that you don't make mistakes is more important than saving the world.

    Presumably you mean this post.

    I think there are two issues.

    First, some of the 'necessary to save the world' things might make enemies. If it's the case that Bob really wants there to be a giant explosion, and you think giant explosions might kill everyone, you and Bob are going to disagree about what to do, and Bob existing in the same information environment as you will constrain your ability to share your preferences and collect allies without making Bob an enemy.

    Second, this isn't an issue where we can stop thinking, and thus we need to continue doing things that help us think, even if those things have costs. In contrast, in a situation where you know what plan you need to implement, you can now drop lots of your ability to think in order ...

    So... that's a metaphor for "telling people who like building AIs to stop building AIs pisses them off and turns them into enemies". Which it might, but how often does that happen? Your prominent enemies aren't in that category, as far as I can see. David Gerard, for instance, was alienated by a race/IQ discussion. So good PR might consist of banning race/IQ. Also, consider the possibility that people who know how to build AIs know more than you, so it's less a question of their being enemies, and more one of their being people you can learn from.

    I don't know how public various details are, but my impression is that this was a decent description of the EY-Dario Amodei relationship (and presumably still is?), though I think personality clashes are also a part of that. I mean, obviously they know more about some things and less about others? Like, virologists doing gain-of-function research are also people who know more than me, and I could view them as people I could learn from. Would that advance or hinder my goals?

    If you are under some kind of misapprehension about the nature of their work, it would help. And you don't know that you are not under a misapprehension, because they are the experts, not you. So you need to talk to them anyway. You might believe that you understand the field flawlessly, but you don't know until someone checks your work.

    It is not enough to say nice things: other representatives must be prevented from saying nasty things.

    For any statement one can make, there will be people "alienated" (=offended?) by it.

    David Gerard was alienated by a race/IQ discussion and you think that should've been avoided. But someone was surely equally alienated by discussions of religion, evolution, economics, education and our ability to usefully define words.

    Do we value David Gerard so far above any given creationist that we should hire a PR department to cater to him and people like him specifically?

    There is an ongoing effort to avoid overtly political topics (Politics is the mind-killer!), but this effort is doomed beyond a certain threshold, since everything is political to some extent. Or to some people.

    To me, a concerted PR effort on the part of all prominent representatives to never say anything "nasty" would be alienating. I don't think a community even somewhat dedicated to "radical" honesty could abide a PR department, or vice versa.

    TL;DR: LessWrong has no PR department, LessWrong needs no PR department!

    If you also assume that nothing short of perfection counts, that's a fully general argument against PR, not just against the possibility of LW/MIRI having good PR. If you don't assume that, LW/MIRI can have good PR by avoiding just the most significant bad PR. Disliking racism isn't some weird idiosyncratic thing that only Gerard has.

    The level of PR you aim for puts an upper limit on how much "radical" honesty you can have. If you aim for perfect PR, you can have 0 honesty. If you aim for perfect honesty, you can have no PR. LessWrong doesn't go that far, by a long shot, even without a PR team present.

    Most organizations do not aim for honesty at all.

    The question is where we draw the line. Which brings us to "Disliking racism isn't some weird idiosyncratic thing that only Gerard has."

    From what I understand, Gerard left because he doesn't like discussions about race/IQ, which is not the same thing as racism. I, personally, don't want LessWrong to cater to people who cannot tolerate a discussion.

    Honesty =/= frankness. Good PR does not require you to lie.

    Semantics. Good PR requires you to put a filter between what you think is true and what you say.

    It requires you to filter what you publicly and officially say. "You", plural, the collective, can speak as freely as you like... in private. But if you, individually, want to be able to say anything you like to anyone, you had better accept the consequences.

    "The mystery is why the community doesn't implement obvious solutions. Hiring PR people is an obvious solution. There's a posting somewhere in which Anna Salamon argues that there is some sort of moral hazard involved in professional PR, but never explains why, and everyone agrees with her anyway."

    "'You', plural, the collective, can speak as freely as you like... in private."

    Suppose a large part of the community wants to speak as freely as it likes in public, and the mystery is solved. We even managed to touch upon the moral hazard involved in professional PR, insofar as it is a filter between what you believe and what you say publicly.

    There's a hazard in having no filters, as well. One thing being bad doesn't make another good.

    None of these seem to reflect on EY unless you would expect him to be able to predict that a journalist would write an incoherent, almost maximally inaccurate description of an event where he criticized an idea for being implausible and then banned its discussion for being off-topic/pointlessly disruptive to something like two people; or that his clearly written rationale for not releasing the transcripts of the AI box experiments would be interpreted as a recruiting tool for the only cult that requires no contributions to be a part of, doesn't promise its members salvation or supernatural powers, has no formal hierarchy, and is based on a central part of economics.

    I would not expect EY to have predicted that himself, given his background. If, however, he either had studied PR deeply or had consulted with a domain expert before posting, then I would have totally expected that result to be predicted with some significant likelihood. Remember, optimally good rationalists should win, and be able to anticipate social dynamics. In this case EY fell into a social trap he didn’t even know existed, so again, I do not blame him personally, but that does not negate the fact that he’s historically not been very good at anticipating that sort of thing, due to lack of training/experience/intuition in that field. I’m fairly confident that at least regarding the Roko’s Basilisk disaster, I would have been able to predict something close to what actually happened if I had seen his comment before he posted it. (This would have been primarily due to pattern matching between the post and known instances of the Streisand effect, as well as some amount of hard-to-formally-explain intuition that EY’s wording would invoke strong negative emotions in some groups, even if he hadn’t taken any action. Studying “ratio’d” tweets can help give you a sense for this, if you want to practice that admittedly very niche skill.) I’m not saying this to imply that I’m a better rationalist than EY (I’m not), merely to say that EY, and the rationalist movement generally, hasn’t focused on honing the skillset necessary to excel at PR, which has sometimes been to our collective detriment.

    The question is whether people who prioritize social-position/status-based arguments over actual reality were going to contribute anything meaningful to begin with. The rationalist community has been built on, among other things, the recognition that the human species is systematically broken when it comes to epistemic rationality. Why think that someone who fails this deeply wouldn't continue failing at epistemic rationality at every step even once they've already joined?

    I think making the assumption that anyone who isn't in our community is failing to think rationally is itself not great epistemics. It's not irrational at all to refrain from engaging with the ideas of a community you believe to be vaguely insane. After all, I suspect you haven't looked all that deeply into the accuracy of the views of the Church of Scientology, and that's not a failure on your part, since there's little chance you'll gain much of value for your time if you did. There are many, many, many groups out there who sound intelligent at first glance, but when seriously engaged with fall apart. Likewise, there are those groups which sound insane at first, but actually have deep truths to teach (I'd place some forms of Zen Buddhism under this category). It makes a lot of sense to trust your intuition on this sort of thing, if you don't want to get sucked into cults or time-sinks.

    I didn't talk about "anyone who isn't in our community," but about   It's epistemically irrational if I'm implying the ideas are false and if this judgment isn't born from interacting with the ideas themselves but with

    Eliezer is extremely skilled at capturing attention. One of the best I've seen, outside of presidents and some VCs.
    However, as far as I've seen, he's terrible at getting people to do what he wants. 
    Which means that he has a tendency to attract people to a topic he thinks is important, but they never do what he thinks should be done, which seems to lead to a feeling of despondence.
    This is where he really differs from those VCs and presidents: they're usually far more balanced.

    For an example of an absolute genius in getting people to do what he wants, see Sam Altman.

    You make a strong point, and as such I’ll emend my statement a bit—Eliezer is great at PR aimed at a certain audience in a certain context, which is not universal. Outside of that audience, he is not great at Public Relations(™) in the sense of minimizing the risk of gaining a bad reputation. Historically, I am mostly referring to Eliezer’s tendency to react to what he’s believed to be infohazards in such a way that what he tried to suppress was spread vastly beyond the counterfactual world in which Eliezer hadn’t reacted at all. You only need to slip up once when it comes to risking all PR gains (just ask the countless politicians destroyed by a single video or picture), and Eliezer has slipped up multiple times in the past (not that I personally blame him; it’s a tremendously difficult skillset which I doubt he’s had the time to really work on). All of this is to say that yes, he’s great at making powerful, effective arguments, which convince many rationalist-leaning people. That is not, however, what it means to be a PR expert, and is only one small aspect of a much larger domain which rationalists have historically under-invested in.

    Sounds about right!

    I very much had the same experience, making me decide to somewhat radically re-orient my life.

    What part of the ELK report are you saying felt unworkable?
    ELK itself seems like a potentially important problem to solve; the part that didn't make much sense to me was what they plan to do with the solution, their idea based on recursive delegation.
    Ok, that's a very reasonable answer.
    Awesome. What are your plans? Have you considered booking a call with AI Safety Support, registering your interest for the next AGI Safety Fundamentals Course, or applying to talk to 80,000 Hours?
    I will probably spend 4 days (from the 14th to the 17th; I'm somewhat busy until then) thinking about alignment to see whether there is any chance I might be able to make progress. I have read what is recommended as a starting point on the Alignment Forum, and can read the AGI Safety Fundamentals Course's curriculum on my own. I will probably start by thinking about how to formalize (and compute) something similar to what we call human values, since that seems to be the core of the problem, and then turn that into something that can be evaluated over possible trajectories of the AI's world model (or over something like reasoning chains or whatever, I don't know). I hadn't considered this as a career; I live in Europe and we don't have those kinds of organizations here, so it will probably just be a hobby.
    Sounds like a great plan! Even if you end up deciding that you can't make research progress (not that you should give up after just 4 days!), I can suggest a bunch of other activities that might plausibly contribute. As for Europe, I expect that this will change within the next year or so (for example, there are plans for a Longtermist Hotel in Berlin and I think it's very likely to happen).
    What other activities?
    Here are a few off the top of my head:
    • Applying to facilitate the next rounds of the AGI Safety Fundamentals course (apparently they compensated facilitators this time)
    • Contributing to Stampy Wiki
    • AI Safety Movement Building: this can be as simple as hosting dinners with two or three people who are also interested
    • General EA/rationalist community building
    • Trying to improve online outreach. Take, for example, the AI Safety Discussion (Open) fb group. They could probably be making better use of the sidebar. The moderator might be open to updating it if someone reached out to them and offered to put in the work. It might be worth seeing what other groups are out there too.
    Let me know if none of these sound interesting and I could try to think up some more.
    Same; this post is what made me decide I can't leave it to the experts. It is just a matter of spending the required time to catch up on what we know and have tried. As Keltham said, diversity is in itself an asset. If we can get enough humans to think about this problem, we can get some breakthroughs from angles others have not thought of yet. For me, it was not demotivating. He is not a god, and it ain't over until the fat lady sings. Things are serious, and it just means we should all try our best. In fact, I am kinda happy to imagine we might see a utopia happen in my lifetime. Most humans don't get a chance to literally save the world. It would be really sad if I died a few years before some AGI turned into a superintelligence.

    I primarily upvoted it because I like the push to 'just candidly talk about your models of stuff':

    I think we die with slightly more dignity - come closer to surviving, as we die - if we are allowed to talk about these matters plainly.  Even given that people may then do unhelpful things, after being driven mad by overhearing sane conversations.  I think we die with more dignity that way, than if we go down silent and frozen and never talking about our impending death for fear of being overheard by people less sane than ourselves.

    I think that in the last surviving possible worlds with any significant shred of subjective probability, people survived in part because they talked about it; even if that meant other people, the story's antagonists, might possibly hypothetically panic.

    Also because I think Eliezer's framing will be helpful for a bunch of people working on x-risk. Possibly a minority of people, but not a tiny minority. Per my reply to AI_WAIFU, I think there are lots of people who make the two specific mistakes Eliezer is warning about in this post ('making a habit of strategically saying falsehoods' and/or 'making a habit of adopting optimistic assumptions on the ...

    Given how long it took me to conclude whether these were Eliezer's true thoughts or a representation of his predicted thoughts in a somewhat probable future, I'm not sure whether I'd use the label "candid" to describe the post, at least without qualification.

    While the post does contain a genuinely useful way of framing near-hopeless situations and a nuanced and relatively terse lesson in practical ethics, I would describe the post as an extremely next-level play in terms of its broader purpose (and leave it at that).

    I... upvoted it because it says true and useful things about how to make the world not end and proposes an actionable strategy for how to increase our odds of survival while relatively thoroughly addressing a good number of possible objections. The goal of LessWrong is not to make people happier, and the post outlines a pretty clear hypothesis about how it might help others (1. by making people stop working on plans that condition on lots of success in a way that gets ungrounded from reality, 2. by making people not do really dumb unethical things out of desperation).


    Additionally, the OP seems to me good for communication: Eliezer had a lot of bottled up thoughts, and here put them out in the world, where his thoughts can bump into other people who can in turn bump back into him.

    AFAICT, conversation (free, open, "non-consequentialist" conversation, following interests and what seems worth sharing rather than solely backchaining from goals) is one of the places where consciousness and sanity sometimes enter. It's right there next to "free individual thought" in my list of beautiful things that are worth engaging in and safeguarding.

    I upvoted it because I think it's true and I think that this is a scenario where 'epistemic rationality' concerns trump 'instrumental rationality' concerns.

    Agreed with regards to “epistemic rationality” being more important at times than “instrumental rationality.” That being said, I don’t think that concerns about the latter are unfounded.

    I upvoted it because I wish I could give Eliezer a hug that actually helps make things better, and no such hug exists but the upvote button is right there.

    I strong-upvoted this post because I read a private draft by Eliezer which is a list of arguments why we're doomed. The private draft is so informative that, if people around me hadn't also read and discussed it, I would have paid several months of my life to read it. It may or may not be published eventually. This post, being a rant, is less useful, but it's what we have for now. It's so opaque and confusing that I'm not even sure if it's net good, but if it's 5% as good as the private document it still far surpasses my threshold for a strong upvote.

    EDIT: it may or may not be published eventually

    Oliver Sourbut
    I assume that this post (List of Lethalities) is the public version of what became of that doc.
    Thomas Kwa
    Oooh, that sounds great! Can someone send me a copy so I can perv out on how doomed we are? Who knows, my natural contrarian instincts might fire and I might start to think of nitpicks and counterarguments. But at the very least, I will enjoy it loads, and that's something?
    Alex K. Chen (parrot)
    Yeah send me a copy too simfish@gmail.com
    Fauna M.
    If you're still offering to share, I would like to read it faunam@gmail.com

    Upvoted because it's important to me to know what EY thinks the mainline-probability scenario looks like and what are the implications.

    If that's what he and MIRI think is the mainline scenario, then that's what I think is the mainline scenario, because their quality of reasoning and depth of insight seems very high whenever I have an opportunity to examine it.

    Personally, I am not here (or most other places) to "enjoy myself" or "be happier". Behind the fool's licence of April 1, the article seems to me to be saying true and important things. If I had any ideas about how to solve the AGI problem that would pass my shoulder Eliezer test, I would be doing them all the more. However, lacking such ideas, I only cultivate my garden.

    Have you considered registering for the next round of the AGI Safety Fundamentals course, booking a call with AI Safety Support or talking to 80,000 Hours?
    No, not at all. I have no ideas in this field, and what's more, I incline to Eliezer's pessimism, as seen in the recently posted dialogues, about much of what is done.
    I'd still encourage you to consider projects at a meta-level up such as movement-building or earn-to-give. But also totally understand if you consider the probabilities of success too low to really bother about.

    I have a weird bias towards truth regardless of consequences, and upvoted out of emotional reflex. Also I love Eliezer's writing and it is a great comfort to me to have something fun to read on the way to the abyss.

    I disagree with Eliezer about half the time, including about very fundamental things, but I strongly upvoted the post, because that attitude gives the best chance of success conditional on the evaluation of the problem being correct, and does not kill you if the evaluation is incorrect and the x-risk in question is an error in the model. It is basically a max-EV calculation for most reasonable probability distributions.

    I upvoted the post despite disagreeing with it (I believe the success probability is ~30%), because it seems important for people to openly share their beliefs in order to maximize our collective ability to converge on the truth. And I do get some potentially valuable information from the fact that this is what Yudkowsky believes (even while disagreeing).

    Hi, I'm always fascinated by people with success probabilities that aren't either very low or 'it'll probably be fine'.

    I have this collection of intuitions (no more than that): 

    (1) 'Some fool is going to build a mind', 

    (2) 'That mind is either going to become a god or leave the fools in position to try again, repeat', 

    (3) 'That god will then do whatever it wants'.

    It doesn't seem terribly relevant these days, but there's another strand that says: 

    (4) 'we have no idea how to build minds that want specific things' and 

    (5) 'Even if we knew how to build a mind that wanted a specific thing, we have no idea what would be a good thing'.

    These intuitions don't leave me much room for optimism, except in the sense that I might be hopelessly wrong and, in that case, I know nothing and I'll default back to 'it'll probably be fine'.

    Presumably you're disagreeing with one of (1), (2), or (3) and one of (4) or (5). 

    Which ones, and where does the 30% come from?

    I believe that we might solve alignment in time and aligned AI will protect us from unaligned AI. I'm not sure how to translate it to your 1-3 (the "god" will do whatever it wants, but it will want what we want so there's no problem). In terms of 4-5, I guess I disagree with both or rather disagree that this state of ignorance will necessarily persist.

    Neat, so in my terms you think we can pull off 4 and 5 and get it all solid enough to set running before anyone else does 1-3? 4 and 5 have always looked like the really hard bits to me, and not the sort of thing that neural networks would necessarily be good at, so good luck! But please be careful to avoid fates worse than death by getting it almost right but not quite right. I'm reasonably well reconciled with death, but I would still like to avoid doing worse if possible.
    My initial reaction to the post was almost as negative as yours. I've partly changed my mind, due to this steelman of Eliezer's key point by Connor Leahy.
    I thought it was funny. And a bit motivational. We might be doomed, but one should still carry on. If your actions have at least a slight chance to improve matters, you should do it, even if the odds are overwhelmingly against you. Not a part of my reasoning, but I'm thinking that we might become better at tackling the issue if we have a real sense of urgency - which this and A list of lethalities provide.
    Lone Pine

    Thank you for trying. 

    Shouldn't someone (some organization) be putting a lot of effort and resources into this strategy (quoted below) in the hope that AI timelines are still long enough for the strategy to work? With enough resources, it should buy at least a few percentage points of non-doom probability (even now)?

    Given that there are known ways to significantly increase the number of geniuses (i.e., von Neumann level, or IQ 180 and greater), by cloning or embryo selection, an obvious alternative Singularity strategy is to invest directly or indirectly in these technologies, and to try to mitigate existential risks (for example by attempting to delay all significant AI efforts) until they mature and bear fruit (in the form of adult genius-level FAI researchers).

    Sure, why not.  Sounds dignified to me.

    For starters, why aren't we already offering the most basic version of this strategy as a workplace health benefit within the rationality / EA community?  For example, on their workplace benefits page, OpenPhil says:

    We offer a family forming benefit that supports employees and their partners with expenses related to family forming, such as fertility treatment, surrogacy, or adoption. This benefit is available to all eligible employees, regardless of age, sex, sexual orientation, or gender identity.

    Seems a small step from there to making "we cover IVF for anyone who wants (even if your fertility is fine) + LifeView polygenic scores" into a standard part of the alignment-research-agency benefits package.  Of course, LifeView only offers health scores, but they will also give you the raw genetic data.  Processing this genetic data yourself, DIY style, could be made easier -- maybe there could be a blog post describing how to use an open-source piece of software and where to find the latest version of EA3, and so forth.

    All this might be a lot of trouble for (if you are pessimistic about PGT's potential) a rather small benefit. We are not talking von Neumanns here. ...

    You get the same intelligence gain in a way I find considerably less dubious by lifting poor kids out of poverty, giving them access to decent nutrition and safety, and education. Cheaper, too. Also more just. And more diverse.
    I don't see any realistic world where you both manage to get government permission to allow you to genetically engineer children for intelligence and they let you specifically raise them to do safety work far enough in advance that they actually have time to contribute and in a way that outweighs any PR risk.

    Embryo selection for intelligence does not require government permission to do. You can do it right now. You only need the models and the DNA. I've been planning on releasing a website that allows people to upload genetic data they get from LifeView for months, but I haven't gotten around to finishing it for the same reason I think that others aren't.

    Part of me wants to not post this just because I want to be the first to make the website, but that seems immoral, so, here. 

    Interesting. I had no idea.

    Both cloning and embryo selection are not illegal in many places, including the US. (This article suggests that for cloning you may have to satisfy the FDA's safety concerns, which perhaps ought to be possible for a well-resourced organization.) And you don't have to raise them specifically for AI safety work. I would probably announce that they will be given well-rounded educations that will help them solve whatever problems that humanity may face in the future.

    Lucius Bushnaq
    Sounds good to me! Anyone up for making this an EA startup? Having more von Neumann-level geniuses around seems like an extremely high-impact intervention for most things, not just singularity-related ones. As for tractability, I can't say anything about how hard this would be to get past regulators, or how much engineering work is missing to make human cloning market-ready, but finding participants seems pretty doable? I'm not sure yet whether I want children, but if I decide I do, I'd totally parent a von Neumann clone. If this would require moving to some country where cloning isn't banned, I might do that as well. I bet lots of other EAs would too.

    The first thing I can remember is that I learned at age 3 that I would die someday, and I cried about it. I got my hopes up about radical technological progress (including AGI and biotech) extending lifespan as a teenager, and I lost most of that hope (and cried the most I have in my life) upon realizing that AGI probably wouldn't save us during our lifetime, because alignment was too hard.

    In some sense this outcome isn't worse than what I thought was fated at age 3, though. I mean, if AGI comes too soon, then I and my children (if I have them) won't have the 70-80 year lifetimes I expected, which would be disappointing; I don't think AGI is particularly likely to be developed before my children die, however (minority opinion around here, I know). There's still some significant chance of radical life extension and cognitive augmentation from biotech assisted by narrow AI (if AGI is sufficiently hard, which I think it is, though I'm not confident). And as I expressed in another comment, there would be positive things about being replaced by a computationally massive superintelligence solving intellectual problems beyond my comprehension; I think that would comfort me if I were in th...

    Oh? Do say more

    Mostly scalable blockchain systems at this point, I have some writing on the problem hosted at gigascaling.net.

    What paradigm shift are you trying to create in philosophy?

    The sort of thing I write about on my blog. Examples:

    • Attention to "concept teaching" as a form of "concept definition", using cognitive science models of concept learning
    • "What is an analogy/metaphor" and how those apply to "foundations" like materialism
    • Reconciling "view from nowhere" with "view from somewhere", yielding subject-centered interpretations of physics and interpretations of consciousness as relating to local knowledge and orientation
    • Interpreting "metaphysics" as about local orientation of representation, observation, action, etc, yielding computer-sciencey interpretations of apparently-irrational metaphysical discourse ("qualia are a poor man's metaphysics")
    Sounds interesting. Hopefully, I come back and read some of those links when I have more time.

    Just a reminder to everyone, and mostly to myself: 

    Not flinching away from reality is entirely compatible with not making yourself feel like shit.  You should only try to feel like shit when that helps.


    The anime protagonist just told everyone that there's no hope. I don't have a "don't feel like shit" button. Not flinching away from reality and not feeling like shit are completely incompatible in this scenario given my mental constitution. There are people who can do better, but not me.

    I'm going to go drinking.

    Given that, then yes, feeling like shit plus living-in-reality is your best feasible alternative.

    Curling up into a ball and binge drinking till the eschaton probably is not though: see Q1.

    For what it's worth, I think I prefer the phrase,
    "Failing with style"

    It sounds like Eliezer is confident that alignment will fail. If so, the way out is to make sure AGI isn't built. I think that's more realistic than it sounds.

    1. LessWrong is influential enough to achieve policy goals

    Right now, the Yann LeCun view of AI is probably more mainstream, but that can change fast.

    LessWrong is upstream of influential thinkers. For example:
    - Zvi and Scott Alexander read LessWrong. Let’s call folks like them Filter #1
    - Tyler Cowen reads Zvi and Scott Alexander. (Filter #2)
    - Malcolm Gladwell, a mainstream influencer, reads Tyler Cowen every morning (Filter #3)

    I could've made a similar chain with Ezra Klein or Holden Karnofsky. All these chains put together add up to a lot of influence.

    Right now, I think Eliezer’s argument (AI capabilities research will destroy the world) is blocked at Filter #1. None of the Filter #1 authors have endorsed it. Why should they? The argument relies on intuition. There’s no way for Filter #1 to evaluate it. I think that’s why Scott Alexander and Holden Karnofsky hedged, neither explicitly endorsing nor rejecting the doom theory.

    Even if they believed Eliezer, Filter #1 authors need to communicate more than an intuition to Filter #2. Imagin...

    I tend to agree that Eliezer (among others) underestimates the potential value of US federal policy. But on the other hand, note No Fire Alarm, which I mostly disagree with but which has some great points and is good for understanding Eliezer's perspective. Also note (among other reasons) that policy preventing AGI is hard because it needs to stop every potentially feasible AGI project but: (1) defining 'AGI research' in a sufficient manner is hard, especially when (2) at least some companies naturally want to get around such regulations, and (3) at least some governments are likely to believe there is a large strategic advantage to their state 'getting AGI first,' and arms control for software is hard because states wouldn't think they could trust each other and verifying compliance would probably be very invasive so states would be averse to such verification. Eliezer has also written about why he's pessimistic about policy elsewhere, though I don't have a link off the top of my head.

    Eliezer gives alignment a 0% chance of succeeding. I think policy, if tried seriously, has >50%. So it's a giant opportunity that's gotten way too little attention.

    I'm optimistic about policy for big companies in particular. They have a lot to lose from breaking the law, they're easy to inspect (because there are so few), and there's lots of precedent (ITAR already covers some software). Right now, serious AI capabilities research just isn't profitable outside of the big tech companies.

    Voluntary compliance is also a very real thing. Lots of AI researchers are wealthy and high-status, and they'd have a lot to lose from breaking the law. At the very least, a law would stop them from publishing their research. A field like this also lends itself to undercover enforcement.

    I think an agreement with China is impossible now, because prominent folks don't even believe the threat exists. Two factors could change the art of the possible. First, a widely known argument about the dangers of AI on which most public intellectuals agreed. Second, since the US has a technological lead, an agreement could actually be to its advantage.

    Look at gain of function research for the result of a government moratorium on research. At first Baric feared that the moratorium would end his research. Then the NIH declared that his research isn't officially gain of function and continued funding him. 

    Regulating gain of function research away is essentially easy mode compared to AI.

    A real Butlerian jihad would be much harder.

    Grant Demaree
    I agree that it's hard, but there are all sorts of possible moves (like LessWrong folks choosing to work at this future regulatory agency, or putting massive amounts of lobbying funds into making sure the rules are strict). If the alternative (solving alignment) seems impossible given 30 years and massive amounts of money, then even a really hard policy seems easy by comparison.

    How about if you solve a ban on gain-of-function research first, and then move on to much harder problems like AGI?  A victory on this relatively easy case would result in a lot of valuable gained experience, or, alternatively, allow foolish optimists to have their dangerous optimism broken over shorter time horizons.

    foolish optimists to have their dangerous optimism broken

    I'm pretty confused about your confidence in your assertion here. Have you spoken to people who've led successful government policy efforts, to ground this pessimism? Why does the IAEA exist? How did ARPA-E happen? Why is a massive subsidy for geothermal well within the Overton Window and thus in a bill Joe Manchin said he would sign?

    Gain of function research is the remit of a decades-old incumbent bureaucracy (the NIH) that oversees bio policy, and doesn’t like listening to outsiders. There’s no such equivalent for AI; everyone in the government keeps asking “what should we do” and all the experts shrug or disagree with each other. What if they mostly didn’t?

    Where is your imagined inertia/political opposition coming from? Is it literally skepticism that senators show up for work every day? What if I told you that most of them do and that things with low political salience and broad expert agreement happen all the time?

    Where my skepticism is coming from (for AI policy) is: what's the ban, in enough detail that it could actually be a law? Are we going to have an Office of Program Approval, where people have to send code, the government has to read it, and only once the government signs off, it can get run? If so, the whole tech industry will try to bury you, and even if you succeed, how are you going to staff that office with people who can tell the difference between AGI code and non-AGI code? Are we going to have laws about what not to do, plus an office of lawyers looking for people breaking the laws? (This is more the SEC model.) Then this is mostly a ban on doing things in public; the NHTSA only knew to send George Hotz a cease-and-desist because he was uploading videos of the stuff he was doing. Maybe you can get enough visibility into OpenAI and Anthropic, but do you also need to get the UK to create one to get visibility into Deepmind? If the Canadian government, proud of its AI industry and happy to support it, doesn't make such an office, do the companies just move there? [Like, the federal government stopped the NIH from funding stem cell research for moral reasons, and California said "fine, we'll fund it instead."] If the laws are just "don't make AI that will murder people or overthrow the government", well, we already have laws against murdering people and overthrowing the government. The thing I'm worried about is someone running a program that they think will be fine which turns out to not be fine, and it's hard to bridge the gap between anticipated and actual consequences with laws.
    Not Relevant
    To clarify, I largely agree with the viewpoint that "just announcing a law banning AGI" is incoherent and underspecified. But the job will with high probability be much easier than regulating the entire financial sector (the SEC's job), which can really only be done reactively. If AGI projects cost >$1B and require specific company cultural DNA, it's entirely possible that we're talking about fewer than 20 firms across the Western world. These companies will be direct competitors, and incentivized both to (1) make sure the process isn't too onerous and (2) heavily police competitors in case they try to defect, since that would lead to an unfair advantage. The problem here is preventing overall drift towards unsafe systems, and that is much easier for a central actor like a government to coordinate. Re: Canada and the UK, I'm really not sure why you think those societies would be less prone to policy influence; as far as I can tell they're actually much easier cases. "Bring your business here, we don't believe the majority of the experts [assuming we can get that] that unregulated development is decently likely to kill everyone" is actually not a great political slogan, pretty much anywhere.
    I'm interested in the details here! Like, 'easier' in the sense of "requires fewer professionals", "requires fewer rulings by judges", "lower downside risk", "less adversarial optimization pressure", something else? [For context, in my understanding of the analogy between financial regulation and AI, the event in finance analogous to when humans would lose control of the future to AI was probably around the point of John Law.] [EDIT] Also I should note I'm more optimistic about this the more expensive AGI is / the fewer companies can approach it. My guess is that a compute-centric regulatory approach--one where you can't use more than X compute without going to the government office or w/e--has an easier shot of working than one that tries to operate on conceptual boundaries. But we need it to be the case that much compute is actually required, and building alternative approaches to assembling that much compute (like Folding@Home, or secret government supercomputers, or w/e) are taken seriously. Maybe? One of the things that's sort of hazardous about AI (and is similarly hazardous about finance) is that rainbow after rainbow leads to a pot of gold. First AI solves car accidents, then they solve having to put soldiers in dangerous situations, then they solve climate change, then they solve cancer, then--except at some point in there, you accidentally lose control of the future and probably everyone dies. And it's pretty easy for people to dismiss this sort of concern on psychological grounds, like Steven Pinker does in Enlightenment Now. 
    Not Relevant
    By "easier", I specifically mean "overseeing fewer firms, each taking fewer actions". I wholeheartedly agree that any sort of regulation is predicated on getting lucky re: AGI not requiring <$100M amounts of compute, when it's developed. If as many actors can create/use AGI as can run hedge funds, policy is probably not going to help much. IMO secret government supercomputers will never be regulatable; the only hope there is government self-regulation (by which I mean, getting governments as worried about AGI catastrophes as their leading private-sector counterparts). Folding@Home equivalents are something of an open problem; if there was one major uncertainty, I'd say they're it, but again this is less of a problem the more compute is required. I think that you are absolutely correct that unless e.g. the hard problem of corrigibility gets verified by the scientific community, promulgated to adjacent elites, and popularized with the public, there is little chance that proto-AGI-designers will face pressure to curb their actions. But those actions are not "impossible" in some concrete sense; they just require talent and expertise in mass persuasion, instead of community-building.
    We probably have a ban on gain-of-function research in the bag, since it seems relatively easy to persuade intellectuals of the merits of the idea.  How that then translates to real-world policy is opaque to me, but give it fifty years? Half the crackpot ideas that were popular at college have come true over my lifetime. Our problem with AI is that we can't convince anyone that it's dangerous.  And we may not need the fifty years! Reaching intellectual consensus might be good enough to slow it down until the government gets round to banning it. Weirdly the other day I ran into a most eminent historian and he asked me what I'd been doing lately. As it happened I'd been worrying about AI, and so I gave him the potted version, and straight away he said: "Shouldn't we ban it then?", and I was like: "I think so, but that makes me a crank amongst cranks".  My problem is that I am not capable of convincing computer scientists and mathematicians, who are usually the people who think most like me.  They always start blithering on about consciousness or 'if it's clever enough to ... then why..." etc, and although I can usually answer their immediate objections, they just come up with something else. But even my closest friends have taken a decade to realize that I might be worrying about something real instead of off on one. And I haven't got even a significant minority of them. And I think that's because I don't really understand myself. I have a terrible intuition about powerful optimization processes and that's it. 

    We probably have a ban on gain-of-function research in the bag, since it seems relatively easy to persuade intellectuals of the merits of the idea. 

    Is this the case? Like, we had a moratorium on federal funding (not even on doing it, just whether or not taxpayers would pay for it), and it was controversial, and then we dropped it after 3 years.

    You might have thought that it would be a slam dunk after there was a pandemic for which lab leak was even a plausible origin, but the people who would have been considered most responsible quickly jumped into the public sphere and tried really hard to discredit the idea. I think this is part of a general problem, which is that special interests are very committed to an issue and the public is very uncommitted, and that balance generally favors the special interests. [It's Peter Daszak's life on the line for the lab leak hypothesis, and a minor issue to me.] I suspect that if it ever looks like "getting rid of algorithms" is seriously on the table, lots of people will try really hard to prevent that from becoming policy. 

    And more crucially, it didn't even stop the federal funding of Baric while it was in place. The equivalent would be outlawing AGI development but doing nothing about people training tool AIs, so that people simply declare their development to be tool AI development in response to the regulation.
    It's certainly fairly easy to persuade people that it's a good idea, but you might be right that asymmetric lobbying can keep good ideas off the table indefinitely. On the other hand, 'cigarettes cause cancer' to 'smoking bans' took about fifty years despite an obvious asymmetry in favour of tobacco. As I say, politics is all rather opaque to me, but once an idea is universally agreed amongst intellectuals it does seem to eventually result in political action.  
    Given the lack of promising moves available, attempting to influence policy is a reasonable move. It's part of the 80,000 Hours career suggestions. On the other hand, it's a long shot and I see no reason to expect a high likelihood of success.
    This is exactly what we have piloted at the Existential Risk Observatory, a Dutch nonprofit founded last year. I'd say we're fairly successful so far. Our aim is to reduce human extinction risk (especially from AGI) by informing the public debate. Concretely, what we've done in the past year in the Netherlands is (I'm including the detailed description so others can copy our approach; I think they should):

    1. We have set up a good-looking website, found a board, and set up a legal entity.
    2. Asked for and obtained endorsements from academics already familiar with existential risk.
    3. Found a freelance, well-known ex-journalist and ex-parliamentarian to work with us as a media strategist.
    4. Wrote op-eds warning about AGI existential risk, as explicitly as possible, but heeding the media strategist's advice. Sometimes we used academic co-authors. Four out of six of our op-eds were published in print in leading newspapers.
    5. Organized drinks, networked with journalists, and introduced them to others who are into AGI existential risk (e.g. EAs).

    Our most recent result (last weekend) is that a prominent columnist who is agenda-setting on tech and privacy issues in NRC Handelsblad, the Dutch equivalent of the New York Times, wrote a piece where he talked about AGI existential risk as an actual thing. We've also had a meeting with the chairwoman of the Dutch parliamentary committee on digitization (the line between a published article and a policy meeting is direct), and a debate about AGI xrisk in the leading debate centre now seems fairly likely. We're not there yet, but we've only done this for less than a year, we're tiny, we don't have anyone with a significant profile, and we were self-funded (we recently got our first funding from SFF, thanks guys!). I don't see any reason why our approach wouldn't translate to other countries, including the US. If you do this for a few years, consistently, and in a coordinated and funded way, I would be very surprised if you cannot
    I think you have to specify which policy you mean. First, let's for now focus on regulation that really aims to stop AGI, at least until safety is proven (if possible), not on regulation that only focuses on slowing it down (incremental progress). I see roughly three options: software/research, hardware, and data. All of these options would likely need to be global to be effective (that complicates things, but perhaps a few powerful states can enforce regulation on others, which is not necessarily unrealistic). Most people who talk about AGI regulation seem to mean software or research regulation. An example is the national review board proposed by Musk. A large downside of this method is that, if it turns out that scaling up current approaches is mostly all that's needed, Yudkowsky's argument that a few years later anyone can build AGI in their basement (unregulatable) because of hardware progress seems like a real risk. A second option, which does not suffer from this issue, is hardware regulation. Yudkowsky's thought experiment that an AGI might destroy all GPUs in order to block competitors is perhaps its most extreme form. One notch less extreme, chip capability could be forcibly held either at today's level or even at the level of some safe point in the past. This could be regulated at the fabs, which are few and not easy to hide. Regulating compute has also been proposed by Jaan Tallinn in a Politico newsletter, where he proposes regulating flops/km². Finally, an option could be to regulate data access. I can't recall a concrete proposal, but it should be possible in principle. I think a paper should urgently be written about which options we have, and especially about what the least economically damaging, but still reliable and enforceable, regulation method is. I think we should move beyond the position that no regulation could do this: there are clearly options with >0% chance (depending strongly on coordination and communication) and we can't afford to w

    And to this I reply: Obviously, the measuring units of dignity are over humanity’s log odds of survival—the graph on which the logistic success curve is a straight line. A project that doubles humanity’s chance of survival from 0% to 0% is helping humanity die with one additional information-theoretic bit of dignity.
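    A minimal Python sketch of the unit being joked about, assuming "dignity" is measured as the change in base-2 log-odds (the helper names `log2_odds` and `dignity_bits` are made up for illustration). It shows why doubling a tiny survival probability is worth almost exactly one bit: at low probabilities, doubling the probability nearly doubles the odds.

```python
import math

def log2_odds(p: float) -> float:
    """Base-2 log of the odds p / (1 - p)."""
    return math.log2(p / (1.0 - p))

def dignity_bits(p_before: float, p_after: float) -> float:
    """Hypothetical helper: change in base-2 log-odds, the joke unit of 'dignity'."""
    return log2_odds(p_after) - log2_odds(p_before)

# Doubling the odds (1:2 -> 1:1) is one bit, wherever on the curve it happens.
print(dignity_bits(1 / 3, 1 / 2))
# Doubling a tiny probability very nearly doubles the odds, so it is ~1 bit too.
print(dignity_bits(1e-6, 2e-6))
```

    On the log-odds axis the logistic curve becomes a straight line, so equal multiplicative improvements in odds count the same no matter how deep in the basement they happen.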

    Joking aside, this sort of objective function is interesting, and incoherent due to being non-VNM. E.g. if there's a lottery between 0.1% chance of survival and 1% chance of survival, then how this lottery compares to a flat 0.5% chance of survival depends on the order in which the lottery is resolved. A priori, (50% of 0.1%, 50% of 1%) is equivalent to 0.55%, which is greater than 0.5%. On the other hand, the average log-odds (after selecting an element of this lottery) is 0.5 * log(0.1%) + 0.5 * log(1%) < log(0.5%).
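    The ordering flip in the lottery example can be checked numerically. This hypothetical sketch scores with log-odds rather than plain log, which makes almost no difference at probabilities this small; `log_odds` is an illustrative helper, not anything from the post.

```python
import math

def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))

# 50/50 lottery between 0.1% and 1% survival, versus a flat 0.5%.
lottery = [(0.5, 0.001), (0.5, 0.01)]  # (chance of branch, survival probability)
flat = 0.005

# Order A: average the probabilities first (0.55%), then score the result.
p_mixed = sum(w * p for w, p in lottery)
score_a = log_odds(p_mixed)

# Order B: resolve the lottery first, then average the per-branch scores.
score_b = sum(w * log_odds(p) for w, p in lottery)

print(score_a > log_odds(flat))  # True: under order A the lottery is preferred
print(score_b > log_odds(flat))  # False: under order B the flat 0.5% is preferred
```

    A VNM-coherent agent would rank the two options the same way regardless of when the lottery is resolved; this score does not.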

    This could lead to "negative VOI" situations where we avoid learning facts relevant to survival probability, because they would increase the variance of our odds, and that reduces expected log-odds since log is concave.
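    A small sketch of the negative-VOI effect, under an invented experiment (the numbers are assumptions, not from the thread) whose two possible posteriors average back to the 0.5% prior, as conservation of expected evidence requires. Because log-odds is concave for probabilities below 1/2, spreading the posterior out lowers the expected score even though expected survival probability is unchanged.

```python
import math

def log_odds(p: float) -> float:
    return math.log(p / (1.0 - p))

prior = 0.005
# Hypothetical experiment: (probability of this outcome, posterior survival chance).
# Chosen so that the posteriors average back to the prior.
outcomes = [(5 / 9, 0.001), (4 / 9, 0.01)]
assert abs(sum(w * p for w, p in outcomes) - prior) < 1e-12

value_before = log_odds(prior)
value_after = sum(w * log_odds(p) for w, p in outcomes)
voi = value_after - value_before  # value of information, in expected log-odds
print(f"VOI in expected log-odds: {voi:.3f}")  # negative
```

    Scored by plain probability, the same experiment has VOI of exactly zero here (and positive VOI whenever the information changes what you would do), which is the incoherence the comment points at.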

    It's also unclear whether to treat different forms of uncertainty differently, e.g. is logical uncertainty treated differently from indexical/quantum uncertainty?

    This could make sense as a way of evaluating policies chosen at exactly the present time, which would be equivalent to simply maximizing P(success). However, one has to be very careful with exactly how to evaluate odds to avoid VNM incoherence.

    First-order play for log-probability over short-term time horizons, as a good idea in real life when probabilities are low, arises the same way as betting fractions of your bankroll arises as a good idea in real life, by:

    • expecting to have other future opportunities that look like a chance to play for log-odds gains,
    • not expecting to have future opportunities that look like a chance to play for lump-sum-of-probability gains,
    • and ultimately the horizon extending out to diminishing returns if you get that far.

    That is, the pseudo-myopic version of your strategy is to bet fractions of your bankroll to win fractions of your bankroll.  You don't take a bet with 51% probability of doubling your bankroll and 49% probability of bankruptcy, if you expect more opportunities to bet later, there aren't later opportunities that just give you lump-sum gains, and there's a point beyond which money starts to saturate for you.
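    The bankroll analogy is the Kelly criterion. As a hedged illustration (the even-money bet and the brute-force grid are assumptions chosen to match the 51%/49% example), the sketch below computes expected log growth for staking a fraction f of the bankroll: the all-in bet has positive expected money but an expected log of minus infinity, while the log-optimal stake is the Kelly fraction 2p - 1.

```python
import math

def expected_log_growth(f: float, p: float) -> float:
    """Expected log of the bankroll multiplier when staking a fraction f of the
    bankroll on an even-money bet that wins with probability p."""
    if f >= 1.0:
        return float("-inf")  # losing an all-in bet leaves 0, and log(0) = -inf
    return p * math.log(1.0 + f) + (1.0 - p) * math.log(1.0 - f)

p = 0.51
kelly_f = 2 * p - 1  # log-optimal (Kelly) stake for an even-money bet

# Brute-force check over stakes 0.000, 0.001, ..., 0.999.
best_f = max((i / 1000 for i in range(1000)),
             key=lambda f: expected_log_growth(f, p))

print(f"Kelly fraction {kelly_f:.3f}, grid optimum {best_f:.3f}")
print("expected money multiplier, all-in:", 2 * p)                  # above 1
print("expected log growth, all-in:", expected_log_growth(1.0, p))  # -inf
```

    Staking about 2% per bet grows the bankroll slowly but compounds across many bets, which is the sense in which "fractions of your bankroll" stack the way log-odds gains do.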

    Hmm. It seems like if you really expected to be able to gain log-odds in expectation in repeated bets, you'd immediately update towards a high probability, due to conservation of expected evidence. But maybe a more causal/materialist model wouldn't do this because it's a fairly abstract consideration that doesn't have obvious material support.

    I see why "improve log-odds" is a nice heuristic for iteratively optimizing a policy towards greater chance of success, similar to the WalkSAT algorithm, which solves a constraint problem by changing variables around to reduce the number of violated constraints (even though the actual desideratum is to have no violated constraints); this is a way of "relaxing" the problem in a way that makes iterative hill-climbing-like approaches work more effectively.
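    For readers unfamiliar with WalkSAT, here is a minimal local-search sketch in its spirit, not the reference implementation: clauses use DIMACS-style signed integers, and the greedy step picks the flip that leaves the fewest unsatisfied clauses overall, a simplification of the original algorithm's "break count" scoring. The parameter names and defaults (`p_noise`, `max_flips`, `seed`) and the example formula are illustrative assumptions.

```python
import random

def walksat(clauses, n_vars, p_noise=0.5, max_flips=10_000, seed=0):
    """WalkSAT-style local search (simplified sketch).
    clauses: list of clauses, each a list of nonzero ints, DIMACS style
    (3 means x3 is true, -3 means x3 is false).
    Returns a dict {var: bool} satisfying every clause, or None on failure."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    def num_unsat():
        return sum(not satisfied(c) for c in clauses)

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return dict(assign)  # the real goal: zero violated constraints
        clause = rng.choice(unsat)
        if rng.random() < p_noise:
            var = abs(rng.choice(clause))  # random-walk step
        else:
            # Greedy step on the relaxed objective: the flip within this clause
            # that leaves the fewest violated clauses overall.
            def score(v):
                assign[v] = not assign[v]
                s = num_unsat()
                assign[v] = not assign[v]
                return s
            var = min((abs(lit) for lit in clause), key=score)
        assign[var] = not assign[var]
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and (x1 or not x3)
clauses = [[1, 2], [-1, 3], [-2, -3], [1, -3]]
solution = walksat(clauses, n_vars=3)
print(solution)
```

    The count of violated clauses is only a proxy, but it gives the search something to climb; success is still judged solely by reaching a count of zero.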

    Relatedly, some RL approaches give rewards for hitting non-victory targets in a game (e.g. number of levels cleared or key items gained), even if the eventual goal is to achieve a policy that beats the entire game.

    I think possibly the key conceptual distinction you want to make is between short-term play and long-term play.  If I deliberately assume an emotional stance, often a lot of the benefit to be gained therefrom is how it translates long-term correct play into myopic play for the emotional reward, assuming of course that the translation is correct.  Long-term, you play for absolute probabilities.  Short-term, you chase after "dignity", aka stackable log-odds improvements, at least until you're out of the curve's basement.

    I feel like this comment in particular is very clarifying with regard to the motivation of this stance. The benefit is that this imports recommendations of the ideal long-run policy into the short-run frame from which you're actually acting.

    I think that should maybe be in the post somewhere.

    I had a similar thought. Also, in an expected value context it makes sense to pursue actions that succeed when your model is wrong and you are actually closer to the middle of the success curve, because if that's the case you can increase our chances of survival more easily. In the logarithmic context doing so doesn't make much sense, since your impact on the log-odds is the same no matter where on the success curve you are. Maybe this objective function (and the whole ethos of Death with Dignity) is a way to justify working on alignment even if you think our chances of success are close to zero. Personally, I'm not compelled by it.
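    To make the position-dependence concrete, a hedged sketch: the same one-bit gain in log-odds (doubling the odds) is converted back into absolute probability at a few assumed starting points. Near the bottom of the curve a bit buys almost nothing in absolute terms; near the middle it buys a lot, which is the asymmetry between the expected-value and logarithmic framings described above. `absolute_gain` is a hypothetical helper.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def absolute_gain(p: float, bits: float = 1.0) -> float:
    """Hypothetical helper: absolute probability gained by adding `bits` of
    base-2 log-odds at starting probability p."""
    return logistic(logit(p) + bits * math.log(2.0)) - p

for start in (0.001, 0.1, 0.5):
    print(f"start {start:.3f}: one bit adds {absolute_gain(start):.4f}")
```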
    Measuring units and utilons are different, right? I measure my wealth in dollars but that doesn't mean my utility function is linear in dollars.

    Do you think the decision heuristic Eliezer is (ambiguously jokingly) suggesting gives different policy recommendations from the more naive "maxipok" or not? If so, where might they differ? If not, what's your guess as to why Eliezer worded the objective differently from Bostrom? Why involve log-probabilities at all?

    I read this as being "maxipok", with a few key extensions:

    • The 'default' probability of success is very low
    • There are lots of plans that look like they give some small-but-relatively-attractive probability of success, which are basically all fake / picked by motivated reasoning of "there has to be a plan." ("If we cause WWIII, then there will be a 2% chance of aligning AI, right?")
    • While there aren't accessible plans that cause success all on their own, there probably are lots of accessible sub-plans which make it more likely that a surprising real plan could succeed. ("Electing a rationalist president won't solve the problem on its own, but it does mean 'letters from Einstein' are more likely to work.")

    Minor meta note: others are free to disagree, but I think it would be useful if this comment section were a bit less trigger-happy about downvoting comments into the negatives.

    I'm normally pretty gung-ho about downvotes, but in this case I think there's more-than-usual value in people sharing their candid thinking, and too much downvoting can make people feel pressured to shape their words and thoughts in ways that others would approve of.

    Agreed. I am purposefully upvoting (and in some cases strong upvoting) a number of comments I disagree with, because I want to encourage people to speak their minds.
    Lone Pine
    OP definitely pissed me off enough to make me want to be more candid.

    I'm more optimistic than Yudkowsky[1], and I want to state what I think are the reasons for the different conclusions (I'm going to compare my own reasoning to my understanding of Yudkowsky's reasoning, and the latter might be flawed), in a nutshell.

    • Yudkowsky seems very pessimistic about alignment of anything resembling deep learning, and also believes that deep learning leads to TAI pretty soon. I'm both more optimistic about aligning deep learning and more skeptical of TAI soon.
    • Optimism about deep learning: There has been considerable progress in theoretical understanding of deep learning. This understanding is far from complete, but also the problem doesn't seem intractable. I think that we will have pretty good theory in a decade, more likely than not[2].
    • Skepticism of TAI soon: My own models of AGI include qualitative elements that current systems don't have. It is possible that the gap will be resolved soon, but also possible that a new "AI winter" will eventually result.
    • Yudkowsky seems to believe we are pretty far from a good theory of rational agents. On the other hand, I have a model of what this theory will look like, and a concrete pathway towards constructing it.
    • The
    ...

    MIRI have been very gung-ho about using logic and causal networks. At the same time they mostly ignored learning theory.

    I'll remark in passing that I disagree with this characterization of events.  We looked under some street lights where the light was better, because we didn't think that others blundering around in the dark were really being that helpful - including because of the social phenomenon where they blundered around until a bad solution Goodharted past their blurry filters; we wanted to train people up in domains where wrong answers could be recognized as that by the sort of sharp formal criteria that inexperienced thinkers can still accept as criticism.

    That was explicitly the idea at the time.

    Thanks for responding, Eliezer.

    I'm not sure to what extent you mean that (i) your research programme was literally a training exercise for harder challenges ahead vs (ii) your research programme was born of despair: looking under a street light had a better chance of success even though the keys were not especially likely to be there.

    If you mean (i), then what made you give up on this plan? From my perspective, the training exercise played its role and perhaps outlived its usefulness, why not move on beyond it?

    If you mean (ii), then why such pessimism from the get-go? I imagine you reasoning along the lines of: developing the theory of rational agency is a difficult problem with little empirical feedback in early stages, hence it requires nigh-impossible precision of reasoning. But humanity actually has a not-bad track record on this type of question in the last century. VNM, game theory, the Church-Turing thesis, information theory, complexity theory, Solomonoff induction: all these are examples of similar problems (creating a mathematical theory starting from an imprecise concept without much empirical data to help) in which we made enormous progress. They also look like they a...


    While we happen to be on the topic: can I ask whether (a) you've been keeping up with Vanessa's work on infra-Bayesianism, and if so, whether (b) you understand it well enough to have any thoughts on it? It sounds (and has sounded for quite a while) like Vanessa is proposing this as an alternative theoretical foundation for agency / updating, and also appears to view this as significantly more promising than the stuff MIRI has been doing (as is apparent from e.g. remarks like this):

    Optimism about deep learning: There has been considerable progress in theoretical understanding of deep learning. This understanding is far from complete, but also the problem doesn't seem intractable. I think that we will have pretty good theory in a decade, more likely than not[...]

    Yudkowsky seems to believe we are pretty far from a good theory of rational agents. On the other hand, I have a model of how this theory will look like, and a concrete pathway towards constructing it.

    Ideally I (along with anyone else interested in this field) would be well-placed to evaluate Vanessa's claims directly; in practice it seems that very few people are able to do so, and consequently infra-Bayesianism has received...


    Sent this to my dad, who is an old man as far outside the rationalist bubble as you could possibly be. Doesn't even know why we're worried about AGI, but he replied:

    No one gets out alive. No one. You should pray.

    Somehow it helped me cope.