In MIRI announces new "Death With Dignity" strategy, Eliezer makes a number of claims. As I understood them, they are:
1) Humanity is very likely doomed, because AI alignment is extremely hard, and no one has a strategy that could plausibly work (or is on track to find one).
2) The right thing to do is maximize the log odds of humanity’s survival.
3) The way to do this is to adopt a mindset Eliezer calls “dying with dignity”.
Assuming 1) and 2) are true, I’m not convinced the right mindset to adopt is “dying with dignity”. I don’t like it as a framing of the problem. Personally, it sounds demotivating and depressing, and not the thing that will help me maximize the log odds of success. I recognize that not everyone has the same mental setup as me, and some people might find the dying with dignity framing helpful. If it works for you, great, but I think there is a better problem framing that will work for a wider variety of mental setups, and I call it playing to your outs.
This problem framing arose out of a discussion about the dying with dignity post with Eli Rose and Seraphina Nix. Eli pointed out that in Magic: The Gathering, there is a useful strategy to adopt when you’re likely to lose, called playing to your outs. Basically, there are times when you’re losing and it’s tempting to try to minimize your losses and make what gains you can. But if you’re sufficiently behind, this will almost always fail. Instead you need to think ahead to what cards you can draw, or what cards your opponent might fail to draw, that will afford you a chance to win. Then plan for one of those scenarios happening, even if it’s quite unlikely.
If 1) is true, then we are losing badly in our game of AI existential risk. It’s hard to stay motivated when things are dire, and I claim our stance should be like that of a good Magic player. Playing carefully, choosing strategies that have a shot at winning, keeping our eye on the prize. Being ready to win if the opportunity presents itself. I don’t think this is a different strategy than the one Eliezer is proposing; I think the strategy is identical. But I think there’s an important mindset difference, even if it’s just in the name, and that mindset difference matters.
I am not saying to condition upon some particular unlikely clever scheme working. The actual world is not a Magic game; it’s much more complex. So you can’t just wait for a particular card to come up in your deck. Playing to your outs might mean spending most of your time looking for outs, e.g. searching for alignment strategies that might work, trying to find extremely talented individuals who can do good research, or searching for AI governance strategies that might give us more time and reduce arms race dynamics. You have to identify a real potential out before you can play to it.
I think Eliezer proposes the dying with dignity framing in order to 1) stay motivated while 2) mitigating common epistemological errors when facing extremely bad odds. He writes:
When Earth's prospects are that far underwater in the basement of the logistic success curve, it may be hard to feel motivated about continuing to fight, since doubling our chances of survival will only take them from 0% to 0%.
That's why I would suggest reframing the problem - especially on an emotional level - to helping humanity die with dignity, or rather, since even this goal is realistically unattainable at this point, die with slightly more dignity than would otherwise be counterfactually obtained.
Obviously this is somewhat tongue in cheek, and possibly overstated for April 1st, but if it’s overstated I think it’s only slightly overstated given what else Eliezer has said about probability of doom. However, I think he’s serious in saying that motivation is important, and having a realistic goal can help aid motivation. The problem with dying with dignity as a motivational tool is that it centers dying, and the dignity doesn’t really hold any water next to the dying part, at least for me.
Even though Eliezer defines dying with dignity as optimizing log odds of success, it’s hard to get away from the normal connotations of the term. When I google “dying with dignity”, the first results are about terminally ill patients voluntarily ending their own lives. In the normal usage, dying with dignity is about accepting your death and not fighting the inevitable. I do not find this inspiring.
I find playing to your outs a lot more motivating. The framing doesn’t shy away from the fact that winning is unlikely. But the action is “playing” rather than “dying”. And the goal is “outs” rather than “dignity”. Again, I think the difference is in connotation and not actually strategy. To actually find outs, you have to search for solutions that might work, and stay focused on taking actions that improve our odds of success. When I imagine a Magic player playing to their outs, I imagine someone careful and engaged, not resigned. When I imagine someone dying with dignity, a terminally ill patient comes to mind. Peaceful, not panicking, but not fighting to survive.
I don’t think the following is a useful way to think for most people:
But if enough people can contribute enough bits of dignity like that, wouldn't that mean we didn't die at all? Yes, but again, don't get your hopes up. Don't focus your emotions on a goal you're probably not going to obtain. Realistically, we find a handful of projects that contribute a few more bits of counterfactual dignity; get a bunch more not-specifically-expected bad news that makes the first-order object-level situation look even worse (where to second order, of course, the good Bayesians already knew that was how it would go); and then we all die.
Maybe this part is a joke, but it seems consistent with other things Eliezer has written recently. I’m not down with the death of hope. I think hope is a useful motivator, and I think it’s possible to maintain hope while still looking the hard grim probabilities in the face. I enjoy thinking about the glorious transhumanist future, and focusing my emotions on that goal helps me get up each morning to fight the good fight. I’ve detached my grimometer because it wasn’t helping me. Hope and long odds aren’t mutually exclusive.
I suspect the main reason Eliezer advocates dying with dignity is that he’s concerned that if most people adopt a different stance, they won’t be able to face the hard truths. He wants people to avoid the epistemic failure that comes from having to believe there is a reasonable way out. And I think that’s good!
Death with dignity means going on mentally living in the world you think is reality, even if it's a sad reality, until the end; not abandoning your arts of seeking truth; dying with your commitment to reason intact.
I’m very pro this sentiment, but I think you can get it without death with dignity. A Magic player facing long odds knows they don’t win in most possible game states. They don’t delude themselves about that. But they keep playing. They keep watching for potential outs. And then when the game is over and they lose, then and only then do they stop trying to win. That’s how I want to orient.
Q1: But wasn’t the Dying with Dignity post just a joke, since it was posted on April Fool’s Day?
A: I’ve heard Eliezer say basically all those things on days other than April 1st. He talks about dying with dignity a lot in the late 2021 MIRI conversations, see this doc for the specific mentions. Maybe “MIRI announces” was a joke, but most of the content was serious, I’m pretty sure.
Q2: But isn’t “playing to your outs” encouraging people to condition upon “comfortable improbabilities”?
A: I’m encouraging people not to do that, but that is a potential risk of this frame. No matter how you frame it, playing to win with really long odds is extremely hard. You need to find real outs in order to play to them. So if you don’t have any outs, the way to play to your outs is to start looking for outs, not going with the “closest best thing” and trying to make that work. I think there’s an important distinction between conditioning on “comfortable improbabilities” and “things that have to work in order to avoid extinction”. In my view, it seems likely that any viable success plan will involve several things that all need to go right. It seems legit to me to pick one of them on the critical path and start working on it, even if you think it’s quite unlikely the other elements of the plan will come together in time.
Q3: Okay, but isn’t “playing to your outs” encouraging people to do crazy or violent things?
A: No, most crazy or violent plans are really dumb and unlikely to work, for the reasons Eliezer outlines in his post. Playing to your outs doesn’t give you license to execute bad plans.
I think this is the largest downside to this framing (though I'd include simply "counter-productive" along with the others), and comes out of the MtG-vs-reality disparity: in MtG you know your outs; in reality, you do not. (I know you mentioned this, but it needs more shake-you-by-the-lapels-and-scream-it-in-your-face)
This framing needs to come with prominent warnings along the lines of:
Do NOT assume that what you think is an out is certainly an out.
Do NOT assume that the potential outs you're aware of are a significant proportion of all outs.
The issue with crazy/violent plans is not only that they're usually dumb/unlikely-to-work, but that even when they would be sensible given particular implicit assumptions, those assumptions may not hold.
People don't know what they don't know. Playing to the-outs-I-can-think-of will feel like "playing to my outs" - while I may be witlessly screwing up a load of outs I didn't notice.
I guess this is why Eliezer went with the dignity framing - it suggests a default of caution, and perhaps deliberation. Playing to your outs suggests throwing caution to the wind (this is often how it works when you know your outs).
For anyone adopting this framing, I suggest an exercise where you first imagine having someone repeatedly scream YOU DO NOT KNOW ALL YOUR OUTS!! YOU WILL NEVER KNOW ALL YOUR OUTS!! at you for long enough that you know it in your bones.
In poker or MtG you can often actually figure out your outs.
Even in games with just a bit more complexity (yes, in certain ways MtG is very simple) it can be very very hard to know what your outs really are. And if one of your outs is "actually the next few turns will not be as devastating as I think they will", then almost all versions of playing to your outs will be worse than playing for generic-looking EV.
Same with life. Don't play to your outs until and unless you're extremely sure you've modeled the world well enough to know what ALL of your outs ARE, and aren't. And that boils down to, basically, don't play to your outs. OP hinted at this with "find out what your outs are" but when the advice boils down to "don't play to your outs, gain knowledge about the game state" I would say the framing should be discarded.
I agree finding your outs is very hard, but I don't think this is actually a different challenge than increasing "dignity". If you don't have a map to victory, then you probably lose. I expect that in most worlds where we win, some people figured out some outs and played to them.
I currently don't know of any outs. But I think I know some things that outs might require and am working on those, while hoping someone comes up with some good outs - and occasionally taking a stab at them myself.
I think the main problem is the first point and not the second point:
The current problem, if Eliezer is right, is basically that we have 0 outs. Not that the ones we have might be less promising than other ones. And he's criticising people for thinking their plans are outs when they're actually not.
Well, I think that's a real problem, but I worry Eliezer's frame will generally discourage people from even trying to come up with good plans at all. That's why I emphasize outs.
Oh sure - I don't mean to imply there's no upside in this framing, or that I don't see a downside in Eliezer's.
However, whether you know of outs depends on what you see as an out. E.g. buying much more time to come up with a solution could be seen as an out by some people. It's easy to imagine many bad plans to do that, with potentially hugely negative side-effects.
Some of those bad plans would look rational, conditional on an assumption that there was no other way to avoid losing the future. Of course making such an assumption is poor reasoning, but the trouble is that it happens implicitly: nobody needs to say to themselves "...and here I assume that no-one on earth has or will come up with approaches I've missed", they only need to fail to ask themselves the right questions.
Conditional on being very clear on not knowing the outs, I think this framing may well be a good one for many people - but I'm serious about the mental exercise.
I agree with the first point, that "dying with dignity" doesn't seem to accomplish the goal it was intended to accomplish. I'm not yet sold on "play to your outs" accomplishing the right thing (my cruxes have to do with how people end up interpreting / acting-on / being-motivated-by the phrase).
I interpreted the die-with-dignity post to be saying something like:
"Die with dignity" seems overdeterminedly like a bad slogan because of landfish's aforementioned google of "dying with dignity", which is about terminally ill people voluntarily killing themselves. Seems like totally the wrong vibe, and not worth trying to win a memetic war on.
I think Eliezer meant something like "die with dignity the way a soldier would" (where you keep fighting to the end). I think "die fighting with dignity" is at least a marginal improvement (and even fits within the 5 word limit for slogans!). But it still doesn't feel quite right.
But... "Play to your outs"... doesn't seem at first glance to me like it really solves the problems in the middle? (#2, #3, #4, #5?). I feel like it encourages #5 (and Jeff's vague Q&A on "don't do that" doesn't seem reassuring to me). It also seems to encourage #3 (and again the vague admonishment to "not do that" doesn't seem that reassuring to me.)
I don't think "play to your outs" scales. When I imagine 100 people trying to do that who aren't part of a single company with shared leadership, I imagine them doing a bunch of random stuff that is often at cross-purposes.
Right now I feel like "play to your outs" gives me the illusion of having a strategy-and-life-philosophy, but doesn't feel like it directs my attention in particularly useful ways.
I don't have a great alternate slogan at this time, but I'm not sure I'm personally bottlenecked on a slogan exactly.
(I'm not strongly confident it's not a good phrase here – If it turned out everyone ended up naturally doing sensible things when following it as a strategy/life-philosophy, coolio I guess. I see some people have responded positively to the post so far. I'm worried those people are responding more to the fact that someone gave them something less uncomfortable/depressing than 'die with dignity' to orient around, rather than actually getting directed in a useful way)
"It also seems to encourage #3 (and again the vague admonishment to "not do that" doesn't seem that reassuring to me.)"
I just pointed to Eliezer's warning, which I thought was sufficient. I could write more about why I think it's not a good idea, but I currently think a bigger portion of the problem is people not trying to come up with good plans rather than people coming up with dangerous plans, which is why my emphasis is where it is.
Eliezer is great at red teaming people's plans. This is great for finding ways plans don't work, and I think it's very important he keep doing this. It's not great for motivating people to come up with good plans, though. And I think that shortage of motivation is a real threat to our chances to mitigate AI existential risk. I was talking to a leading alignment researcher yesterday who said their motivation had taken a hit from Eliezer's constant "all your plans will fail" talk, so I'm pretty sure this is a real thing, even though I'm unsure of the magnitude.
I largely agree with that, but I think there's an important asymmetry here: it's much easier to come up with a plan that will 'successfully' do huge damage, than to come up with a plan that will successfully solve the problem.
So to have positive expected impact you need a high ratio of [people persuaded to come up with good plans] to [people persuaded that crazy dangerous plans are necessary].
I'd expect your post to push a large majority of readers in a positive direction (I think it does for me - particularly combined with Eliezer's take).
My worry isn't that many go the other way, but that it doesn't take many.
I think that's a legit concern. One mitigating factor is that people who seem inclined to rash destructive plans tend to be pretty bad at execution, e.g. Aum Shinrikyo.
"Die with honor" might be a good phrasing?
Just a note on confidence, which seems especially important since I'm making a kind of normative claim:
I'm very confident "dying with dignity" is a counterproductive frame for me. I'm somewhat confident that "playing to your outs" is a really useful frame for me and people like me. I'm not very confident "playing to your outs" is a good replacement for "dying with dignity" in general, because I don't know how much people will respond to it like I do. Seeing people's comments here is helpful.
So, in my mind, the thing that "dying with dignity" is supposed to do is that when you look at plan A and B, you ask yourself: "which of these is more dignified?" instead of "which of these is less likely to lead to death?", because your ability to detect dignity is more sensitive than your ability to detect likelihood of leading to death on the present margin. [This is, I think, the crux; if you don't buy this then I agree the framing doesn't seem sensible.]
This lets you still do effective actions (that, in conjunction with lots of other things, can still lead to less likelihood of death), even if when you look at any plan in isolation, the result is "yep, still 0% chance of survival", because maybe some plans lead to 3 units of dignity and other plans lead to 4 units of dignity, and you'd rather have 4 than 3 units of dignity.
[Going back to the crux--I think if we actually had outs, and knew how to play to them, this plan would make lots of sense. "Ok guys, we should do X because when we draw Fireball, we'll then be able to win." But the plans I'm presently most optimistic about look more like "when you don't know where the bugs are, tidy up your codebase", which seems like a pretty different approach, while lining up with 'dignity'.]
Of course, you also need a sense of dignity that's more like "we did things that were sane and cared about whether or not we made it" instead of "we didn't make a fuss or look too strange" or something like that.
Would you say that this constitutes using dignity as a proxy for indirect increase in survival odds and/or increase in broad preparation to execute on outs such that “dignity” is expected to have easier-to-grasp scaling properties and better emotional binding?
I like this post. I think that if Eliezer had written "MIRI announces new 'Play To Your Outs' strategy", it wouldn't have helped with some specific goals he has:
But given that someone's already read the "Death With Dignity" thing and internalized it as a reason not to do that stuff, I do expect that a lot of people will do better work if they're thinking in terms of "play to your outs" on a daily basis. (Not everyone, but a lot of people.)
Like, for me personally, I think it was important that I go through something like a period of mourning and acceptance. (The big one I remember was in October 2017.) But I also don't think that a "mourning and acceptance" vibe is the right air to be breathing hour-to-hour as I get work tasks done. (Again, speaking for me personally.)
That's a great distinction, thank you for making it. Also a great example of a Third Alternative.
It seems to me that it would be better to view the question as "is this frame the best one for person X?" rather than "is this frame the best one?"
Though, I haven't fully read either of your posts, so excuse any mistakes/confusion.
Congrats on making an important and correct point without needing to fully read the posts! :) That's just efficiency.
Hm. I think I don't want to socially reward people for making drive-by points without reading at least the specific blog post that they're commenting on. And the quality of their point isn't very relevant to me in my not wanting to reward that.
Like, I think it's bad to normalize people dropping into high context conversations to say something based on their superficial understanding. Most of the time when people do that, it's annoying. And generally, it's not a good distribution of interpretive labor.
I don't feel like pushing back against Jack here, since this seems minor, but I do feel like pushing back against Rob specifically encouraging this.
If you can write one of the most useful comments in a 487-comment discussion, without needing to read the full posts (or discussions) in question, then you get credit in my book not only for the comment quality but also for efficiently processing evidence. I see it as following the same principle as:
(Obviously, you might not agree with me that this is one of the most useful comments.)
In general, it's an (unusually important and central) virtue to require less evidence in order to reach the correct conclusion. Cf. Scott's "[...] obviously it’s useful to have as much evidence as possible, in the same way it’s useful to have as much money as possible. But equally obviously it’s useful to be able to use a limited amount of evidence wisely, in the same way it’s useful to be able to use a limited amount of money wisely."
Like, sure, skimming a post can result in people writing bad comments too. But everyone knows that already, and if someone is skimming excessively, it should be possible to wait and criticize that once they actually write some bad comments, rather than criticizing on an occasion where the heuristic worked.
This particular situation also involves things that might make that move easy here - Eliezer posted X, and this is a response. The nature of the two, allows for a comment on their nature and the situation. It's not that the second post is required for the comment, but that as the second post suited the situation, so did the comment.
The approach can work broadly, but in this situation its reasons for working seem to be about that (in the same way the comment is about that).
But I noted this anyway because skimming and discernment are both options here. This situation seems clear - and there were different comments on the first post that this reminds me of.
I strongly prefer the "dying with dignity" mentality for 3 basic reasons:
Of these, the 3rd feels the most important to me, partly because I've seen it discussed least. It seems like if Eliezer's basic model is right, a significant portion of the good outcomes require some kind of miracle occurring at crunch time, which will presumably be easier to obtain if key players are emotionally prepared and not suddenly freaking out for the first time (on an emotional/subconscious level). I know basically nothing about psychology, but isn't it a bad sign if you retreat to "oh death with dignity is unmotivating, let's just focus on our outs" when AGI is less salient?
Just a dumb question, why do you/Eliezer talk about the log-odds and not just probability? Is there any reason behind this word choice?
Besides this, I'm fully on board. I enjoy this framing much more than the "die with dignity" one.
I think it's harder to think about 0.00000000000000000000000000000001 than about 32 units of fucked.
I agree that log-odds are better units, I don't agree with replacing probability, a perfectly well understandable term for everyone, with a word that seems to be unnecessarily complicated and that only diminishes the clarity of the text. I am in general overly suspicious of groups that try to hide simple ideas behind a curtain of difficult terms.
Agree with this. I think it can be hard to know what's jargon and what's not from inside a technical field, and there's signalling value in talking like an insider, and counter-signalling value in being proud of your plain clear speech and avoiding all that jargon, and it's all very difficult.... I personally like to think that I speak in the clearest terms possible, but I can never resist the temptation to load everything with double meanings and irony.
But in this case log-odds vs probability may be the point of 'dying with dignity'; it's perhaps easier to care about adding or subtracting a zero in the probability of non-doom than it is to care about moving it by a microsmidgen.
Sometimes (especially when thinking about evidence in Bayesian updates) log odds can be useful because one bit of evidence just adds 1 bit to your log odds (in base 2). Eliezer goes into more detail here: https://www.lesswrong.com/posts/QGkYCwyC7wTDyt3yT/0-and-1-are-not-probabilities
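To make the "one bit of evidence adds 1 bit" point concrete, here is a minimal sketch in Python. The function names and the prior of 0.001 are made up for illustration and are not taken from the linked post. It converts a probability to log odds in base 2, applies an update with a likelihood ratio of 2, and checks that the log odds moved by one bit even though the probability itself barely changed.

```python
import math


def log_odds_bits(p: float) -> float:
    """Log odds of probability p, measured in bits (base 2)."""
    return math.log2(p / (1.0 - p))


def prob_from_bits(bits: float) -> float:
    """Inverse of log_odds_bits: recover the probability from log odds in bits."""
    odds = 2.0 ** bits
    return odds / (1.0 + odds)


def update(p: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    In log-odds space the multiplication becomes an addition.
    """
    return prob_from_bits(log_odds_bits(p) + math.log2(likelihood_ratio))


# Hypothetical prior, chosen only for illustration.
prior = 0.001
# Evidence that is twice as likely if the hypothesis is true (one bit).
posterior = update(prior, 2.0)
# How far the log odds moved, in bits.
gain = log_odds_bits(posterior) - log_odds_bits(prior)

print(f"prior={prior:.6f} posterior={posterior:.6f} gain={gain:.3f} bits")
```

For small probabilities, doubling the odds roughly doubles the probability, so a change that looks like "0% to 0%" after rounding still registers as a full bit of progress in log-odds terms.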
"log odds" : "probability"
"epistemic status" : "confidence level"
(there are useful reasons to talk about log odds instead of probabilities, as in the post @Morpheus links to, but it also does seem like there's some gratuitous use of jargon going on)
Even given the clarifications in this post, "playing to your outs" seems like it comes with some subtle denial. My sense is that it's resisting a pending emotional update?
I'm curious if this resonates.
Maybe we should keep the "Dying" part but ditch "with Dignity" (because "Dying with Dignity" sounds like giving up and peacefully resigning yourself).
Dying with Iron Integrity, Sane Strategizing, Courageous Calibration, and Obstinate Optimization
or DISCO (Dying with Integrity, Sanity, Courage, and Optimization) for short.
At least to my mind, other phrases of the form "Dying with X" also make it sound like you've just plain given up on "not dying" except that you're still going through the motions for some reason. (Of course I'm not arguing that that's what you're actually doing, only that that's what the optics are)
(Sorry if feedback from rando is unwanted)
I don't think it's quite the same thing, but another relevant framing is to increase your luck surface area.
I do like the framing of "playing to your outs" for various reasons. I'm not sure how I feel about which framing is better, but here are two thoughts.
Recently Eliezer has used the dying with dignity frame a lot outside his April 1st post. So while some parts of that post may have been a joke, the dying with dignity part was not. For example: https://docs.google.com/document/d/11AY2jUu7X2wJj8cqdA_Ri78y2MU5LS0dT5QrhO2jhzQ/edit?usp=drivesdk
I think you're right that dying with dignity is a better frame specifically for recommending against doing unethical stuff. I agree with everything he said about not doing unethical stuff, and tried to point to that (maybe if I have time I will add some more emphasis here).
But that being said, I feel a little frustrated that people think that caveats about not doing unethical stuff are expected in a post like this. It feels similar to if I was writing a post about standing up for yourself and had to add "stand up to bullies - but remember not to murder anyone". Yes you should not murder bullies. But I wish to live in a world where we don't have to caveat with that every time. I recognize that we might not live in such a world. Maybe if someone proposes "play to your outs", people jump to violent plans without realizing how likely that is to be counterproductive to the goal. And this does seem to be somewhat true, though I'm not sure the extent of it. And I find this frustrating. That which is already true... of course, but I wish people would be a little better here.
That makes sense. And thank you for emphasizing this.
I think both of our points stand. My point is about the title of this specific April Fools Day post. If it's gonna be an April Fools Day post, "playing to your outs" isn't very April Fools-y.
And your point stands I think as well, if I'm interpreting you correctly, that he's chosen the messaging of "death with dignity" outside of the context of April Fools Day as well, in which case "it's an April Fools Day post" isn't part of the explanation.
I hear ya for sure. I'm not sure what to think about how necessary it is either. The heuristic of "be more cynical about humans" comes to mind though, and I lean moderately strongly towards thinking it is a good idea.
Not clear to me. Why not?
I think the April fool's day element is extremely neglected in this discussion. E was trying to be provocative, not putting forth an ironclad proposal.
Recently Eliezer has used the dying with dignity frame a lot outside his April 1st post. So while some parts of that post may have been a joke, the dying with dignity part was not. For example: https://docs.google.com/document/d/11AY2jUu7X2wJj8cqdA_Ri78y2MU5LS0dT5QrhO2jhzQ/edit?usp=drivesdk
If you have specific examples where you think I took something too seriously that was meant to be a joke, I'd be curious to see those.
It seems this isn't true, excepting only the title and the concluding question. FWIW this wasn't at all obvious to me either.
Thanks for following up and clarifying!
I find it possible but pretty unlikely that he was trying to be provocative. Maybe he pushed things a little bit with the provocativeness, but I'd be quite surprised if it turned out to be more than "just a little bit".
I'm thinking about what he wrote in the post about consequentialism. Being provocative is unvirtuous, in the sense that it is lying to/misleading people. Maybe that is ok because the ends justify the means? Possibly, but Eliezer warns quite strongly against that sort of reasoning. He also talks about how the reputation of the alignment community is pretty important. Being provocative hurts this reputation. And his own personal reputation, which is similarly important. Plus he just seems to have a very strong fondness for the truth, and not stretching it, probably moreso than anyone else I can think of.
I judge that he would be willing to go against these principles in theory, but it would have to be a pretty extreme and clear cut case, and we don't seem to be in that ballpark here.
A problem with this framing that's related to, but I think not the same as, the one pointed out by Joe_Collman:
"Play to your outs" suggests focusing on this one goal even if what you have to do to have a hope of meeting that goal has other very bad consequences. This makes sense in many games, where the worst win is better than the best not-win. It may not make sense in actual life.
If you reckon some drastic or expensive action makes a 2% improvement to our chances of not getting wiped out by a super-powerful unaligned AI, then that improvement might be worth a lot of pain. If you reckon the same action makes a 0.02% improvement to our chances, the tradeoff against other considerations may look quite different.
But if you frame things so that your only goal is "win if possible" at the game of not getting wiped out by a super-powerful unaligned AI, and if either of those is the best "out" available, then "play to your outs" says you should do it.
Joe points out that you might be wrong about it being the best "out". I am pointing out that if it is your best "out" then whether it's a good idea may depend on how good an "out" it is, and that the "play to your outs" framing discourages thinking about that and considering the possibility that it might not be good enough to be worth playing to.
[EDITED to add:] Q3 at the end of the post is relevant, but I don't think the answer given invalidates what I'm saying. It may very well be true that most crazy/violent plans are bad, but that's the wrong question; the right question is whether the highest-probability plans (in the scenario where all those probabilities are very low) are crazy/violent.
Playing to your outs seems potentially VNM-consistent in ways the log-odds approach doesn't: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy?commentId=cWaoFBM79atwnCadX
"playing to your outs" might be more commonly known as running a Hail Mary.
I like this framing so, so much more. Thank you for putting some feelings I vaguely sensed, but didn't quite grasp yet, into concrete terms.
This framing feels like a much more motivating strategy, even though it's pretty much identical to what Eliezer is proposing.
The distinction between your post and Eliezer's is more or less that he doesn't trust anyone to identify or think sanely about [plans that they admit have negative expected value in terms of log odds but believe possess a compensatory advantage in probability of success conditional on some assumption].
Such plans are very likely to hurt the remaining opportunities in the worlds where the assumption doesn't hold, which makes it especially bad if different actors are committing to different plans. And he thinks that even if a plan's assumptions hold, the odds of its success are far lower than the planner envisioned.
Eliezer's preferred strategy at this point is to continue doing the kind of AI Safety work that doesn't blow up if assumptions aren't met, and if enough of that work is complete and there's an unexpected affordance for applying that kind of work to realistic AIs, then there's a theoretical possibility of capitalizing on it. (But, well, you see how pessimistic he's become if he thinks that's both the best shot we have and also probability ~0.)
And he wanted to put a roadblock in front of this specific well-intentioned framing, not least because it is way too easy for some readers to round into support for Leeroy Jenkins strategies.
It also might read as being about stuff like cryopreservation though.
Some possible outs and opportunities:
- Some semi-strong AI is deployed in a dangerous and very damaging Headline way a few before AGI is developed, allowing for a tiny sliver of a chance to rein in the research sector with whatever the best ideas are for doing so.
- A new Luddite movement (it could be us?) slows down research for an extra 10 years through sheer shouting, agitation, and political dealing, allowing for possibilities and ideas that might be helpful.
- Well-defined safety guidelines which are enforced on research and actually stop the creation of anything not provably safe (which we somehow figure out).
I think another framing is anthropic-principle optimization: aim for the best human experiences in the universes that humans are left in. This could be strict EA conditioned on the event that unfriendly AGI doesn't happen, or perhaps something even weirder dependent on the anthropic principle. Regardless, dying only happens in some branches of the multiverse, so those deaths can be dignified, which will presumably increase the odds of non-dying also being dignified, because the outcomes spring from the same goals and strategies.
Not sure if it counts as an "out" (given I think it's actually quite promising), but definitely something that should be tried before the end:
Megastar salaries for AI alignment work
[Summary from the FTX Project Ideas competition]
Aligning future superhuman AI systems is arguably the most difficult problem currently facing humanity, and the most important. In order to solve it, we need all the help we can get from the very best and brightest. To the extent that we can identify the absolute most intelligent, most capable, and most qualified people on the planet – think Fields Medalists, Nobel Prize winners, foremost champions of intellectual competition, the most sought-after engineers – we aim to offer them salaries competitive with top sportspeople, actors and music artists to work on the problem. This is complementary to our AI alignment prizes, in that getting paid is not dependent on results. The pay is for devoting a significant amount of full-time work (say a year), and maximum brainpower, to the problem, with the hope that highly promising directions in the pursuit of a full solution will be forthcoming. We will aim to provide access to top AI alignment researchers for guidance, affiliation with top-tier universities, and an exclusive retreat house and office for fellows of this program to use, if so desired.
[Yes, this is the "pay Terry Tao $10M" thing. FAQ in a GDoc here.]