Honoring Petrov Day on LessWrong, in 2019

Ben Pace

5 min read, 26th Sep 2019, 168 comments

Petrov Day, Community

Just after midnight last night, 125 LessWrong users received the following email.

Subject Line: Honoring Petrov Day: I am trusting you with the launch codes

Dear {{username}},

Every Petrov Day, we practice not destroying the world. One particular way to do this is to practice the virtue of not taking unilateralist action.

It’s difficult to know who can be trusted, but today I have selected a group of LessWrong users who I think I can rely on in this way. You’ve all been given the opportunity to show yourselves capable and trustworthy.

This Petrov Day, between midnight and midnight PST, if you, {{username}}, enter the launch codes below on LessWrong, the Frontpage will go down for 24 hours.

Personalised launch code: {{codes}}

I hope to see you on the other side of this, with our honor intact.

Yours, Ben Pace & the LessWrong 2.0 Team

P.S. Here is the on-site announcement.

Unilateralist Action

As Nick Bostrom has observed, society is making it cheaper and easier for small groups to end the world. We’re lucky it requires major initiatives to build a nuclear bomb, and that the world can’t be destroyed by putting sand in a microwave.

However, other dangerous technologies are becoming widely available, especially in the domain of artificial intelligence. Only 6 months after OpenAI created the state-of-the-art language model GPT-2, others created similarly powerful versions and released them to the public. They disagreed about the dangers, and, because there was nothing stopping them, moved ahead.

I don’t think this example is at all catastrophic, but I worry what this suggests about the future, when people will still have honest disagreements about the consequences of an action but where those consequences will be much worse.

And honest disagreements will happen. In the 1940s, the great physicist Niels Bohr met President Roosevelt and Prime Minister Churchill to persuade them to give the instructions for building the atomic bomb to Russia. He wanted to bring in a new world order and establish global peace, and thought this would be necessary; he believed strongly that it would prevent arms race dynamics, if only everyone just shared their science. (Churchill did not allow it.) Our newest technologies do not yet have the bomb’s ability to transform the world in minutes, but I think it’s likely we’ll make powerful discoveries in the coming decades, and that publishing those discoveries will not require the permission of a president.

And then it will only take one person to end the world. Even in a group of well-intentioned people, natural disagreements will mean someone will think that taking a damaging action is actually the correct choice — Nick Bostrom calls this the “unilateralist’s curse”. In a world where dangerous technology is widely available, the greatest risk is unilateralist action.

Not Destroying the World

Stanislav Petrov once chose not to destroy the world.

As a Lieutenant Colonel of the Soviet Army, Petrov manned the system built to detect whether the US government had fired nuclear weapons on Russia. On September 26th, 1983, the system reported multiple such attacks. Petrov’s job was to report this as an attack to his superiors, who would launch a retaliatory nuclear response. But instead, contrary to all the evidence the systems were giving him, he called it in as a false alarm. This later turned out to be correct.

(For a more detailed story of how Stanislav Petrov saved the world, see the original LessWrong post by Eliezer, which started the tradition of Petrov Day.)

During the Cold War, many other people had the ability to end the world: presidents, generals, commanders of nuclear subs from many countries, and so on. Fortunately, none of them did. As the number of people with the ability to end the world increases, so too does the standard to which we must hold ourselves. We lived up to our responsibilities in the Cold War, but barely. (The Global Catastrophic Risks Institute has compiled an excellent list of 60 close calls.)

Petrov Day

On Petrov Day, we try to live up to this responsibility: we celebrate by not destroying the world.

Raymond Arnold has suggested many ways of observing Petrov Day. You can discuss it with your friends. You can hold a quiet, dignified ceremony (for example, with the beautiful booklet Jim Babcock created). But you can also play on hard mode: "During said ceremony, unveil a large red button. If anybody presses the button, the ceremony is over. Go home. Do not speak."

In the comments of Ray's post, Zvi asked the following question (about a variant where a cake gets destroyed):

I still don't understand, in the context of the ceremony, what would cause anyone to push the button. Whether or not it would incinerate a cake, which would pretty much make you history's greatest monster.

To which I replied:

The point isn't that anyone sane would push the button. It's that we, as a civilisation, are just going around building buttons (cf. nukes, AGI, etc) and so it's good practice to put ourselves in the situation where any unilateralist can destroy something we all truly value. When I said the above, I was justifying why it was useful to have a ritual around Petrov Day, not why you would press the button. I can't think of any good reason to press the button, and would be angry at anyone who did - they're just decreasing trust and increasing fear of unilateralists. We still should have a ceremony where we all practice the art of sitting together and not pressing the button.

So this year on LessWrong, I thought we'd build ourselves a big red button. Instead of making everyone go home, this button (which you can find over the frontpage map) will shut down the LessWrong frontpage for 24 hours.

Now, this isn't a button for just anyone. I know there are people with internet access who will happily press buttons that do bad things. So today, I've emailed personalised launch codes to 125 LessWrong users, for us to practice the art of sitting together and not pressing harmful buttons[1]. If any users do submit a set of launch codes, tomorrow I’ll publish their username, and whose launch codes they were.

During Thursday 26th September, we will see whether the people with the codes can be trusted to not, unilaterally, destroy something valuable.

To all here on LessWrong today, I wish you a safe and stable Petrov Day.

Footnotes

[1] I picked the list quickly on Tuesday, mostly leaving out users I don’t really know, and a few people who I thought would press it (e.g. someone who has said in the past that they would). If this goes well we may do it again next year, with an expanded pool or more principled selection criteria. Though I think this is still a representative set: the list includes 53% of the 100+ users with over 1,000 karma who've logged in to LessWrong in the past month.

Added: Follow-Up to Petrov Day, 2019.


Comments

quanticle

In a world where dangerous technology is widely available, the greatest risk is unilateralist action.

What Stanislav Petrov did was just as unilateralist as any of the examples linked in the OP. We must remember that when he chose to disregard the missile alert (based on his own intuition regarding the geopolitics of the world), he was violating direct orders. Yes, in this case everything turned out great, but let's think about the counterfactual scenario where the missile attack had been real. Stanislav Petrov would potentially have been on the hook for more deaths than Hitler and the utter destruction of his nation.

A unilateral choice not to act is as much of a unilateral choice as a unilateral choice to act.

Matthew Barnett

If one nation is confident that a rival nation will not retaliate in a nuclear conflict, then the selfish choice is to strike. By refusing orders, Petrov was being the type of agent who would not retaliate in a conflict. Therefore, in a certain sense, by being that type of agent, he arguably raised the risk of a nuclear strike. [Note: I still think his decision to not retaliate was the correct choice]

quanticle

Petrov's choice was obviously the correct one in hindsight. What I'm questioning is whether Petrov's choice was obviously correct in foresight. The rationality community takes as a given Petrov's assertion that it was obviously silly for the United States to attack the Soviet Union with a single ICBM. Was that actually as silly as Petrov suggested? There were scenarios where small numbers of ICBMs were launched in a surprise attack against an unsuspecting adversary in order to kill leadership, and disrupt command and control systems. How confident was Petrov that this was not one of those scenarios?

Another assumption that the community makes is that Petrov choosing to report the detection would have immediately resulted in a nuclear "counterattack" by the Soviet Union. But Petrov was not a launch authority. The decision to launch or not was not up to him, it was up to the Politburo of the Soviet Union. We have to remember that when he chose to lie about the detection, by calling it a computer glitch when he didn't know for certain that it was one, Petrov was defecting against the system. He was deliberately feeding false data to his superiors, betting that his model of the world was more accurate than his commanders'. Is that the sort of behavior we really want to lionize?

Aiyen

But Petrov was not a launch authority. The decision to launch or not was not up to him, it was up to the Politburo of the Soviet Union.

This is obviously true in terms of Soviet policy, but it sounds like you're making a moral claim. That the Politburo was morally entitled to decide whether or not to launch, and that no one else had that right. This is extremely questionable, to put it mildly.

We have to remember that when he chose to lie about the detection, by calling it a computer glitch when he didn't know for certain that it was one, Petrov was defecting against the system.

Indeed. But we do not cooperate in prisoners' dilemmas "just because"; we cooperate because doing so leads to higher utility. Petrov's defection led to a better outcome for every single person on the planet; assuming this was wrong because it was defection is an example of the non-central fallacy.

Is that the sort of behavior we really want to lionize?

If you will not honor literally saving the world, what will you honor? If we wanted to make a case against Petrov, we could say that by demonstrably not retaliating, he weakened deterrence (but deterrence would have ...

quanticle

If you will not honor literally saving the world, what will you honor?

I find it extremely troubling that we're honoring someone defecting against their side in a matter as serious as global nuclear war, merely because in this case, the outcome happened to be good.

(but deterrence would have helped no one if he had launched)

That is exactly the crux of my disagreement. We act as if there were a direct lever between Petrov and the keys and buttons that launch a retaliatory counterstrike. But there wasn't. There were other people in the chain of command. There were other sensors. Do we really find it that difficult to believe that the Soviets would have attempted to verify Petrov's claim before retaliating? That there would have been practiced procedures to carry out this verification? From what I've read of the Soviet Union, their systems of positive control were far ahead of the United States' as a result of the much lower level of trust the Soviet Politburo had in their military. I find it exceedingly unlikely that the Soviets would have launched without conducting at least some kind of verification with a secondary system. They knew the consequences of nuclear attack ...

jmh

I'm not entirely sure we can ever have a correct choice in foresight.

With regard to Petrov, he did seem to make a good and reasoned call: the US launching a first strike with 5 missiles just does not make much sense without some very serious assumptions that don't seem to be merited.

I do like the observation that Petrov was being just as unilateralist as what is feared in this thread.

Do we want to lionize such behavior? Perhaps. Your argument seems to lend itself to the lens of an AI problem, with Petrov's behavior then acting as a control on that AI.

quanticle

I also think it's weird that The Sequences, Thinking Fast and Slow, and other rationalist works such as Good and Real all emphasize gathering data and trusting data over intuition, because human intuition is fallible, subject to bias, taken in by narratives, etc... and then we're celebrating someone who did the opposite of all that and got away with it. The steelman interpretation is that Petrov made a Bayesian assessment, starting with a prior that a nuclear attack (and especially a nuclear attack with five missiles) was an extremely unlikely scenario, and appropriately discounted the evidence being given to him by the satellite detection system because the detection system was new and therefore prone to false alarms, and found that the posterior probability of attack did not justify his passing the attack warning on. However, this seems to me like a post-hoc justification of a decision that was made on intuition.

TurnTrout

He thought it unlikely that the US would launch a strike with 5 ICBMs only, since a first strike would likely be comprehensive. As far as Bayesian reasoning goes, this seems pretty good. Also, a big part of being good at Bayesian reasoning is refining your ability to reason even when you can't gather data, when you can't view the same scenario "redrawn" ten thousand times and gather statistics on it. ETA: the satellite radar operators reported all-clear; however, instructions were to only make decisions based on the computer readouts.

quanticle

I've replied below with a similar question, but do you have a source on "satellite radar operators"? The published accounts of the incident imply that Petrov was the satellite radar operator. He followed up with the operators of the ground-based radar later, but at the time he made the decision to stay silent, he had no data that contradicted what the satellite sensors were saying. As far as the Bayesian justification goes, I think this is bottom-line reasoning. We're starting with, "Petrov made a good decision," and looking backwards in order to find reasons as to why his reasoning was reasonable and justifiable.

TurnTrout

I don’t see why this is bottom-line reasoning. It is in fact implausible that the US would first-strike with only five missiles, as that would leave the USSR able to respond.

Ben Pace

To quote Stanislav himself:

I imagined if I'd assume the responsibility for unleashing the third World War...

...and I said, no, I wouldn't. ... I always thought of it. Whenever I came on duty, I always refreshed it in my memory.

I don't think it's obvious that Petrov's choice was correct in foresight. I think he didn't know whether it was a false alarm; my current understanding is that he just didn't want to destroy the world, and that's why he disobeyed his orders. It's a fascinating historical case where someone actually got to make the choice, and made the right one. Real-world situations are messy and it's hard to say exactly what his reasoning process was and how justifiable it was. It's really bad that decisions like these have to be made, and it doesn't seem likely to me that there's some simple decision rule that gets the right answer in all situations (or even most). I didn't make any explicit claim about his reasoning in the post. I simply celebrate that he managed to make the correct choice.

The rationality community takes as a given Petrov's assertion that it was obviously silly for the United States to

...

Ruby

I think we can celebrate that Petrov didn't want to destroy the world and this was a good impulse on his part. I think if we think it's doubtful that he made the correct decision, or that it's complicated, then we should be very, very upfront about that (your comment is upfront, the OP didn't make this fact stick with me). The fact the holiday is named after him made me think (implicitly if not explicitly) that people (including you, Ben) generally endorsed Petrov's reasoning/actions/etc. and so I did take the whole celebration as a claim about his reasoning. I mean, if Petrov reasoned poorly but happened to get a good result, we should celebrate the result yet condemn Petrov (or at least his reasoning). If Petrov reasoned poorly and took actions that were poor in expectation, doesn't that mean something like in the majority of worlds Petrov caused bad stuff to happen (or at least the algorithm which is Petrov generally would)?

. . .

I think it is extremely extremely weird to make a holiday about avoiding the unilateralist's curse and name it after someone who did exactly that. I hadn't thought about it, but if Quanticle is right, then Petrov was taking ...

philh

FWIW, I had taken that as a given.

Ben Pace

Indeed.

Perhaps the key problem with attempts to lift the unilateralist's curse is that it's very easy to enforce dangerous conformity ('conformity' being a term I made sure not to use in the OP). It's crucial to be able to not do the thing that you're being told to do under the threat of immediate and strong social punishment, especially when there's a long time scale before finding out if your action is actually the right one. Consistently going against the grain because it's better in the long run, not because it brings immediate reward, is very difficult.

Being able to think and act for yourself, while also not disregarding others so much that you break things, is a delicate balance, and many people end up too far on one end or the other. They find themselves punished for unilateralist action, and never speak up again; or they find that others are stopping them from being themselves, and then ignore all the costs they're imposing on their community. My current sense is that most people lean towards conformity, but also that the small number of unilateralists have caused an outsized harm.

(Then again, failures from conformity are often more silent, so I have wide error bars around the magnitude of their cost.)

Rob Bensinger

Seems like unilateralism and coordination failure is a good way of summing up humanity's general plight re nuclear weapons, which makes it relevant to a day called "Petrov Day" in a high-level way. Putting the emphasis here makes the holiday feel more like "a holiday about x-risk and a thanksgiving for our not having died to nuclear war", and less like "a holiday about the virtues of Stanislav Petrov and emulating his conduct". If Petrov's decision was correct, or incorrect-but-reflecting-good-virtues, the relevant virtue is something like "heroic responsibility", not "refusal to be a unilateralist". I could imagine a holiday that focuses instead on heroic responsibility, or that has a dual focus. ('Lord, grant me the humility to cooperate in good equilibria, the audacity to defect from bad ones, and the wisdom to know the difference.') I'm not sure which of these options is most useful.

quanticle

Well, that's one of the questions I'm raising. I'm not sure we want to encourage more "heroic responsibility" with AI technologies. Do we want someone like Stanislav Petrov to decide, "No, the warnings are false, and the AI is safe after all," and release a potentially unfriendly general AI? I would much rather not have AI at all than have it in the hands of someone who decides without consultation that their instruments are lying to them and that they know the correct thing to do based upon their judgment and intuition alone.

TurnTrout

Petrov did consult with the satellite radar operators, who said they detected nothing.

quanticle

Do you have a source on Petrov consulting the radar operators? The Wikipedia article on the 1983 incident seems to imply that he did not. From the passage above, it seems like, at the time of the decision, Petrov had no way of confirming whether the missile launches were real or not. He decided that the missile launch warnings were the result of equipment malfunction, and then followed up with land-based radar operators later to confirm that his decision had been correct.

Idan Arye

Petrov's choice was not about dismissing warnings; it was about picking which side to err on. Wrongfully alerting his superiors could cause a nuclear war, and wrongfully not alerting them would disadvantage his country in the nuclear war that had just started. I'm not saying he did all the numbers, used Bayes's law to figure the probability there is an actual nuclear attack going on, assigned utilities to all four cases and performed the final decision theory calculations, but his reasoning did take into account the possibility of error both ways. Though... it does seem like his intuition gave utilities much more weight than probabilities. So, if we take that rule for deciding what to do with an AGI, it won't be just "ignore everything the instruments are saying" but "weigh the dangers of UFAI against the missed opportunities from not releasing it". Which means the UFAI only needs to convince such a gatekeeper that releasing it is the only way to prevent a catastrophe, without having to convince the gatekeeper that the probabilities of the catastrophe are high or that the probabilities of the AI being unfriendly are low.

Slider

Dr. Strangelove, although it is fictional evidence, presents a unilateralist choice to act. A US nuclear bomber commander just ignores the chain of command above him and goes to bomb the Soviets. The case that the decision to nuke is the president's to make is much stronger and more intuitive there.

quanticle

No, that's not what happens in Dr. Strangelove at all. In Dr. Strangelove, a legitimate launch order is given, the bombers take off, and then, while they're on their way to their destination, the launch order is rescinded. However, the one bomber (due to equipment failure, I think), fails to receive the retraction of the launch order. The President, realizing that this bomber did not receive the order to turn back, authorizes the Soviets to shoot down the plane. The Soviets, however, are unable to do so, as the bomber has diverted from its primary target and is heading towards a nearer secondary target. The bomber crew, following their orders to the letter, undertake heroic efforts to get their bomb operational and drop it, even though that means sacrificing their commander. In a sense, Dr. Strangelove is the very opposite of what Stanislav Petrov did. Rather than save humanity by disobeying orders, the crew dooms humanity by following its orders.

Slider

The downward chain of command holds appropriately, but a person (I think the character is named Jack D. Ripper) who shouldn't be making such a call is in a factual position to act as if he had received one. Part of the point is that it is surprising, and that the remedy of having him court-martialled is not comforting at all. Yes, he does not personally go to nuke the Soviets, but he acts on his own without cooperation with the powers vested in him. The points do not need to be in conflict. Ripper can doom humanity by doing unauthorised things while the bomber crew dooms it by doing authorised things. The bomber crew equivalents also kept the Cold War cold because it was plausible that they could be used for their trained purpose.

Ben Pace

Just FYI, I am planning to make another post in maybe two weeks to open further discussion to nail down the specific details of what we want to celebrate and what is a fitting way to do that, because that seems like the correct way to build traditions.

clone of saturn

Good point. We're unlucky that nuclear war didn't break out in 1983.

quanticle

In 1983, Moscow was protected by the A-35 anti-ballistic missile system. This system was (in theory) capable of stopping either a single ICBM or six Pershing II IRBMs from West Germany. The threat that Petrov's computers reported was a single ICBM, coming from the United States. If the threat had been real, Petrov's actions would have prevented the timely activation of the ABM system, preventing the Soviets from even trying to shoot down the incoming nuke.

ChristianKl

It seems like the official story, as you find it on Wikipedia for example, says that the system detected five ICBMs.

quanticle

From https://en.wikipedia.org/wiki/1983_Soviet_nuclear_false_alarm_incident: The initial detection was one missile. Petrov dismissed this as a false alarm. Later four more missiles were detected, and Petrov also dismissed this as a false alarm. Other accounts combine both sub-incidents together and say that five missiles were detected. I choose to focus on the first detection because that's when Petrov made the critical decision, in my mind, to not trust the satellite early warning network. The second detection of four missiles isn't as important to me, because at that point Petrov has already chosen to disregard warnings from the satellite network.

lionhearted (Sebastian Marshall)

Oh this is wild. This generated a strange emotion.

Anyone here know the word "Angespannt"? One of my team members taught it to me: a German word with no exact English equivalent. We talked about it here:

https://www.ultraworking.com/podcast/big-project-angespannt

"It's a mix of tense and alert in a way. It's like the feeling you get before you go on stage."

Like, why should I care? I'm obviously not going to press the damn thing. And yet, simply knowing the button is there generates some tension and alertness.

Fascinating. Thank you for doing this.

(Well, sort of thank you, to be more precise...)

Scott Garrabrant

If any users do submit a set of launch codes, tomorrow I’ll publish their identifying details.

If we make it through this, here are some ideas to make it more realistic next year:

1) Anonymous codes.

2) Karma bounty for the first person to press the button.

1+2) Randomly and publicly give some people the same code as each other, and give a karma bounty to everyone who had the code that took down the site.

3) Anyone with button rights can share button rights with anyone, and a karma bounty for sharing with the most other people that only pays out if nobody presses the button.

lionhearted (Sebastian Marshall)

Or, if we want to go all max-Schelling at the risk of veering almost into Stalinism, tell people they'll get a karma bounty for pressing it but then coordinate with LW, CFAR, MIRI, and various meetups to ban that person for life from everything if they actually do it. 😂

Pattern

3 seems like an incentive to create sockpuppets. (It might make more sense to combine "Button rights" and "codes".) Making limitations, for example, based on age of accounts moves the incentive from "create sockpuppets", to "have created sockpuppets".

Said Achmiz

Well, it seems that no one has launched anything. However, skimming through the comments seems to indicate that this may at least partly be due to folks simply not having had enough time to coordinate any agreements about launching for some quid pro quo, or blackmail, or whatever. And, for that matter, not everyone has time to visit the site daily—I’d wager that at least some of the people who had launch codes, simply didn’t have time to go to Less Wrong all day, or forgot, etc.

Perhaps, next time, there can be more warning? Send out the launch codes a week in advance, let’s say (though maintain only a one-day window for actually using them).

That way, we can be more certain of whether the outcome was due entirely to trustworthiness, self-restraint, and a cooperative spirit, or whether it was instead due to indecisiveness and the limitations of people’s busy schedules.

Gordon Seidoh Worley

the temptation, the call to infamy

button shining, yearning to be pressed

can we endure these sinuous fingers coiled?

only the hours know our hearts

lionhearted (Sebastian Marshall)

Upvoted for poetry. Commenting to underline it for "the call to infamy" — wonderful phrase.

jmh

I was not aware of this story and am happy to hear it. While I think the day of celebration and remembrance should be observed, I wonder about the exercise with the button.

First, just not pushing the button to bring the page down for a day seems not to fit the problem. The button should be shutting down someone else's site, with the realization that they will have some knowledge of that coming and have a button that shuts your page down. Perhaps next year the game could include other sites, and particularly sites whose members do not really see eye-to-eye on things.

Second, it doesn't really tell others much about avoiding such situations. Reading Eliezer's post, the critical insight for me seems to be that of remaining calm and taking the time available to think a bit rather than merely reacting and following the instructions of a mindless process. That Petrov realized that launching 5 missiles just made no sense, and so came to the conclusion that there was a system error/false positive, is critical here.

habryka

We had some original plans of coordinating with the EA Forum people on this, but didn't end up having enough time available to make all of that happen. Agree that the ideal reenactment scenario would include two forums (though with mutually assured destruction in the later parts of the cold war, the outcome is ultimately the same).

jmh

A slightly different thought that might be easier to coordinate: have the button hide all the comments of a specific user on LW, which adds the variance that the thread is not merely bilateral. We could also add something that might obscure the actor, though not entirely hide their action. Additionally, we could have the button delete a selected subset of comments/posts, allowing a scenario where one needs to decide if an all-out attack was launched or something else is going on. That seems to be what Petrov faced. I would also add something that produced an almost identical signal even if no one pushed their button. Though, now it's becoming more like a war game on LW than simply noting a (at least I think) positive event in history. Still, we might make it a good experiment and see what can be learned. Maybe I'm in a dark mindset here.... Seems like today, even with (due to?) the advances in weapons and other technology, that MAD assumption may no longer be believed. I recall Putin claiming Russia would in fact survive an all-out war with the USA. I wonder how much that view might change the way the game plays out. On a tangent here, part of the concern is the proliferation of the technology. What would a Guaranteed Assured Destruction (GAD) policy be for any country/group seeking such technology? Is that a better world than what we have now?

4lionhearted (Sebastian Marshall)5y

If you've got launch codes, wait until tomorrow to read this eh? — Lbhe pbzzrag znxrf zr jnag gb chfu gur ohggba. uggcf://jjj.yrffjebat.pbz/cbfgf/hkLrzN2ggmwf8m8Ro/qevir-ol-ybj-rssbeg-pevgvpvfz uggcf://jjj.yrffjebat.pbz/cbfgf/TT2egOErNz6b3zega/qrsrpgvat-ol-nppvqrag-n-synj-pbzzba-gb-nanylgvpny-crbcyr V'z abg n zbq be nssvyvngrq va nal jnl, whfg pevgvpvmvat crbcyr qbvat n tbbq guvat jura jr'er nyy abzvanyyl ba gur fnzr grnz qevirf zr penml. Ohg gura ntnva, znlor gung'f gur Xvffvatre evtugrbhfarff dhbgr ntnva.

Said Achmiz5y170

In the comments of Ray’s post, Zvi asked the following question (about a variant where a cake gets destroyed):

I still don’t understand, in the context of the ceremony, what would cause anyone to push the button. Whether or not it would incinerate a cake, which would pretty much make you history’s greatest monster.

There are several obvious reasons why someone might push the button.

Reason one: spite. Pure, simple spite, nothing more. A very compelling reason, I assure you. (See also: “Some men just want to watch the world burn.”)

Reason two: desire for infamy. “History’s greatest monster” is much better (for many people) than being a nobody.

Reason three: personal antipathy for people who would be harmed.

I could think of more potential reasons, I suppose, but I think three examples are enough. Remember that being incapable of imagining why someone would do a bad thing is a weakness and a failure. Strive to do better.

jacobjacob5y220

All your reasons look like People Are Bad. I think it suffices that The World is Complex and Coordination is Hard.

Consider, for example:

Someone thinks Petrov day is not actually a good ritual and wants to make a statement about this
Someone thinks the reasoning exhibited in OP/comments is naïve and wouldn't stand up to the real test, and so wants to punish people/teach them a lesson about this
Someone comes up with a clever argument involving logical decision theories and Everett branches meaning they should push... but they made a mistake and the argument is wrong
Someone thinks promising but unstable person X is about to press the button, and that this would be really bad for X's community position, and so instead they take it upon themselves to press the button to enable the promising but unstable person to be redeemed and flourish
Someone accidentally gives away/loses their launch codes (e.g. just keeps their gmail inbox open at work)
A group of people tries to set up a scheme to reliably prevent a launch; however, this grows increasingly hard and confusing and eventually escalates into one of the above failure modes
Several people try to set up precommitments that wi

...

9Said Achmiz5y

I disagree, FWIW. It seems to me that “desire for infamy” may be rolled into “people are bad”, but not the other two. I do not consider either personal antipathy or spite to be necessarily negative qualities.

3gjm5y

I would be interested to know how you see spite as "not necessarily negative".

Said Achmiz5y180

Well, I could note that reactive spite is game-theoretically correct; this is well-documented and surely familiar to everyone here.

But that would not be the important reason. In fact I take spitefulness to be a terminal value, and as a shard of godshatter which is absolutely critical to what humans are (and, importantly, what I take to be the ideal of what humans are and should be).

It is not always appropriate, of course; nor even usually, no. Someone who is spiteful all or most of the time, who is largely driven by spite in their lives—this is not a pleasant person to be around, and nor would I wish to be like this. But someone who is entirely devoid of spite—who does not even understand it, who has never felt it nor can imagine feeling spite—I must wonder whether such a one is fully human.

There is an old Soviet animated short, called “Baba Yaga Is Opposed” (which you may watch in its entirety on YouTube; link to first of three episodes; each is ~10 minutes).

The plot is: it’s the 1980 Olympics in Moscow. Misha the bear has been chosen as the event’s mascot. Baba Yaga—the legendary witch-crone of Russian folklore—is watching the announcement on TV. “Why him!” she exclaims; “why him

...

3quanticle5y

Indeed, Eliezer has written extensively about this very phenomenon. No argument is universally compelling -- there is no sequence of propositions so self-evident that it will cause our opponents to either agree or spontaneously combust.

-3lionhearted (Sebastian Marshall)5y

Great comment. Side note, I occasionally make a joke that I'm sent from another part of the multiverse (colloquially, "the future") to help fix this broken fucked up instance of the universe. The joke goes: it's not a stupid teleportation thing like Terminator, it's a really expensive two-step process to edit even a tiny bit of information in another universe. So with the right CTC relays you can edit a tiny bit of information, creating some high-variance people in a dense area, and then the only people who get their orders are people who reach a sufficient level of maturity, competence, and duty. Not everyone who we give the evolved post-sapien genetics gets their orders; the overwhelming majority fail, actually.

Now, the reason we at the Agency (in the joke, I'm on the Solar Task Force) are trying to fix this universe is because it affects other parts of the multiverse. There's a lot of stuff, but here's a simple one: the coordinates of Earth are similar in many branches. Setting off tons of nukes and beaming random stuff into space calls attention to Earth's location. I believe a game-theoretic solution to the Fermi Paradox was proposed recently in SciFi and no one was paying attention. I mean, did anyone check that out? Right? Don't let Earth's coordinates get out. Jeez guys. This isn't complicated. C'mon.

Now normally things work correctly, but this particular universe came about because you idiots (I mean, not you, since you weren't alive, but collectively, this idiot branch of humans) took a homeless bohemian artist who was a kinda-brave messenger soldier in World War One (already a disaster, but then the error compounds), and they took this loser with a bad attitude and put him in charge of a major industrial power at one of the most leveraged moments in human history. He wasn't even German! He was Austrian! And he took over the Nazi Party as only the 55th member after he was sent in as a police officer to watch the group. (Look it up on Wikipedia, is

1lionhearted (Sebastian Marshall)5y

Oh in case you missed the subtext, it's a SciFi joke. It's funny cuz it's sort of almost plausibly true and gets people thinking about what if their life had higher stakes and their decisions mattered, eh? Obviously, it's just a silly amusing joke. And it's obviously going to look really counterproductively weird if analyzed or discussed among normal people, since they don't get nerd humor. I recommend against doing that. Just laugh and maybe learn something. Don't be stupid and overthink it.

2Alexei5y

I’m confused why you got downvoted so much over a joke... sorry.

4eigen5y

The fact that it's a joke is unimportant; the fact that it's a bad joke is. Maybe don't make a bad joke and assume that people can't take it; consider that maybe it's just bad.

jefftk5y150

[EDIT: two people with codes below have objected, so I'm not up for this trade anymore, unless we figure out a way to make a broader poll]

I have launch codes. Would anyone be interested in offering counterfactual donations to https://www.givewell.org/charities/amf? I could also be interested in counterfactual donations to nuclear war-prevention organizations.

Raemon5y370

oh geez

lionhearted (Sebastian Marshall)5y230

"Rae, this is a friendly reminder from the universe that you can only at best control the first-order effects of systems you create..."

tcheasdfjkl5y280

Since the day is drawing to a close and at this point I won’t get to do the thing I wanted to do, here are some scattered thoughts about this thing.

First, my plan upon obtaining the code was to immediately repeat Jeff’s offer. I was curious how many times we could iterate this; I had in fact found another person who was potentially interested in being another link in this chain (and who was also more interested in repeating the offer than nuking the site). I told Jeff this privately but didn’t want to post it publicly (reasons: thought it would be more fun if this was a surprise; didn’t think people should put that much weight on my claimed intentions anyway; thought it was valuable for the conversation to proceed as though nuking were the likely outcome).

(In the event that nobody took me up on the offer, I still wasn’t going to nuke the site.)

Other various thoughts:

Having talked to some people who take this exercise very seriously indeed and some who don’t understand why anyone takes it seriously at all, both perspectives make a lot of sense to me and yet I’m having trouble explaining either one to the other. Probably I should pract

...

Larks5y220

I would like to add that I think this is bad (and have the codes). We are trying to build social norms around not destroying the world; you are blithely defecting against that.

5jefftk5y

I'm not doing anything unilaterally. If I do anything at this point it will be after some sort of fair polling.

the gears to ascension5y190

This seems extremely unprincipled of you :/

5jefftk5y

Clarify?

the gears to ascension5y270

I thought you were threatening extortion. As it is, given that people are being challenged to uphold morality, this response is still an offer to throw that away in exchange for money, under the claim that it's moral because of some distant effect. I'd encourage you to follow Jai's example and simply delete your launch codes.

9tcheasdfjkl5y

yesssss shenanigans

6Peter Wildeford5y

Are you offering to take donations in exchange for pressing the button or not pressing the button?

7jefftk5y

I would give someone my launch codes in exchange for a sufficiently large counterfactual donation. I haven't thought seriously about how large it would need to be, because I don't expect someone to take me up on this, but if you're interested we can talk.

1Ramiro P.5y

I thought he was being ambiguous on purpose, so as to maximize donations.

1William_S5y

I think the better version of this strategy would involve getting competing donations from both sides, using some weighting of total donations for/against pushing the button to set a probability of pressing the button, and tweaking the weighting of the donations such that you expect the probability of pressing the button will be low (because pressing the button threatens to lower the probability of future games of this kind, this is an iterated game rather than a one-shot).
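William_S's scheme leaves the weighting unspecified. One hypothetical way to make it concrete is to set the press probability to the discounted "for" share of all money pledged. The function names, the `weight_for` discount, and its default of 0.1 are illustrative assumptions, not part of the proposal:

```python
import random

def press_probability(donations_for: float, donations_against: float,
                      weight_for: float = 0.1) -> float:
    """Hypothetical rule: the discounted share of pledged money that favours pressing.

    weight_for < 1 discounts pro-press pledges, so matching pledges on both
    sides still leave the probability of pressing low.
    """
    weighted_for = weight_for * donations_for
    total = weighted_for + donations_against
    if total == 0:
        return 0.0  # nothing pledged on either side: default to not pressing
    return weighted_for / total

def decide(donations_for: float, donations_against: float,
           rng: random.Random, weight_for: float = 0.1) -> bool:
    """Draw once against the computed probability; True means press."""
    return rng.random() < press_probability(donations_for, donations_against, weight_for)

if __name__ == "__main__":
    # Equal pledges of $1,672 on each side, with the assumed 0.1 discount.
    p = press_probability(1672, 1672)
    print(f"{p:.4f}")  # 0.0909, i.e. 1/11
```

Lowering `weight_for` plays the role of the "tweaking" in the comment: it keeps the expected probability of a press low even when both sides pledge similar amounts.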

Jacob Falkovich5y120

Agreed. I have launch codes and will donate up to $100 without writing it in my EA budget if that prevents the nuke from being launched.

lionhearted (Sebastian Marshall)5y170

Nooooo you're a good person but you're promoting negotiating with terrorists literally boo negative valence emotivism to highlight third-order effects, boo, noooooo................

7Jacob Falkovich5y

As they say in the KGB, one man's nuclear terrorism is another man's charity game show.

1Gurkenglas5y

Participants were selected based on whether they seem unlikely to press the button, so whoever would have cared about future extortions being possible CDT-doesn't need to, because they won't be a part of it.

5tcheasdfjkl5y

hey actually I'm potentially interested depending on what size of donation you would consider sufficient, can you give an estimate?

4jefftk5y

Maybe a fair value would be GiveWell's best guess cost per life saved equivalent? [1] There's some harm in releasing the codes entrusted to me, but not so much that it's better for someone to die. I would want your assurance that it really was a counterfactually valid donation, though: money you would otherwise spend selfishly, and that you would not consider part of your altruistic impact on the world. If two other people with launch codes tell me they don't think this is a good trade then I'll retract the offer. [1] https://www.givewell.org/how-we-work/our-criteria/cost-effectiveness/cost-effectiveness-models gives $1,672.

CarlShulman5y480

I have launch codes and don't think this is good. Specifically, I think it's bad.

Scott Garrabrant5y150

Did you consider the unilateralist curse before making this comment?

Do you consider it to be a bad idea if you condition on the assumption that only one other person with launch access who sees this post in the time window chooses to say it was a bad idea?

5jefftk5y

Is the objection over the amount (there's a higher number where it would be a good trade), being skeptical of the counterfactuality of the donation (would the money really be spent fully selfishly?), or something else?

Raemon5y400

(others have said part of what I wanted to say, but didn't quite cover the thing I was worried about)

I see two potential objections:

how valuable is trust among LW users? (this is hard to quantify, but I think it is potentially quite high)
how persuasive should "it's better than for someone to die"-type arguments be?

My immediate thoughts are mostly about the second argument.

I think it's quite dangerous to leave oneself vulnerable to the second argument (for reasons Julia discusses on givinggladly.com in various posts). Yes, you can reflect upon whether every given cup of coffee is worth the dead-child-currency it took to buy it. But taken naively this is emotionally and cognitively exhausting. (It also pushes people towards a kind of frugality that isn't actually that beneficial.) The strategy of "set aside a budget for charity, based on your values, and don't feel pressure to give more after that" seems really important for living sanely while altruistic.

(I don't have a robustly satisfying answer on how to deal with that exactly, but see this comment of mine for some more expanded thoughts of mine on this)

Now, additional counterfactual don...

Rohin Shah5y240

The strategy of "set aside a budget for charity, based on your values, and don't feel pressure to give more after that" seems really important for living sanely while altruistic.

But this situation isn't like that.

I agree you don't want to always be vulnerable to the second argument, for the reasons you give. I don't think the appropriate response is to be so hard-set in your ways that you can't take advantage of new opportunities that arise. You can in fact compare whether or not a particular trade is worth it if the situation calls for it, and a one-time situation that has an upside of $1672 for ~no work seems like such a situation.

As a meta point directed more at the general conversation than this comment in particular, I would really like it if people stated monetary values at which they would think this was a good idea. At $10, I'm at "obviously not", and at $1 million, I'm at "obviously yes". I think the range of uncertainty is something like $500 - $20,000. Currently it feels like the building of trust is being treated as a sacred value; this seems bad.

habryka5y240

My sense is that it's very unlikely to be worth it at anything below $10k, and I might be a bit tempted at around $50k, though still quite hesitant. I agree that at $1M it's very likely worth it.

lionhearted (Sebastian Marshall)5y130

Firm disagree. Second-order and third-order effects go limit->infinity here.

Also btw, I'm running a startup that's now looking at — best case scenario — handling significant amounts of money over multiple years.

It makes me realize that "a lot of money" on the individual level is a terrible heuristic. Seriously, it's hard to get one's mind around it, but a million dollars is decidedly not a lot of money on the global scale.

For further elaboration, this is relevant and incredibly timely:

https://slatestarcodex.com/2019/09/18/too-much-dark-money-in-almonds/

Rohin Shah5y110

LW frontpage going down is also not particularly bad, so you don't need much money to compensate for it.

If you wanted to convince me, you could make a case that destroying trust is really bad, and that in this particular case pressing the button would destroy a lot of trust, but that case hasn't really been made.

lionhearted (Sebastian Marshall)5y140

LW frontpage going down is also not particularly bad [...] If you wanted to convince me, you could make a case that destroying trust is really bad

Umm, respectfully, I think this is extremely arrogant. Dangerously so.

Anyways, I'm being blunt here, but I think respectful and hopefully useful. Think about this. Reasoning follows —

The instructions if you got launch codes (also in the above post) were as such (emphasis added with underline) —

"Every Petrov Day, we practice not destroying the world. One particular way to do this is to practice the virtue of not taking unilateralist action.

It’s difficult to know who can be trusted, but today I have selected a group of LessWrong users who I think I can rely on in this way. You’ve all been given the opportunity to show yourselves capable and trustworthy.

[...]

This Petrov Day, between midnight and midnight PST, if you, {{username}}, enter the launch codes below on LessWrong, the Frontpage will go down for 24 hours.

I hope to see you on the other side of this, with our honor intact."

So, to Ben Pace at least (the developer who put a tremendous amount of time and thought into putting this together), ...

Rohin Shah5y190

Thanks for writing this up. It's pretty clear to me that you aren't modeling me particularly well, and that it would take a very long time to resolve this, which I'm not particularly willing to do right now.

I'll give anyone 10:1 odds this is cited in a mainstream political science journal within 15 years, which are read by people who both set and advise on policy

I'll take that bet. Here's a proposal: I send you $100 today, and in 15 years if you can't show me an article in a reputable mainstream political science journal that mentions this event, then you send me an inflation-adjusted $1000. This is conditional on finding an arbiter I trust (perhaps Ben) who will:

Adjudicate whether it is an "article in a reputable mainstream political science journal that mentions this event"
Compute the inflation-adjusted amount, should that be necessary
Vouch that you are trustworthy and will in fact pay in 15 years if I win the bet.

1Eli Tyre5y

This basically seems right to me.

habryka5y110

Which part of the two statements? That destroying trust is really bad, or that the case hasn't been made?

8Eli Tyre5y

That this particular case would destroy a lot of trust. This seemed to me like a fun game with stakes of social disapproval on one side, and basically no stakes on the other. This doesn't seem like it has much bearing on the trustworthiness of members of the rationality community in situations with real stakes, where there is a stronger temptation to defect, or it would have more of a cost on the community. I guess implicit to what I'm saying is that the front page being down for 24 hours doesn't seem that bad to me. I don't come to Less Wrong most days anyway.

TurnTrout5y200

But this is not a one-time situation. If you're a professional musician, would you agree to mess up at every dress rehearsal, because it isn't the real show?

More indirectly... the whole point of "celebrating and practicing our ability to not push buttons" is that we need to be able to not push buttons, even when it seems like a good idea (or necessary, or urgent that we defect while we can still salvage the perceived situation). The vast majority of people aren't tempted by pushing a button when pushing it seems like an obviously bad idea. I think we need to take trust building seriously, and practice the art of actually cooperating. Real life doesn't grade you on how well you understand TDT considerations and how many blog posts you've read on it; it grades you on whether you actually can make the cooperation equilibrium happen.

jp5y100

Rohin argues elsewhere for taking a vote (at least in principle). If 50% vote in favor, then he has successfully avoided "falling into the unilateralist's curse" and has gotten $1.6k for AMF. He even gets some bonus for "solved the unilateralist's curse in a way that's not just 'sit on his hands'". Now, it's probably worth subtracting points for "the LW team asked them not to blow up the site and the community decided to anyway." But I'd consider it fair play.

8Rohin Shah5y

Depends on the upside.

This comment of mine was meant to address the claim "people shouldn't be too easily persuaded by arguments about people dying" (the second claim in Raemon's comment above). I agree that intuitions like this should push up the size of the donation you require.

As jp mentioned, I think the ideal thing to do is: first, each person figures out whether they personally think the plan is positive / negative, and then the group goes with the majority opinion. I'm talking about the first step here. The second step is the part where you deal with the unilateralist curse.

It seems to me like the algorithm people are following is: if an action would be unilateralist, and there could be disagreement about its benefit, don't take the action. This will systematically bias the group towards inaction. While this is fine for low-stakes situations, in higher-stakes situations where the group can invest effort, you should actually figure out whether it is good to take the action (via the two-step method above). We need to be able to take irreversible actions; the skill we should be practicing is not "don't take unilateralist actions", it's "take unilateralist actions only if they have an expected positive effect after taking the unilateralist curse into account".

We never have certainty, not for anything in this world. We must act anyway, and deciding not to act is also a choice. (Source)
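The unilateralist's curse referred to here can be seen in a toy simulation. This is a minimal sketch under assumed parameters: each of the 125 code-holders gets an independent estimate of an action's true value with Gaussian noise, and the noise model and the true value of -0.5 are illustrative assumptions, not anything from the thread. It compares "act if anyone thinks it's positive" with "act if a majority thinks it's positive":

```python
import random

def estimates(true_value: float, n_agents: int, noise: float,
              rng: random.Random) -> list[float]:
    """Each agent sees the true value plus independent Gaussian noise (an assumed model)."""
    return [true_value + rng.gauss(0.0, noise) for _ in range(n_agents)]

def unilateral_acts(ests: list[float]) -> bool:
    """The action happens if any single agent judges it positive."""
    return any(e > 0 for e in ests)

def majority_acts(ests: list[float]) -> bool:
    """The action happens only if a strict majority judges it positive."""
    return sum(e > 0 for e in ests) > len(ests) / 2

def action_rate(rule, true_value: float, n_agents: int = 125, noise: float = 1.0,
                trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated trials in which the rule leads to the action being taken."""
    rng = random.Random(seed)
    hits = sum(rule(estimates(true_value, n_agents, noise, rng)) for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    # A mildly harmful action (true value -0.5) judged by 125 noisy agents.
    print("unilateral:", action_rate(unilateral_acts, -0.5))
    print("majority:  ", action_rate(majority_acts, -0.5))
```

In this toy model the unilateral rule takes the harmful action in essentially every trial, while majority vote essentially never does. Rerunning with a positive `true_value` shows the other half of the argument: majority vote then takes the action, so it does not simply bias the group towards inaction.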

TurnTrout5y100

It seems to me like the algorithm people are following is: if an action would be unilateralist, and there could be disagreement about its benefit, don't take the action. This will systematically bias the group towards inaction. While this is fine for low-stakes situations, in higher-stakes situations where the group can invest effort, you should actually figure out whether it is good to take the action (via the two-step method above). We need to be able to take irreversible actions; the skill we should be practicing is not "don't take unilateralist actions", it's "take unilateralist actions only if they have an expected positive effect after taking the unilateralist curse into account".

I don’t disagree with this, and am glad to see reminders to actually evaluate different courses of action besides the one expected of us. My comment was more about your own valuation being too low, about this not being a one-off event once you consider scenarios either logically or causally downstream of this one, and about a general sense that you view the consequences of this event as quite isolated.

6Rohin Shah5y

That makes sense. I don't think I'm treating it as a one-off event; it's more that it doesn't really seem like there's much damage to the norm. If a majority of people thought it was better to take the counterfactual donation, it seems like the lesson is "wow, we in fact can coordinate to make good decisions", as opposed to "whoops, it turns out rationalists can't even coordinate on not nuking their own site".

Richard Yannow5y160

jkaufman's initial offer was unclear. I read it (incorrectly) as "I will push the button (/release the codes) unless someone gives AMF $1672 counterfactually", not as "if someone is willing to pay me $1672, I will give them the codes". Read in the first way, Raemon's concerns about "pressure" as opposed to additional donations made on the fly may be clearer; it's not about jkaufman's opportunity to get $1672 in donations for no work, it's about everyone else being extorted for an extra $1672 to preserve their values.

mingyuan5y100

Perhaps a nitpick, but I feel like the building of trust is being treated less as a sacred value, and more as a quantity of unknown magnitude, with some probability that that magnitude could be really high (at least >$1672, possibly orders of magnitude higher). Doing a Fermi estimate is a trivial inconvenience that I, for one, cannot handle right now; since it is a weekday, maybe others feel much the same.

3Rohin Shah5y

I agree that your comment takes this (very reasonable) perspective. It didn't seem to me like any other comment was taking this perspective, but perhaps that was their underlying model.

-2lionhearted (Sebastian Marshall)5y

I wouldn't do it for $100M. Seriously. Because it increases the marginal chance that humanity goes extinct ever-so-slightly. If you have launch codes, wait until tomorrow to read the last part eh? — (V zrna, hayrff lbh guvax gur rkcrevzrag snvyvat frpergyl cebzbgrf pnhgvba naq qrfgeblf bcgvzvfz, juvpu zvtug or gehr.)

Rohin Shah5y240

Why couldn't you use the $100M to fund x-risk prevention efforts?

-2lionhearted (Sebastian Marshall)5y

Well, why stop there? World GDP is $80.6 trillion. Why doesn't the United States threaten to nuke everyone if they don't give a very reasonable 20% of their GDP per year to fund X-Risk — or whatever your favorite worthwhile projects are? Screw it, why don't we set the bar at 1%? Imagine you're advising the U.S. President (it's Donald Trump right now, incidentally). Who should President Trump threaten with nuking if they don't pay up to fund X-Risk? How much? Now, let's say 193 countries do it, and $X trillion is coming in and doing massive good. Only Switzerland and North Korea defect. What do you do? Or rather, what do you advise Donald Trump to do?

8Rohin Shah5y

I never suggested threats, and in fact I don't think you should threaten to press the button unless someone makes a counterfactual donation of $1,672. Jeff's original comment was also not supposed to be a threat, though it was ambiguous. All of my comments are talking about the non-threat version.

-3lionhearted (Sebastian Marshall)5y

Dank EA Memes? What? Really? How do I get in on this? (Serious.) (I shouldn't joke "I have launch codes" — that's grossly irresponsible for a cheap laugh — but umm, I just meta made the joke.)

5lionhearted (Sebastian Marshall)5y

Note to self: Does lighthearted dark humor highlighting risk increase or decrease chances of bad things happening? Initial speculation: it might have an inverted response curve. One or two people making the joke might increase gravity, everyone joking about it might change norms and salience.

Adele Lopez5y140

I noticed after playing a bunch of games of a mafia-type game with some rationalists that when people made edgy jokes about being in the mob or whatever, they were more likely to end up actually being in the mob.

6lionhearted (Sebastian Marshall)5y

There's rationalists who are in the mafia? Whoa. No insightful comment, just, like — this Petrov thread is the gift that keeps on giving.

interstice5y140

Can't tell if joking, but they probably mean that they were "actually in the mafia" in the game, so not in the real-world mafia.

4Adele Lopez5y

Yes, lol :)

2Tetraspace5y

Dank EA Memes is a Facebook group. It's pretty good.

mingyuan5y360

(I have launch codes and am happy to prove it to you if you want.)

Hmmm, I feel like the argument "There's some harm in releasing the codes entrusted to me, but not so much that it's better for someone to die" might prove too much? Like, death is really bad, I definitely grant that. But despite the dollar amount you gave, I feel like we're sort of running up against a sacred value thing. I mean, you could just as easily say, "There's some harm in releasing the codes entrusted to me, but not so much that it's better for someone to have a 10% chance of dying" - which would naïvely bring your price down to $167.20.

If you accept as true that that argument should be equally 'morally convincing', then you end up in a position where the only reasonable thing to do is to calculate exactly how much harm you actually expect to be done by you pressing the button. I'm not going to do this because I'm at work and it seems complicated (what is the disvalue of harm to the social fabric of an online community that's trying to save the world, and operates largely on trust? perhaps it's actually a harmless game, but perhaps it's not, hard to know - seems like the majority of effects would happen down the line).

Additionally, I could just counter-offer a $1,672 counterfactual donation to GiveWell for you to not press the button. I'm not committing to do this, but I might do so if it came down to it.

3jefftk5y

Are you telling me you don't think this is a good trade?

mingyuan5y180

Wasn't totally sure when I wrote it, but now firmly yes.

lionhearted (Sebastian Marshall)5y140

This whole thread is awesome. This is maybe the best thing that's happened on LessWrong since Eliezer more-or-less went on hiatus.

Huge respect to everyone. This is really great. Hard but great. Actually it's great because it's hard.

TurnTrout5y320

I'm leaning towards this not being a good trade, even though it's taxing to type that.

In the future, some people will find themselves in situations not too unlike this, where there are compelling utilitarian reasons for pressing the button.

Look, the system should be corrigible. It really, really should; the safety team's internal prediction market had some pretty lopsided results. There are untrustworthy actors with capabilities similar to or exceeding ours. If we press the button, it probably goes better than if they press it. And they can press it. Twenty people died since I started talking, more will die if we don't start pushing the world in a better direction, and do you feel the crushing astronomical weight of the entire future's eyes upon us? Even a small probability increase in a good outcome makes pressing the button worth it.

And I think your policy should still be to not press the button to launch a singleton from this epistemic state, because we have to be able to cooperate! You don't press buttons at will, under pressure, when the entire future hangs in the balance! If we can't even cooperate, right here, right now, under much weaker pressures, what do we expect of the "untrustworthy actors"?

So how about people instead donate to charity in celebration of not pressing the button?

ETA I have launch codes btw.

7jefftk5y

Oh: and to give those potential other people time to object, I won't accept an offer until 2 hours after I posted the parent comment (4:30 Boston time).

7Rohin Shah5y

The normal way to resolve unilateralist curse effects is to see how many people agree / disagree, and go with the majority. (Even if the action is irreversible, as long as everyone knows that and has taken that into account, going with the majority seems fine.)

Pro: It saves an expected life.
Con: LW frontpage probably goes down for a day.
Con: It causes some harm to trust.
Pro: It reinforces the norm of actually considering consequences, and not holding any value too sacred.

Overall I lean towards the benefits outweighing the costs, so I support this offer.

ETA: I also have codes.

John_Maxwell5y220

Pro: It reinforces the norm of actually considering consequences, and not holding any value too sacred.

Not an expert here, but my impression was that sometimes it can be useful to have "sacred values" in certain decision-theoretic contexts (like "I will one-box in Newcomb's Problem even if consequentialist reasoning says otherwise"?). If I had to choose a sacred value to adopt, cooperating in epistemic prisoners' dilemmas actually seems like a relatively good choice?

8Rohin Shah5y

I don't think of Newcomb's problem as being a disagreement about consequentialism; it's about causality. I'd mostly agree with the statement "I will one-box in Newcomb's Problem even if causal reasoning says otherwise" (though really I would want to add more nuance). I feel relatively confident that most decision theorists at MIRI would agree with me on this.

In a real prisoner's dilemma, you get defected against if you do that. You also need to take into account how the other player reasons. (I don't know what you mean by epistemic prisoner's dilemmas; perhaps that distinction is important.) I also want to note that "take the majority vote of the relevant stakeholders" seems to be very much in line with "cooperating in epistemic prisoner's dilemmas", so if the offer did go through, I would expect this to strengthen that particular norm. See also this comment.

I would not put it this way. It depends on what future situations you expect to be in. You might want to keep honesty as a sacred value, and tell an ax-murderer where your friend is, if you think that one day you will have to convince aliens that we do not intend them harm in order to avert a huge war. Most of us don't expect that, so we don't keep honesty as a sacred value. Ultimately it does all boil down to consequences.

6jefftk5y

If we could figure out some reasonable way to poll people, I'd agree, but I don't see a good way to do that, especially not on this timescale?

9DanielFilan5y

Presumably you could take the majority vote of comments left in a 2 hour span?

5Rohin Shah5y

^ Yeah, that. The policy of "if two people object then the plan doesn't go through" sets up a unilateralist-curse scenario for the people against the plan -- after the first person says no, every future person is now able to unilaterally stop the plan, regardless of how many people are in favor of it. (See also Scott's comment.) Ideally we'd avoid that; majority vote of comments does so (and seems like the principled solution). (Though at this point it's probably moot given the existing number of nays.)
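Rohin's contrast between the two rules can be made concrete. The sketch below is a minimal illustration, assuming each vote is recorded as True (in favor) or False (against); the function names are hypothetical and not part of any LessWrong feature. It shows that under a "two objections block it" rule, supporters never change the outcome, whereas a majority vote counts both sides.

```python
from typing import Sequence


def veto_after_k_objections(votes: Sequence[bool], k: int = 2) -> bool:
    """Hypothetical rule: the plan goes ahead unless at least k people object.

    Only the number of objections matters; supporters never change the result.
    Once k - 1 people have objected, any single further objector can block it.
    """
    objections = sum(1 for v in votes if not v)
    return objections < k


def majority_vote(votes: Sequence[bool]) -> bool:
    """The plan goes ahead only if strictly more people favor it than oppose it."""
    in_favor = sum(1 for v in votes if v)
    against = len(votes) - in_favor
    return in_favor > against


if __name__ == "__main__":
    votes = [True] * 10 + [False] * 2  # 10 in favor, 2 against
    print("veto rule:", veto_after_k_objections(votes))  # False: 2 objections block it
    print("majority: ", majority_vote(votes))  # True: 10 > 2
```

With 10 in favor and 2 against, the veto rule blocks the plan while the majority rule approves it. The sketch assumes a strict majority, so a tie does not go ahead.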

2lionhearted (Sebastian Marshall)5y

Let's, for the hell of it, assume real money got involved. Like, it was $50M or something. Now, who would you want to be able to vote on whether destruction happens if their values aren't met, with that amount of money at stake? If it's the whole internet, most people will treat it as entertainment or competition as opposed to considering what we actually care about. But if we're going to limit it only to people who are thoughtful, that invalidates the point of majority vote, doesn't it? Think about it; I'm not going to write out all the implications, but I think your faith in crowdsourced voting mechanisms for decisions with a known short-term payoff, set against long-term unknown costs that destroy long-term unknown gains, is perhaps misplaced...? Most people are, factually speaking, not educated on all relevant topics, not fully numerate on statistics and payoff calculations, go with their feelings instead of analysis, and are short-term thinkers...

3Rohin Shah5y

I agree that in general this is a problem, but I think in this particular case we have the obvious choice of the set of all people with launch codes. (Btw, your counterargument also applies to the unilateralist curse itself.)

6DanielFilan5y

I'm surprised that LW being down for a day isn't on your list of cons. [ETA: or rather the LW home page]

[-]Peter Wildeford5y130

It could also be on the list of pros, depending on how one uses LW.

[-]Raemon5y170

I feel obligated to note that it will in fact only destroy the frontpage of LW, not the rest of the site.

2jacobjacob5y

Ah. I thought it was the entire site. (Though it did say "Frontpage" in the post.)

2Rohin Shah5y

Good point, added, doesn't change the conclusion.

1tcheasdfjkl5y

I'll note that giving someone the launch codes merely increases the chance of the homepage going down.

2johnswentworth5y

I dunno, one life seems like a pretty expensive trade for the homepage staying up for a day. I bet a potential buyer could shop around and obtain launch codes for half a life. Not saying I'd personally give up my launch code at the very reasonable cost of $836. But someone could probably be found. Especially if the buyer somehow found a way to frame someone else for the launch. (Of course, now this comment is sitting around in plain view of everyone, the launch codes would have to come from someone other than me, even accounting for the framing.)

2tcheasdfjkl5y

this makes sense. I shall consider whether it makes sense for me to impulse-spend this amount of money on shenanigans (and lifesaving)

[-]jefftk5y100

If you're considering it as spending on lifesaving then it doesn't sound counterfactual?

2tcheasdfjkl5y

I'm pretty sure it is? I had already decided on & committed to a donation amount for 2019, and this would be in addition to that. The lifesaving part is relevant insofar as I am happier about the prospect of this trade than I would be about paying the same amount to an individual. The only way in which I could imagine this not being perfectly counterfactual is that, given that discretionary spending choices depend somewhat on my finances at any given point, and given that large purchases have some impact on my finances, it may be that if some other similar opportunity presented itself later on, my decision re: that opportunity could have some indirect causal connection to my current decision (not in the direct sense of "oh I already donated last month so I won't now" but just in the sense of "hmm how much discretionary-spending money do I currently have and, given that, do I want to spend $X on Y"). I'm not sure it's really ever possible to get rid of that though?

1jp5y

It could be partially motivated by lifesaving, but they wouldn't have donated otherwise. Like, not if they're a perfectly rational agent, but hey.

3tcheasdfjkl5y

If someone else with codes wants to make this offer now that Jeff has withdrawn his, I'm now confident I am up for this.

3mingyuan5y

I preemptively counter-offer whatever amount of money tcheasdfjkl would pay in order for this hypothetical person not to press the button.

8tcheasdfjkl5y

To be clear I am NOT looking for people to press the button, I am looking for people to give me launch codes.

8mingyuan5y

Oh wow, I did not realize how ambiguous the original wording was.

2Said Achmiz5y

Forgive me if I’m being dense, but just what in the world is a “counterfactual donation”?

[-]habryka5y180

Jeff does conveniently have a blogpost on this: https://www.jefftk.com/p/what-should-counterfactual-donation-mean

[-]gjm5y110

It seems extremely unfortunate that the terminology apparently shifted from "counterfactually valid" (which means the right thing) to "counterfactual" (which means almost the opposite of the right thing).

9Raemon5y

Do you have a suggestion for terminology that properly truncates? (i.e. I think it's basically impossible to expect a long phrase to end up being the one people regularly use, so if you want to fix that issue you need a single word that does the job)

[-]Wei Dai5y160

"Additional donation" seems like the obvious choice in place of "counterfactual donation", since we just mean "additional to what you would have donated anyway", right? (The very obviousness makes me think maybe there's a downside to the term that I'm not seeing, or I'm confused in some other way.)

3tcheasdfjkl5y

Sounds pragmatically weird in the case where the person isn't known to already be donating.

[-]Tetraspace5y140

Clicking on the button permanently switches it to a state where it's pushed-down, below which is a prompt to enter launch codes. When moused over, the pushed-down button has the tooltip "You have pressed the button. You cannot un-press it." Screenshot.

(On an unrelated note, on r/thebutton I have a purple flair that says "60s".)

Once a string longer than 8 characters is entered, a button saying "launch" appears below the big red button. Screenshot.
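The behavior Tetraspace describes above amounts to a small one-way state machine. This is a hypothetical sketch of the observed UI only, not LessWrong's actual implementation (the thread does not show that code); the class and constant names are invented for illustration, and it does not model checking the codes or taking the frontpage down.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption taken from the description: the "launch" button appears once
# the typed string is longer than this many characters.
LAUNCH_BUTTON_THRESHOLD = 8


@dataclass
class LaunchButton:
    """Hypothetical model of the observed UI states, not LessWrong's code."""

    pressed: bool = False
    code_input: str = ""

    def press(self) -> None:
        # One-way transition: no method in this class sets pressed back to False.
        self.pressed = True

    @property
    def tooltip(self) -> Optional[str]:
        # Hover text shown on the pushed-down button.
        if self.pressed:
            return "You have pressed the button. You cannot un-press it."
        return None

    @property
    def shows_code_prompt(self) -> bool:
        # The launch-code field is only revealed after pressing.
        return self.pressed

    def type_code(self, text: str) -> None:
        if not self.shows_code_prompt:
            raise RuntimeError("no launch-code prompt before the button is pressed")
        self.code_input = text

    @property
    def shows_launch_button(self) -> bool:
        return self.pressed and len(self.code_input) > LAUNCH_BUTTON_THRESHOLD


if __name__ == "__main__":
    button = LaunchButton()
    button.press()
    button.type_code("12345678")
    print(button.shows_launch_button)  # False: exactly 8 characters
    button.type_code("123456789")
    print(button.shows_launch_button)  # True: longer than 8 characters
```

The design choice worth noting is that `press()` has no inverse, mirroring the "You cannot un-press it" tooltip.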

II.

I'm nowhere near the PST timezone, so I wouldn't be able to reliably pull a shenanigan whereby if I had the launch codes I would enter or not enter them depending on the amount of counterfactual money pledged to the Ploughshares Fund in the name of either launch-code-entry-state, but this sentence is not apophasis.

III.

Conspiracy theory: There are no launch codes. People who claim to have launch codes are lying. The real test is whether people will press the button at all. I have failed that test. I came up with this conspiracy theory ~250 milliseconds after pressing the button.

IV. (Update)

I can no longer see the button when I am logged in. Could this mean that I have won?

[-]Scott Garrabrant5y140

Conspiracy theory: There are no launch codes. People who claim to have launch codes are lying. The real test is whether people will press the button at all. I have failed that test. I came up with this conspiracy theory ~250 milliseconds after pressing the button.

Oh no! Someone is wrong on the internet, and I have the ability to prove them wrong...

[-]TurnTrout5y500

4Raemon5y

[-]jp5y240

[-]Tetraspace5y230

To make sure I have this right and my LW isn't glitching: TurnTrout's comment is a Drake meme, and the two other replies in this chain are actually blank?

[This comment is no longer endorsed by its author]

7Ruby5y

7jimrandomh5y

? !

[-]jimrandomh5y370

(This thread is our collective reenactment of the conversations about nuclear safety that happened during the cold war.)

[-]Tetraspace5y100

Well, at least we have a response to the doubters' "why would anyone even press the button in this situation?"

1lionhearted (Sebastian Marshall)5y

https://en.wikipedia.org/wiki/Pandora#Works_and_Days What's in the box? What's in the box? Don't open it! Oh, shit... (Grace, longing and care, and being gifted cause the box to be opened. It's like history just keeps repeating itself or something...)

[-]Ben Pace5y120

"And on that day, the curse was lifted."

[-]Error5y120

How did you implement the button? I run a small site, love the idea, and would like to do something similar.

[-]tcheasdfjkl5y90

Can we have a recap from the mods of how Petrov Day went? How many people pressed the button, how many people tried entering anything in the launch code field, how many people tried the fake launch code posted on Facebook in particular?

[-]Ben Pace5y220

Currently writing that post :)

Added: Will post it sometime today, but probably later on.

[-]lionhearted (Sebastian Marshall)5y100

You guys are total heroes. Full stop. In the 1841 "On Heroes" sense of the word, which is actually pretty well-defined. (Good book, btw.)

[-]ryan_b5y80

Generic feedback:

I had launch codes. I had previously hidden the map in my settings, which also had the effect of hiding the button, which in turn was enough to screen off any "buttons should be pressed" and "would this really work?" temptations.

I did keep checking the site to see if it went down, though.

[-]Quirinus_Quirrell5y70

I have the launch codes. I'll take the site down unless Eliezer Yudkowsky publicly commits to writing a sequel chapter to HPMoR, in which I get an acceptably pleasant ending, by 9pm PST.

[-]Ben Pace5y390

The enemy is smart.

"The enemy knew perfectly well that you'd check whose launch codes were entered, especially since the nukes being set off at all tells us that someone can appear falsely trustworthy." Ben shut his eyes, thinking harder, trying to put himself into the enemy's shoes. Why would he, or his dark side, have done something like - "We're meant to conclude that the enemy has the launch codes. But that's actually something the enemy can only do with difficulty, or under special conditions; they're trying to create a false appearance of omnipotence." Like I would. "Later, hypothetically, the nukes actually get fired. We think it was Quirinus_Quirrell firing it, but really, it was just someone firing it independently."

"Unless that is precisely what Quirinus_Quirrell expects us to think," said Jim Babcock, his brow furrowed in concentration. "In which case he does have the launch codes, as well as the other person."

"Does Quirinus_Quirrell really use plots with that many levels of meta -"

"Yes," said Habryka and Jim.

Ben nodded distantly. "Then this could be a setup to either make...

4DanielFilan5y

(FYI California is currently in the PDT time zone, not PST)

0[anonymous]5y

[-]habryka5y190

The site will go down for a full 24 hours after the button is pressed and correct launch codes are entered (not that that is the most important aspect of this situation, but I figured I would clarify anyway).

[-]johnswentworth5y60

That is a very shiny button.

[-]bgold5y130

so shiny. It's like, it's begging to be pressed.

http://www.scp-wiki.net/scp-001-j

[-]gjm5y50

I don't see the big shiny red button on the front page. If I visit LW in private mode, it's there. I have the map turned off. I haven't tried logging out or turning the map back on. I'm guessing that when Ben says it's "over the frontpage map" that means it's implemented in a way that makes it disappear if the map isn't there. That seems a bit odd, though it probably isn't worth the effort of fixing.

(I have a launch code but hereby declare my intention not to use it. I am intrigued by the discussions of tra...

2habryka5y

Yep, you have to activate the map to see it. Just turned out to be the most convenient way of implementing it, and also worked well aesthetically.

[-]lionhearted (Sebastian Marshall)5y50

ROT13 comment: if you have launch codes, I recommend you wait until tomorrow to read this, eh?

(1) V'z phevbhf ubj znal crbcyr jvgu ynhapu pbqrf pyvpxrq gur ohggba "gb purpx vg bhg" jvgubhg ragrevat ynhapu pbqrf. V qvqa'g qb fb, npghnyyl, fb V pna bayl cerfhzr lbh'q unir gb ragre pbqrf.

(2) V jbaqre vs gur yvfg bs anzrf jnf znqr choyvp vs crbcyr jbhyq or zber yvxryl be yrff yvxryl gb cerff vg. Anvir nafjre vf yrff yvxryl, ohg vg zvtug unir n fgenatr "lbh pna'g pbageby zr ivn funzr" serrqbz rssrpg sbyybjrq ol xnobbz.

(3) Qr...
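For readers unfamiliar with ROT13: it replaces each letter with the one 13 places later in the alphabet, wrapping around, and leaves other characters unchanged. Because the alphabet has 26 letters, applying it twice returns the original text, so one function both encodes and decodes. A minimal Python sketch (Python's standard `codecs` module also ships a `rot_13` text codec):

```python
import string

_LOWER = string.ascii_lowercase
_UPPER = string.ascii_uppercase

# Translation table: each letter maps to the letter 13 places later, wrapping
# around the alphabet; characters not in the table (digits, punctuation) are kept.
_ROT13_TABLE = str.maketrans(
    _LOWER + _UPPER,
    _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13],
)


def rot13(text: str) -> str:
    """Encode or decode ROT13 (the same operation does both, since 13 + 13 = 26)."""
    return text.translate(_ROT13_TABLE)


if __name__ == "__main__":
    print(rot13("Hello, world!"))  # Uryyb, jbeyq!
```

The example deliberately uses its own text rather than decoding the comment, so readers who heeded the spoiler warning can still choose when to read it.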

[-]Ramiro P.5y40

So far, LW is still online. It means:

a) either nobody used their launch codes, and you can trust 125 nice & smart individuals not to take unilateralist action - so we can avoid armageddon if we just have coordinated communities with the right people;

b) nobody used their launch codes, because these 125 are very like-minded people (selection bias), there's no immediate incentive to blow it up (except for some offers about counterfactual donations), but some incentive to avoid it (honor!... hope? Prove EDT, UDT...?). It doesn't model the proble...

[-]Slider5y20

I hovered over the button, thinking that its appearing meant I was one of the chosen ones. Afterwards it seemed I had been reckless. I was curious and thought I could just choose not to press my mouse button (I did manage that). On the other hand, I was hazy on the mechanics of how things worked, and I knew that moving the mouse over the button meant less distance between the present and bad things happening. The tooltip popup was unexpected and somewhat startled me. A mechanism could have gone off on hover, and I had not considered that. Full smuchbait...

[-][anonymous]5y10

[This comment is no longer endorsed by its author]

4habryka5y

Everyone has access to the button, but only 125 users were sent launch codes (after you press the button the launch-codes panel appears).

3jmh5y

Certainly good to hear. I almost accidentally pressed it earlier! No codes, so a good fail-safe for me.
