Today we honor Stanislav Petrov. LessWrong's Petrov Day simulation pits East Wrong against West Wrong in a nuclear standoff. The outcome affects the karma scores, site access, and honor of over 300 LessWrong users.
There has been a renewal of discussion about how much hope we should have of an unaligned AGI leaving humanity alive on Earth after a takeover. When this topic is discussed, the idea of using simulation arguments or acausal trade to make the AI spare our lives often comes up. These ideas have a long history. The first mention I know of comes from Rolf Nelson in 2007 on an SL4 message board; the idea later makes a brief appearance in Superintelligence under the name Anthropic Capture, and most recently came up on LessWrong just a few days ago. In response to these, Nate Soares wrote Decision theory does not imply that we get to have nice things, arguing that decision theory is not...
Yeah, I currently disagree on the competent aliens bailing us out, but I haven't thought super hard about it. It does seem good to think about (though not top priority).
If you are doing evals for CBRN capabilities, you are very much in the zone of terrorists killing billions of innocent people. Indeed, that's practically the definition. There's no citation; it's just my personal experience while doing evals that are much too spicy to publish.
Of course, if you're only doing evals for relatively tame proxy skills (e.g. WMDP) then you probably get less of this effect. I don't have a quantification of the rates or specific datasets, just anecdata.
Update III: Here's the Retrospective.
—
Update II: Here's the feedback form for today's Petrov Day games! Please let me know how it was for you (be you a General, a Civilian, Petrov, or just a non-participating LWer reading along). I'm grateful to all who fill it out; your data feeds our designs for next year!
—
Update: The game has concluded! No nukes were fired. The Cold War is over; East and West Wrong shall live on. A recap post will be up tomorrow. Thank you to everyone who participated :-)
—
Today we honor the actions of Stanislav Petrov (1939 – 2017) once again.
...Half an hour past midnight on September 26, 1983, he saw the first apparent launch on his computer monitor in a glass-walled room on the top floor of the Ballistic Missile Early
I guess I was not clear enough in defining what I was talking about. While it is possible to stretch the definition of "nuclear world war" to include WW2, and Little Boy and Fat Man were certainly strategic weapons for their time, this is not at all what I meant. I was talking about modern strategic weapons, i.e. MIRVed ICBMs launched from hardened silos or ballistic missile submarines, used by a modern nuclear superpower to defeat a near-peer opponent. I.e., the scenario Petrov faced.
If e.g. the US in Petrov's time had managed to pull off a perfect nuclear first...
Let's call the thing where you try to take actions that make everyone/yourself less dead (in expectation) the "safety game". This game is annoyingly chaotic, kind of like Arimaa.
You write the Sequences, then some risk-averse, not-very-power-seeking nerds read them and you're 10x less dead. Then Mr. Altman reads them and you're 10x more dead. Then maybe (or not) there's a backlash and the numbers change again.
You start a cute political movement but the countermovement ends up being 10x more actionable (e/acc).
You try to figure out and explain some of the black box but your explanation is immediately used to make a stronger black box. (Mamba possibly.)
Etc.
I'm curious what folks use as toeholds for making decisions in such circumstances. Or, if some folks believe there actually are principles here, I'd like to hear them, but I suspect the fog is too thick. I'll skip giving my own answer on this one.
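To make the "annoyingly chaotic" feel concrete, here is a minimal Monte Carlo sketch. This is my toy framing, not the original poster's model: the 50/50 split, the 10x factors, and the function names are all illustrative assumptions.

```python
import random

def play_safety_game(n_actions=4, n_worlds=100_000, seed=0):
    """Toy sketch: each action multiplies your odds of the bad outcome
    by a factor whose sign you can't predict in advance (write the
    Sequences and risk goes 10x down, then the wrong person reads them
    and it goes 10x back up)."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_worlds):
        odds = 1.0  # relative odds of the bad outcome
        for _ in range(n_actions):
            odds *= 10.0 if rng.random() < 0.5 else 0.1  # coin-flip 10x effect
        outcomes.append(odds)
    outcomes.sort()
    median = outcomes[n_worlds // 2]
    mean = sum(outcomes) / n_worlds
    return median, mean

median, mean = play_safety_game()
print(f"median odds multiplier: {median:g}")  # typically ~1: effects cancel
print(f"mean odds multiplier:   {mean:g}")    # far above 1: rare blowup worlds dominate
```

Multiplicative coin flips like this produce a fat-tailed spread: the typical world is barely moved while the mean is dominated by rare blowups, which is one formal sense in which the fog is thick.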
I know this approach isn't as effective for xrisk, but still, it's something I like to use.
This sentence has the grammatical structure of acknowledging a counterargument and negating it - "I know x, but y" - but the y is "it's something I like to use", which does not actually negate the x.
This is the kind of thing I suspect results from a process like: someone writes out the structure of negation, wanting to negate an argument, but then finds nothing stronger to slot in where the negating argument is supposed to go.
Du sublime au ridicule il n’y a qu’un pas
From the sublime to the ridiculous is but a step
It's a quote often used to describe Napoleon, and Sam Altman is making history rhyme. His cool confidence often gives him an air of the sublime, and as of last week, he seems to have crossed into the ridiculous. And with the ridiculous, the irrational.
Comparing his past words to the present is confusing. Reading between the lines of his corporate-bureaucratic-sounding essay doesn't help much either. Anyone looking from an outside perspective can see the evidence: he has folded for money. But as obvious as that is, maybe he hasn't realized it himself. Or, even more likely, his ego hasn't allowed a conscious realization of his bad actions.
Freud characterizes the ego as the unconscious power...
That does seem likely.
I don't think anybody would have a problem with the statement "The motion of the planet is the strongest governing factor for life on Earth". It's when you make it explicitly plural that there's a problem.
This post is not a good intro to formal logic.
I'm bothered by how often self-reference paradoxes (e.g. "this sentence is false") are touted as gaping holes in logic, glaring flaws in one of our trustiest tools for grokking reality (namely formal logic). It's obvious that something interesting is going on here; the fact that, say, Gödel's incompleteness theorem even exists provides interesting insight into the structure of first-order logic, and in this post I want to explore that insight in more detail.
However, sometimes I'll see people making arguments like "Gödel's incompleteness theorem demonstrates that a certain true theorem can't be proven within the system that makes it true; the fact that humans are capable of recognizing that the Gödel sentence is true anyway suggests that our minds transcend...
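For concreteness, the construction behind the Gödel sentence discussed above is the standard diagonal lemma; this is textbook material, stated here for reference rather than anything specific to the excerpted post.

```latex
% Diagonal lemma (for a consistent, recursively axiomatized theory T extending PA):
% for every formula \varphi(x) there is a sentence G with
\[
  T \vdash G \leftrightarrow \varphi(\ulcorner G \urcorner).
\]
% Instantiating \varphi(x) := \neg\,\mathrm{Prov}_T(x) yields the Gödel sentence:
\[
  T \vdash G \leftrightarrow \neg\,\mathrm{Prov}_T(\ulcorner G \urcorner),
\]
% so T does not prove G, and (assuming \omega-consistency) T does not
% prove \neg G either.
```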
that's a really good way of putting it yeah, thanks.
and then, there's also something in here about how in practice we can approximate the evolution of our universe with our own abstract predictions well enough to understand the process by which the physical substrate that is getting tripped up by a self-reference paradox gets tripped up, which is the explanation for why we can "see through" such paradoxes.
Looking to do a little compare and contrast.
Followup to: Latent variable models, network models, and linear diffusion of sparse lognormals. This post is also available on my Substack.
Let’s say you are trying to make predictions about a variable. For example, maybe you are an engineer trying to keep track of how well the servers are running. It turns out that if you use the obvious approach advocated by e.g. frequentist statistics [1], you will have huge biases in what you pay attention to, compared to what you should pay attention to, because you will disregard big things in favor of common things. Let’s make a mathematical model of this.
Because of background factors that fluctuate at a greater frequency than you can observe/model, and because of noise that enters through your...
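A minimal simulation sketch of the claimed bias, under an assumed i.i.d. lognormal model; this is my illustration of the heavy-tail effect, not the post's actual model, and the server-slowdown framing just echoes the example above.

```python
import math
import random

# Toy model (my assumption): incident sizes are i.i.d. lognormal with a
# heavy tail. Counting incidents points you at the many small ones near
# the median; total impact is carried by a handful of huge ones.
rng = random.Random(0)
n = 100_000
slowdowns = [math.exp(rng.gauss(0.0, 3.0)) for _ in range(n)]  # sigma=3: heavy tail
slowdowns.sort(reverse=True)

total = sum(slowdowns)
top_share = sum(slowdowns[:n // 1000]) / total  # the worst 0.1% of incidents

print(f"median incident size: {slowdowns[n // 2]:.2f}")
print(f"share of total impact from the worst 0.1%: {top_share:.0%}")
```

Attention allocated by frequency lands on the common small incidents, while a tiny fraction of incidents carries a large share of the total impact: the "disregard big things in favor of common things" failure mode.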
I find this whole series intriguing but hard to grasp. If you have time, and if it's possible, I would suggest a big worked example / toy model where all the assumptions are laid out and you run the numbers / run the simulation / just think it through, and see the qualitative conclusions you're advocating for in this series. E.g., you'd see how the "traditional" analysis approach gives totally wrong conclusions, etc. Just my two cents. :)