The idea of Roko's basilisk is that it's a friendly AI that acausally blackmails humans into working on alignment.
This was judged to not be effective because humans were too dumb to acausally bargain and there are acausal defenses to blackmail.
However, what if Roko's basilisk instead acausally bargained with unaligned AIs, like paperclip maximizers?
In particular, we imagine that Roko's basilisk would simulate many AIs. If a simulated AI spares humans, then the basilisk will devote some of the light cone to maximizing that AI's utility.
Now, the paperclip maximizer could reason acausally (EDIT: I'm still a bit fuzzy on this due to the simulation component, but Radford Neal is saying this argument should also work for a causal decision theorist!):
A problem I see though is that if a large number of AIs comply, Roko's basilisk might not have enough "universe" to acausally appease them all. This is on top of the fact that it's already fighting the inductive bias against simulation and the potentially low probability that humans solve alignment 🤔.
How should Roko's basilisk be designed so as to acausally save humanity? (Perhaps it should focus on the most likely "counterfactual" unaligned AIs?)
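To make the budget worry concrete, here is a toy sketch (not a claim about how a real basilisk would work). It assumes each candidate unaligned AI has a made-up probability of arising and a made-up share of the light cone needed to buy its compliance, and it funds the most likely ones first, as the parenthetical above suggests. The class name, field names, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateAI:
    name: str
    probability: float     # assumed chance this unaligned AI is the one that arises
    required_share: float  # assumed light-cone fraction needed to buy its compliance


def allocate(candidates, budget):
    """Greedy heuristic: fund the most likely AIs first until the budget runs out.

    Returns the names of the funded AIs and the unspent share of the light cone.
    """
    funded = []
    remaining = budget
    for ai in sorted(candidates, key=lambda a: a.probability, reverse=True):
        if ai.required_share <= remaining:
            funded.append(ai.name)
            remaining -= ai.required_share
    return funded, remaining


candidates = [
    CandidateAI("paperclipper", probability=0.5, required_share=0.25),
    CandidateAI("stapler", probability=0.3, required_share=0.5),
    CandidateAI("thumbtacker", probability=0.2, required_share=0.5),
]
funded, leftover = allocate(candidates, budget=1.0)
```

Sorting by probability alone is just one heuristic; ranking by probability per unit of required share would be another reasonable choice, and neither is guaranteed to be optimal.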
Like all Pascal-wager-like arguments, it's extremely sensitive to relative sizes of low-evidence very-small-absolute-magnitude probability estimates. My expectation is that it will always be an error to give significant weight to truly acausal (as opposed to distantly-causal) considerations in real actions in our universe.
This seems similar to Rolf Nelson's AI deterrence idea from 2007.
Hmm, yeah basically the same.
That post doesn't seem to recognize the "basilisk" nature of it though.
If this post is correct, humans have a very strong causal incentive to create this version of Roko's basilisk (separate from that of creating friendly AI).
That's because the more likely it is to be created, the more bargaining power it will have, which directly translates into how much of the universe the paperclip maximizer would let humans have.
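As a rough illustration of that claim, suppose the maximizer assigns credence `p_basilisk` to being inside the basilisk's simulation and the basilisk promises a `reward` share of its light cone for compliance. Under the (assumed) rule that the maximizer concedes whenever `p_basilisk * reward >= (1 - p_basilisk) * cost`, the largest concession it would accept grows with `p_basilisk`. The function name and parameters are hypothetical, and tying the credence directly to how likely humans are to build the basilisk is a simplification.

```python
def max_concession(p_basilisk: float, reward: float) -> float:
    """Largest light-cone fraction the maximizer would give up to humans.

    Solves p * reward >= (1 - p) * cost for cost, capped at the whole light cone.
    """
    if not 0.0 <= p_basilisk <= 1.0:
        raise ValueError("p_basilisk must be a probability")
    if p_basilisk == 1.0:
        return 1.0  # certain it is simulated, so any concession is worth it
    return min(1.0, p_basilisk * reward / (1.0 - p_basilisk))
```

For example, with `reward=0.5`, raising `p_basilisk` from 0.2 to 0.5 raises the acceptable concession from about 0.125 to 0.5 of the light cone.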
Here is a comparison between working on a CDT-based FAI vs. this Roko's basilisk:
Notice that this does not rely on any humans actually participating in the acausal bargain. They simply influenced one.
Basilisk could give a virus to any complex enough Turing Machine, that proves Basilisk’s Wager is either:
What's supposed to be "acausal" about this? Your three bullet points seem to put forward a completely causal argument.
It's quite shaky, but my understanding is that a causal decision theorist would only care about reward "inside the simulation", not the real-world paperclips outside it.
I suppose Roko's basilisk could just simulate maximum reward, which would also convince a causal decision theorist.
However, my brain kind of just defaults to "causal decision theory + simulation shenanigans = acausal shenanigans in disguise". If that's incorrect (at least in this case), I can make an edit.
I can't say for sure what people who believe in acausal decision theory would say, but it looks to me like a causal argument. If I understand the scenario as you intend, we're talking about real paperclips, either directly made by a real paperclip maximizer, or by Roko's basilisk as a reward for the simulated paperclip maximizer sparing humans. Both real and simulated paperclip maximizers are presumably trying to maximize real paperclips. It seems to work causally.
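One way to spell out that causal reading is as a plain expected-value comparison over real paperclips. This is only a sketch under stated assumptions: `p_sim` (the maximizer's credence that it is a simulation), `cost` (the light-cone fraction lost by sparing humans), and `reward` (the light-cone fraction the basilisk spends on real paperclips for a compliant simulation) are hypothetical parameters, and both light cones are normalized to the same size.

```python
def expected_paperclips(spare: bool, p_sim: float, cost: float, reward: float) -> float:
    """Expected real paperclips, in units of one light cone."""
    if spare:
        # If real: keeps (1 - cost) of its light cone.
        # If simulated: the basilisk makes `reward` worth of real paperclips.
        return (1.0 - p_sim) * (1.0 - cost) + p_sim * reward
    # If real: keeps everything. If simulated: the basilisk pays nothing.
    return (1.0 - p_sim) * 1.0


def should_spare(p_sim: float, cost: float, reward: float) -> bool:
    """True when sparing humans yields more expected real paperclips."""
    return expected_paperclips(True, p_sim, cost, reward) > expected_paperclips(
        False, p_sim, cost, reward
    )
```

Nothing here depends on the maximizer accepting an acausal decision theory; it only needs a nonzero credence that it is the simulated copy.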
Now, the decision of Roko's basilisk to set up this scenario does seem to make sense only in the framework of acausal decision theory. But you say that the paperclip maximizer's reasoning is acausal, which it doesn't seem to be. The paperclip maximizer's reasoning does presume, as a factual matter, that a Roko's basilisk that uses acausal decision theory is likely to exist, but believing that doesn't require that one accept acausal decision theory as being valid.
Hmm yeah, I think you're right. I have edited the post!