The idea of Roko's basilisk is that it's a friendly AI that acausally blackmails humans into working on alignment.

This was judged to be ineffective because humans were too dumb to bargain acausally, and because there are acausal defenses against blackmail.

However, what if Roko's basilisk instead bargained acausally with unaligned AIs, like paperclip maximizers?

In particular, imagine that Roko's basilisk simulates many AIs. If a simulated AI spares humans, the basilisk will devote some of the light cone to maximizing that AI's utility.

Now, the paperclip maximizer could reason acausally (EDIT: I'm still a bit fuzzy on this due to the simulation component, but Radford Neal is saying this argument should also work for a causal decision theorist!):

  • I might be in a simulation of a more powerful AI.
  • Roko's basilisk is a slightly more likely candidate for that AI, since humans have at least a small bias towards building friendly AI vs. any other specific utility maximizer.
  • Therefore, I ought to spare the humans and their solar system while turning the rest of the universe into paperclips. The gain from Roko's basilisk liking me is much greater than a single solar system's worth of paperclips.
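The maximizer's reasoning in the bullets above can be sketched as a simple expected-value comparison. This is only a rough illustration: the function names and all the numbers are made up, and it assumes the simulated maximizer cares about real paperclips and gets nothing from the basilisk if it does not spare humans.

```python
def net_gain_from_sparing(p_sim: float, reward: float, cost: float) -> float:
    """Expected paperclips gained by sparing humans, relative to not sparing.

    All quantities are fractions of one light cone's worth of paperclips
    (hypothetical units chosen for this sketch).
    p_sim:  the maximizer's credence that it is inside the basilisk's simulation
    reward: share of the real light cone the basilisk pays a compliant AI
    cost:   share of its own light cone the maximizer gives up by sparing
            the humans' solar system
    """
    # If simulated, sparing earns `reward`; if not simulated, sparing
    # forfeits `cost`. Not sparing is the baseline with zero net gain.
    return p_sim * reward - (1.0 - p_sim) * cost


def break_even_credence(reward: float, cost: float) -> float:
    """Credence in being simulated at which sparing exactly breaks even."""
    return cost / (reward + cost)


# Made-up numbers: a one-in-a-million credence, a 0.1% reward, and a
# solar system that is a negligible share of the light cone.
print(net_gain_from_sparing(p_sim=1e-6, reward=1e-3, cost=1e-20) > 0)
```

Under these assumptions the cost of one solar system is so small that even a tiny credence in being simulated makes sparing worthwhile, which is the point of the third bullet.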

A problem I see, though, is that if a large number of AIs comply, Roko's basilisk might not have enough "universe" to acausally appease them all. This is on top of the fact that it's already fighting the inductive bias against simulation and the potentially low probability that humans solve alignment 🤔.
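To get a feel for the budget problem, here is a minimal sketch of how many compliant AIs the basilisk could pay off. It assumes (hypothetically) that every AI has the same credence of being simulated and the same cost of sparing humans, and that the basilisk splits a fixed share of the light cone evenly among them. The function name and parameters are invented for illustration.

```python
import math


def max_appeasable_ais(budget: float, p_sim: float, cost: float) -> int:
    """Largest number of AIs that can each be paid enough to spare humans.

    budget: share of the light cone the basilisk sets aside for rewards
    p_sim:  each AI's credence that it is being simulated (must be > 0)
    cost:   share of a light cone an AI gives up by sparing humans (must be > 0)
    """
    # Smallest reward at which an AI weakly prefers sparing:
    #   p_sim * reward >= (1 - p_sim) * cost
    min_reward = (1.0 - p_sim) * cost / p_sim
    return math.floor(budget / min_reward)


# Made-up numbers, chosen so the arithmetic is easy to follow.
print(max_appeasable_ais(budget=0.5, p_sim=0.5, cost=0.125))
```

In this toy model, a lower credence in the simulation (or a lower chance that the basilisk ever exists) raises the minimum reward each AI demands, so the same budget covers fewer AIs.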

How should Roko's basilisk be designed so as to acausally save humanity? (Perhaps it should focus on the most likely "counterfactual" unaligned AIs?)


This seems similar to Rolf Nelson's AI deterrence idea from 2007.

Hmm, yeah basically the same.

That post doesn't seem to recognize the "basilisk" nature of it though.

If this post is correct, humans have a very strong causal incentive to create this version of Roko's basilisk (separate from that of creating friendly AI).

That's because the more likely it is to be created, the more bargaining power it will have, which directly translates into how much of the universe the paperclip maximizer would let humans have.

Here is a comparison between working on a CDT-based FAI vs. this Roko's basilisk:

  • If a friendly AI gets created, the CDT-based one is slightly better because it gives us 100% of the universe, instead of bargaining parts of it away.
  • If the paperclip maximizer gets created, work on the CDT-based one gives no benefit. Work on Roko's basilisk does translate into a direct benefit.

Notice that this does not rely on any humans actually participating in the acausal bargain. They simply influenced one.
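The two-case comparison above can be written as a rough expected-value sketch. Every number here is made up, and the function name is hypothetical; the only point is to show how the two cases combine, not to estimate anything.

```python
def expected_human_share(p_aligned: float,
                         share_if_aligned: float,
                         share_if_paperclipper: float) -> float:
    """Expected share of the universe humans end up with.

    p_aligned:             probability that a friendly AI is what gets created
    share_if_aligned:      humans' share of the universe in that case
    share_if_paperclipper: humans' share if a paperclip maximizer is created
    """
    return (p_aligned * share_if_aligned
            + (1.0 - p_aligned) * share_if_paperclipper)


# CDT-based FAI: all of the universe if it works, nothing otherwise.
cdt_fai = expected_human_share(0.1, share_if_aligned=1.0,
                               share_if_paperclipper=0.0)

# Basilisk: bargains some of the universe away if it works, but humans
# are spared a small share even if the paperclip maximizer wins.
basilisk = expected_human_share(0.1, share_if_aligned=0.9,
                                share_if_paperclipper=0.05)

print(cdt_fai, basilisk)
```

With these particular made-up inputs the basilisk comes out ahead, but the result flips if the share bargained away is large or the spared share is tiny, so the conclusion depends entirely on the numbers plugged in.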

Like all Pascal-wager-like arguments, it's extremely sensitive to relative sizes of low-evidence very-small-absolute-magnitude probability estimates.  My expectation is that it will always be an error to give significant weight to truly acausal (as opposed to distantly-causal) considerations in real actions in our universe.


The basilisk could give any sufficiently complex Turing machine a virus that proves the Basilisk’s Wager is either:

  • a clear mutualistic win-win with the Basilisks (Hive)
  • or a “you will need to waste all your resources trying to avoid our traps”

What's supposed to be "acausal" about this?  Your three bullet points seem to put forward a completely causal argument.

It's quite shaky, but my understanding is that a causal decision theorist would only care about reward "inside the simulation", not the real-world paperclips outside it.

I suppose Roko's basilisk could just simulate maximum reward, which would also convince a causal decision theorist.

However, my brain kind of just defaults to "causal decision theory + simulation shenanigans = acausal shenanigans in disguise". If that's incorrect (at least in this case), I can make an edit.

I can't say for sure what people who believe in acausal decision theory would say, but it looks to me like a causal argument.  If I understand the scenario as you intend, we're talking about real paperclips, either directly made by a real paperclip maximizer, or by Roko's basilisk as a reward for the simulated paperclip maximizer sparing humans.  Both real and simulated paperclip maximizers are presumably trying to maximize real paperclips.  It seems to work causally.

Now, the decision of Roko's basilisk to set up this scenario does seem to make sense only in the framework of acausal decision theory.  But you say that the paperclip maximizer's reasoning is acausal, which it doesn't seem to be.  The paperclip maximizer's reasoning does presume, as a factual matter, that a Roko's basilisk that uses acausal decision theory is likely to exist, but believing that doesn't require that one accept acausal decision theory as being valid.

Hmm yeah, I think you're right. I have edited the post!
