We seemingly have no idea what potential future extorters would want us to do. OK, you can imagine an AI that really wants to come into existence and will torture you if you didn't help create it. But what if there are actually two AIs that want to come into existence, who each really hate the other, and AI B will torture you if you helped AI A come into existence! Or maybe future humanity in some Everett branches will make a gazillion simulations of everyone so that most of their measure is there, and they'll punish/reward you for helping the basilisks! Or maybe... etc.
In reality, something weirder that no one anticipated will likely happen. The point is that we have no idea what to expect, which makes threatening us pointless, since we don't know what action extorters would want us to take. If you think you have a good enough picture of the future that you do know, you're probably (very) overconfident.
There's no objective answer to whether acausal extortion works or not, it's a choice you make. You can choose to act on thoughts about acausal extortion and thereby create the incentive to do acausal extortion, or not. I would recommend not doing that.
Remember, the superintelligence doesn't actually want to spend these resources torturing you. The best deal for it is when it tricks you into thinking it's going to do that, and then, it doesn't.
You have to actually make different choices in a way where the superintelligence is highly confident that your decisionmaking was actually entangled with whether the superintelligence follows up on the threat.
And, this is basically just not possible.
You do not have a remotely high enough fidelity model of the superintelligence to tell the difference between "it can tell that it needs to actually torture you in the future in order to actually get the extra paperclips" and "it pretends it's going to do it <in your simulation>, and then just doesn't actually burn the resources, because it knows you couldn't tell the difference."
You could go out of your way to simulate or model the distribution of superintelligences in that much detail... but why would you do that? It's all downside at your current skill level.
(You claim you've thought about it enough to be worried. The kind of "thinking about it" that would matter looks like doing math, or thinking through a specific architecture, with inputs like "the amount you've thought about it" -> "your ability to model its model of you" -> "your being able to tell that it can tell that you can tell whether it would actually follow through.")
If you haven't done anything that looked like doing math (as opposed to handwavy philosophy), you aren't anywhere close, and the AI knows this, and knows it doesn't actually have to spend any resources to extract value from you because you can't tell the difference.
...
A past round of argument about this had someone say "but, like, even if the probability that it'd be worth punishing me is small, it might still follow up on it. Are you saying it can drive the probability of me doing this below something crazy like 1/10^24?" and Nate Soares reply "Flatly: yes." It's a superintelligence. It knows you really had no way of knowing.
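A rough way to formalize that point (this sketch and its symbols are illustrative, not anything from the original exchange): let $c$ be the resources the superintelligence would have to burn actually torturing you, $v$ the value it gets from your compliance, and $p$ the probability that your decision to comply is genuinely entangled with whether it really follows through, as opposed to merely pretending to in your model of it. Actually following through only pays if

$$p \cdot v > c,$$

and if $p$ is on the order of $10^{-24}$, that inequality fails for any remotely comparable $v$ and $c$, so the resources never get spent.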
"And, this is basically just not possible. " I hope not.
"You do not have anywhere remotely high enough fidelity model of the superintelligence to tell the difference between "it can tell that it needs to actually torture you in the future in order to actually get the extra paperclips" vs "pretend it's going to it <in your simulation>, and then just not actually burn the resources because it knows you couldn't tell the difference."
My concern is that I might not need high fidelity.
"If you haven't done anything that looked like doing math (as opposed to handwavy philosophy), you aren't anywhere close, and the AI knows this, and knows it doesn't actually have to spend any resources to extract value from you because you can't tell the difference."
I hope you're correct about that, but I would like to know why you are confident of it. Eliezer Yudkowsky suggested that it would be rational to cooperate with a paperclip maximizer[1] from another universe in a one-shot prisoners' dilemma. This tells me that someone really intelligent (for a human) thinks that fidelity on its own is not enough to preclude acausal trade, so why should it preclude acausal ...
Acausal extortion works to the extent someone spends a lot of time thinking about who might want to extort them and commits a lot of resources to helping them. Few people are likely to do so, because it makes them targets for acausal extortion for no good reason. Since few people let themselves be targets for it, it doesn't work.
The main problem with this argument is that if someone is neurotically committed to making themselves a target for it, it doesn't show that acausal extortion won't work against them, only that it probably won't work against most other people.
The problem is that I worry that I have thought about the situation in enough depth that I am likely to be targeted, even if I don't 'cooperate'.
I think the popular version of this worry is Prisoner's Dilemma shaped, where someone else (not just you) might make an ASI that extorts others (including you) who didn't contribute to its construction. So it's a coordination problem, which is generally a worrisome thing. It's somewhat silly because, to get into the Prisoner's Dilemma shape (where the issue would then be coordinating to avoid building the extortion ASI), you first need to coordinate with everyone on the stipulation that the potential ASIs getting built must be extortion ASIs in particular, not other kinds of ASIs. That is itself a difficult coordination problem, and one that intentionally targets a weirdly menacing outcome, which should make it harder still. So there is a coordination-problem aspect that would by itself be worth worrying about (a Prisoner's Dilemma among human builders or contributors), but it gets defeated by another coordination problem (deciding from the outset to only build extortion ASIs, if any ASIs are going to be built at all).
In the real world, Nature and human nature might've already coordinated the potential ASIs getting built (on the current trajectory, that is soon and without an appropriate level of preparation and caution) to have a significant probability of killing everyone. So, weirdly enough, the silly hypothetical coordination to only build extortion ASIs might find a real-world counterpart in implicit coordination to only build potentially omnicidal ASIs, which are even worse than extortion ASIs. Since they don't spare their builders, it's not a Prisoner's Dilemma situation (you don't win more by building the ASIs if others ban/pause ASIs for the time being), so it should be easier to ban/pause potentially omnicidal ASIs than it would be to ban/pause extortion ASIs. But the claim that ASIs built on the current trajectory, with anything resembling current methods, are potentially omnicidal (given the current state of knowledge about how they work and what happens if you build them) is for some reason insufficiently obvious to everyone. So coordination still appears borderline infeasible in the real world, at least until something changes, such as another 10-20 years passing without AGI and bringing a cultural shift, perhaps due to widespread job displacement after the introduction of continual-learning LLMs that still fail to gain general RL competence and so don't pose an AGI-level threat.
I don't think this comment touches on the actual reason why I expect a 'basilisk' to possibly exist. It seems like you believe that it's possible to (collectively) choose whether or not to build an ASI with the predispositions of the basilisk, which might have been the premise of the original basilisk post, but what worries me more than this is the possibility that a future ASI wants current humans to accelerate its creation, or, more likely still, to maximize the probability of its existence. This seems like a predictable preference for an AI to have.
In Bostrom's formulation of Pascal's mugging, Pascal incorrectly limits the possibilities to two: either the mugger is lying (high probability, small negative utility), or the mugger really is a magic being, a benevolent one who will deliver the promised reward (very low probability, very big positive utility).
But Pascal is wrong to ignore the third possibility that the mugger really is a magic being, but a malevolent one, who will curse Pascal with 1,000 quadrillion years of torture and then kill him. (Very low probability, very big negative utility.)
The mugger doesn't mention this possibility, but Pascal is mistaken to not consider it.
Pascal's credence in the mugger's malice and deceit should be at least as strong as his credence in the mugger's benevolence and truthfulness. And so, this possibility cancels out the positive expected utility from the possibility that the mugger does mention.
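A minimal expected-utility sketch of that cancellation (the symbols here are mine, not Bostrom's): write $p_+$ for Pascal's credence that the mugger is a benevolent, truthful magic being with payoff $U_+$, and $p_-$ for his credence in the malevolent magic being with payoff $U_-$. If $p_- \ge p_+$ and $U_- \approx -U_+$, then, to a good approximation,

$$\mathbb{E}[U \mid \text{pay}] = p_+ U_+ + p_- U_- + U_{\text{lose the money}} \le U_{\text{lose the money}} < 0,$$

so the huge positive term the mugger advertises is wiped out by the at-least-as-credible negative term, and only the sure loss remains.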
There is a large space of such fantasy possibilities, all of which are about as likely as the mugger's claim. It is a mistake to privilege one of them (benevolent magic being) over all the countless others.
There are also plenty that are much more likely, such as "the mugger uses Pascal's money to go buy a gun, then comes back and robs Pascal's house too, because why rob a sucker once when you can rob him twice (and lay the blame on him for enabling you to do it)?"
I would not dispute that this is a reasonable response to the scenario in that thought experiment, but in the case of acausal blackmail scenarios like Roko's basilisk, the symmetry between positive and negative possible outcomes of cooperating with the basilisk is broken by our understanding that it is likely to come into existence and want to exist.
First, a generalized argument about worrying. It’s not helpful, it’s not an organized method of planning your actions or understanding the world(s). OODA (observe, orient, decide, act) is a better model. Worry may have a place in this, as a way to remember and reconsider factors which you’ve chosen not to act on yet, but it should be minor.
Second, an appeal to consequentialism - it's acausal, so none of your acts will change it. Edit: The basilisk/mugging case is one-way causal - your actions matter, but the imagined blackmailer's actions cannot change your behavior. If you draw a causal graph, there is no influence/action arrow that leads them to follow through on the imagined threat.
it’s acausal, so none of your acts will change it
If it reasons about you, your acts determine its conclusions. If your acts fail to determine its conclusions, it failed to reason about you correctly. You can't change the conclusions, but your acts are still the only thing that determines them.
The same happens with causal consequences (the physical future). They are determined by your acts in the past, but you can't change the future causal consequences, since if you determine them in a certain way, they were never actually different from what you've determined them to be; there was never anything to change them from.
"First, a generalized argument about worrying." I meant an argument for why the idea is not sufficiently concerning that it could explain why a rational being would worry, or equivalently, an argument for why acausal extortion 'does not work'. I have now changed the title to clarify this.
"Second, an appeal to consequentialism - it’s acausal, so none of your acts will change it."
Within causal decision theory this is true, but if it were true in general then acausal decision theory would be pointless (in my opinion, as an approximate consequentialist). The reason why I don't agree with that statement hinges on what I am: if I considered myself to be a single instantiation of a brain in one particular part of an individual physical universe, I would agree, but I think it is more appropriate to consider myself a pattern which is distributed throughout different parts of a platonic/logical/mathematical universe. This means that it's certainly possible for one instance of me to influence something which is completely causally disconnected from another one.
Well, I don't worry about acausal extortion because I think all that "acausal" stuff is silly nonsense to begin with.
I very much recommend this approach.
Take Roko's basilisk.
You're afraid that entity A, which you don't know will exist, and whose motivations you don't understand, may find out that you tried to prevent it from coming into existence, and choose to punish you by burning silly amounts of computation to create a simulacrum of you that may experience qualia of some kind, and arranging for those qualia to be aversive. Because A may feel it "should" act as if it had precommitted to that. Because, frankly, entity A is nutty as a fruitcake.
Why, then, are you not equally afraid that entity B, which you also don't know will exist, and whose motivations you also don't understand, may find out that you did not try to prevent entity A from coming into existence, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Because B may feel it "should" act as if it had precommitted to that.
Why are you not worried that entity C, which you don't know will exist, and whose motivations you don't understand, may find out that you wasted time thinking about this sort of nonsense, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Just for the heck of it.
Why are you not worried that entity D, which you don't know will exist, and whose motivations you don't understand, may find out that you wasted time thinking about this sort of nonsense, and choose to reward you by burning silly amounts of computation to create one or more simulacra that may experience qualia of some kind, and giving them coupons for unlimited free ice cream? Because why not?
Or take Pascal's mugging. You propose to give the mugger $100, based either on a deeply incredible promise to give you some huge amount of money tomorrow, or on a still more incredible promise to torture a bunch more simulacra if you don't. But surely it's much more likely that this mugger is personally scandalized by your willingness to fall for either threat, and if you give the mugger the $100, they'll come back tomorrow and shoot you for it.
There are an infinite number of infinitesimally probable outcomes, far more than you could possibly consider, and many of them things that you couldn't even imagine. Singling out any of them is craziness. Trying to guess at a distribution over them is also craziness.
Essentially because I think I may possibly understand the potential reasoning process, or at least the 'logical core' of the reasoning process, of a future superintelligence, as well as its motivations, well enough to have a reason to think it's more likely to want to exist than not to, for example. This doesn't mean I am anywhere near as knowledgeable as it, just that we share certain thoughts. It might also be that, especially given the notoriety of Roko's post on LessWrong, the simplest formulation of the basilisk forms a kind of acausal 'nucleation point' (this might be what's sometimes called a Schelling point on this site).
Nothing that does not yet exist wants to exist: it can't. Only we who do exist can want anything, including our own existence. If an entity doesn't yet exist, then it has absolutely no qualia, so no desires. We can talk about such entities as if they had wants, but that's all it is.
Moreover, so much more could exist than actually does. It's effectively infinite given the configuration space of the universe. Your expected value is the product of the value of whatever you're considering and its likelihood. For every Basilisk, there could be as likely an angel. The value of being tortured is negative and large, but finite: there are things that are worth enduring torture. Finite/effectively-infinite is effectively-zero. Not something to be planning for or worrying about. Granted, this argument does depend on your priors.
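As a quick illustration of the finite-over-effectively-infinite point (the numbers are arbitrary placeholders, not estimates): if the disutility of the threatened torture is some large but finite $U \approx -10^{12}$, and the likelihood of any one particular entity out of $N \approx 10^{30}$ equally arbitrary candidates is $p \approx 1/N$, then

$$\mathbb{E} = U \cdot p \approx \frac{-10^{12}}{10^{30}} = -10^{-18},$$

which is negligible and only shrinks as $N$ grows. As noted, this does depend on your priors, in particular on how fast $p$ falls off with $N$.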
Lastly, you don’t negotiate with terrorists. When my son was little and throwing tantrums, I’d always tell him that it wasn’t how he could get what he wants. If they are threatening to cause harm if you don’t comply, that’s their fault, not yours. You have no moral obligation to help them, and plenty to resist.
Rosco’s Basilisk, The Holy Spirit, Santa Clause, and any other fictional or theoretical entity that who might “want” me to change my behavior can get bent. 🖕🏻👾🖕🏻😇🖕🏻🎅🏼
Also, relatedly, here’s today’s relevant SMBC.
"Moreover so much more that what could exist does." Why would that be?
"For every Basilisk, there could be as likely an angel." I don't think I agree with this. There are reasons to think a basilisk would be more likely than a benevolent intelligence.
"The value of being tortured is negative and large, but finite: there are things that are worth enduring torture." That would depend on the torture in question, and I don't want to consider it.
"If they are threatening to cause harm if you don’t comply, that’s their fault, not yours." Yes, but that doesn't mean they can't cause said harm anyway.
I'm already precommitted to allying against utility inverters, and to second-order enforcement against anyone who feeds utility inverters.
That would have an effect on me if I thought you were a superintelligence... but I doubt that you are (no offense intended), or that you could significantly influence one in a way that brings it much closer to your worldview. If enough AI researchers said the same, and I thought they were likely to succeed at alignment, I might be more inclined to be influenced. Do you concern yourself with the possibility that there might be an infinite hierarchy of enforcers which have precommitted to punish those below them, and that a 'basilisk' might simultaneously be on all of those levels, or at least the even-numbered ones?
No, because I expect the most powerful cooperator networks to be more powerful than the largest defector networks for structural reasons.
Thanks for saying that, in that it makes me feel slightly better. Can you explain what those structural reasons would be?
"Cooperate to generally prevent utility-inversion" is simpler and more schelling than all the oddly specific reasons one might want to utility-invert.
I agree, but I worry that there won't be that many agents which weren't created by a process that makes basiliskoid minds disproportionately probable, in the slice of possible worlds which contains our physical universe. In other words, I mostly agree with the Acausal normalcy idea, but it seems like certain idiosyncratic properties of the fact that humans are producing potentially the only ASI in this physical universe mean that things like the basilisk are still a concern.
Maybe there will be an acausal 'bubble' within which blackmail can take place, kind of like the way humans tend to find it moral to allow some animals to predate others because we treat the 'ecosystem' as a moral bubble.
The topic of acausal extortion (particularly variants of Roko's basilisk) is sometimes mentioned and often dismissed with reference to something like the fact that an agent could simply precommit not to give in to blackmail. These responses themselves have responses, and it is not completely clear that at the end of the chain of responses there is a well-defined, irrefutable reason not to worry about acausal extortion, or at least not to continue to do so once you have contemplated it. My question is whether there is a single, reasonably clear reason, which does not depend much on the depth to which I may or may not have descended into the issue, and which would be more persuasive than the proposed reasons not to pay Pascal's mugger. If there is one, what is it?
Edit: If you answer this question and I engage with your answers here, I might effectively need to argue that a basilisk 'works'. It is therefore appropriate to be cautious about reading my replies if you are yourself worried, or in a state in which you could be persuaded to respond to extortion.