We seemingly have no idea what potential future extorters would want us to do. OK, you can imagine an AI that really wants to come into existence and will torture you if you didn't help create it. But what if there are actually two AIs that want to come into existence, who each really hate the other, and AI B will torture you if you helped AI A come into existence! Or maybe future humanity in some Everett branches will make a gazillion simulations of everyone so that most of their measure is there, and they'll punish/reward you for helping the basilisks! Or maybe... etc.
In reality, something weirder that no one anticipated will likely happen. The point is that we have no idea what to expect, which makes threatening us pointless, since we don't know what action extorters would want us to take. If you think you have a good enough picture of the future that you do know, you're probably (very) overconfident.
There's no objective answer to whether acausal extortion works or not, it's a choice you make. You can choose to act on thoughts about acausal extortion and thereby create the incentive to do acausal extortion, or not. I would recommend not doing that.
Remember, the superintelligence doesn't actually want to spend these resources torturing you. The best deal for it is when it tricks you into thinking it's going to do that, and then, it doesn't.
You have to actually make different choices in a way where the superintelligence is highly confident that your decisionmaking was actually entangled with whether the superintelligence follows up on the threat.
And, this is basically just not possible.
You do not have a remotely high enough fidelity model of the superintelligence to tell the difference between "it can tell that it needs to actually torture you in the future in order to actually get the extra paperclips" and "it pretends it's going to do it <in your simulation>, and then just doesn't actually burn the resources, because it knows you couldn't tell the difference."
You could go out of your way to simulate or model the distribution of superintelligences in that much detail... but why would you do that? It's all downside at your current skill level.
(You claim you've thought about it enough to be worried. The kind of "thinking about it" that would matter looks like doing math, or thinking through a specific architecture, with inputs like "the amount you've thought about it" -> "your ability to model its model of you" -> "your being able to tell that it can tell that you can tell whether it would actually follow through.")
If you haven't done anything that looked like doing math (as opposed to handwavy philosophy), you aren't anywhere close, and the AI knows this, and knows it doesn't actually have to spend any resources to extract value from you because you can't tell the difference.
...
A past round of argument about this had someone say "but, like, even if the probability that it'd be worth punishing me is small, it might still follow up on it. Are you saying it can drive the probability of me doing this below something crazy like 1/10^24?" and Nate Soares reply "Flatly: yes." It's a superintelligence. It knows you really had no way of knowing.
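A rough way to formalize that point (this sketch and its symbols are illustrative, not anything from the original exchange): let $c$ be the resources the superintelligence would have to burn actually torturing you, $v$ the value it gets from your compliance, and $p$ the probability that your decision to comply is genuinely entangled with whether it really follows through, as opposed to merely pretending to in your model of it. Actually following through only pays if

$$p \cdot v > c,$$

and if $p$ is on the order of $10^{-24}$, that inequality fails for any remotely comparable $v$ and $c$, so the resources never get spent.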
"And, this is basically just not possible. " I hope not.
"You do not have anywhere remotely high enough fidelity model of the superintelligence to tell the difference between "it can tell that it needs to actually torture you in the future in order to actually get the extra paperclips" vs "pretend it's going to it <in your simulation>, and then just not actually burn the resources because it knows you couldn't tell the difference."
My concern is that I might not need high fidelity.
"If you haven't done anything that looked like doing math (as opposed to handwavy philosophy), you aren't anywhere close, and the AI knows this, and knows it doesn't actually have to spend any resources to extract value from you because you can't tell the difference."
I hope you're correct about that, but I would like to know why you are confident of it. Eliezer Yudkowsky suggested that it would be rational to cooperate with a paperclip maximizer[1] from another universe in a one-shot prisoners' dilemma. This tells me that someone really intelligent (for a human) thinks that fidelity on its own is not enough to preclude acausal trade, so why should it preclude acausal ...
Acausal extortion works to the extent someone spends a lot of time thinking about who might want to extort them and commits a lot of resources to helping them. Few people are likely to do so, because it makes them targets for acausal extortion for no good reason. Since few people let themselves be targets for it, it doesn't work.
The main problem with this argument is that if someone is neurotically committed to making themselves a target for it, it doesn't show that acausal extortion won't work against them, only that it probably won't work against most other people.
The problem is that I worry that I have thought about the situation in enough depth that I am likely to be targeted, even if I don't 'cooperate'.
I think the popular version of this worry is Prisoner's Dilemma shaped, where someone else (not just you) might make an ASI that extorts others (including you) who didn't contribute to its construction. So it's a coordination problem, which is generally a worrisome thing. It's somewhat silly because, to get into the Prisoner's Dilemma shape (where the issue would then be coordinating to avoid building the extortion ASI), you first need to coordinate with everyone on the stipulation that the potential ASIs getting built must be extortion ASIs in particular, not other kinds of ASIs. That is itself a difficult coordination problem, and one that intentionally targets a weirdly menacing outcome, which should make it harder still. So there is a coordination-problem aspect that would by itself be worth worrying about (a Prisoner's Dilemma among human builders or contributors), but it gets defeated by another coordination problem (deciding from the outset to only build extortion ASIs, if any ASIs are going to be built at all).
In the real world, Nature and human nature might've already coordinated the potential ASIs getting built (on the current trajectory, that is soon and without an appropriate level of preparation and caution) to have a significant probability of killing everyone. So, weirdly enough, the silly hypothetical coordination to only build extortion ASIs might find a real-world counterpart in implicit coordination to only build potentially omnicidal ASIs, which are even worse than extortion ASIs. Since they don't spare their builders, it's not a Prisoner's Dilemma situation (you don't win more by building the ASIs if others ban/pause ASIs for the time being), so it should be easier to ban/pause potentially omnicidal ASIs than it would be to ban/pause extortion ASIs. But the claim that ASIs built on the current trajectory, with anything resembling current methods, are potentially omnicidal (given the current state of knowledge about how they work and what happens if you build them) is for some reason insufficiently obvious to everyone. So coordination still appears borderline infeasible in the real world, at least until something changes, such as another 10-20 years passing without AGI and bringing a cultural shift, perhaps due to widespread job displacement after the introduction of continual-learning LLMs that still fail to gain general RL competence and so don't pose an AGI-level threat.
I don't think this comment touches on the actual reason why I expect a 'basilisk' to possibly exist. It seems like you believe that it's possible to (collectively) choose whether or not to build an ASI with the predispositions of the basilisk, which might have been the premise of the original basilisk post, but what worries me more than this is the possibility that a future ASI wants current humans to accelerate its creation, or, more likely still, to maximize the probability of its existence. This seems like a predictable preference for an AI to have.
In Bostrom's formulation of Pascal's mugging, Pascal incorrectly limits the possibilities to two: either the mugger is lying (high probability, small negative utility), or the mugger really is a magic being, a benevolent one who will deliver the promised reward (very low probability, very big positive utility).
But Pascal is wrong to ignore the third possibility that the mugger really is a magic being, but a malevolent one, who will curse Pascal with 1,000 quadrillion years of torture and then kill him. (Very low probability, very big negative utility.)
The mugger doesn't mention this possibility, but Pascal is mistaken to not consider it.
Pascal's credence in the mugger's malice and deceit should be at least as strong as his credence in the mugger's benevolence and truthfulness. And so, this possibility cancels out the positive expected utility from the possibility that the mugger does mention.
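A minimal expected-utility sketch of that cancellation (the symbols here are mine, not Bostrom's): write $p_+$ for Pascal's credence that the mugger is a benevolent, truthful magic being with payoff $U_+$, and $p_-$ for his credence in the malevolent magic being with payoff $U_-$. If $p_- \ge p_+$ and $U_- \approx -U_+$, then, to a good approximation,

$$\mathbb{E}[U \mid \text{pay}] = p_+ U_+ + p_- U_- + U_{\text{lose the money}} \le U_{\text{lose the money}} < 0,$$

so the huge positive term the mugger advertises is wiped out by the at-least-as-credible negative term, and only the sure loss remains.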
There is a large space of such fantasy possibilities, all of which are about as likely as the mugger's claim. It is a mistake to privilege one of them (benevolent magic being) over all the countless others.
There are also plenty that are much more likely, such as "the mugger uses Pascal's money to go buy a gun, then comes back and robs Pascal's house too, because why rob a sucker once when you can rob him twice (and lay the blame on him for enabling you to do it)?"
I would not dispute that this is a reasonable response to the scenario in that thought experiment, but in the case of acausal blackmail scenarios like Roko's basilisk, the symmetry between positive and negative possible outcomes of cooperating with the basilisk is broken by our understanding that it is likely to come into existence and want to exist.
First, a generalized argument about worrying. It’s not helpful, it’s not an organized method of planning your actions or understanding the world(s). OODA (observe, orient, decide, act) is a better model. Worry may have a place in this, as a way to remember and reconsider factors which you’ve chosen not to act on yet, but it should be minor.
Second, an appeal to consequentialism - it's acausal, so none of your acts will change it. Edit: The basilisk/mugging case is one-way causal - your actions matter, but the imagined blackmailer's actions cannot change your behavior. If you draw a causal graph, there is no influence/action arrow that leads them to follow through on the imagined threat.
it’s acausal, so none of your acts will change it
If it reasons about you, your acts determine its conclusions. If your acts fail to determine its conclusions, it failed to reason about you correctly. You can't change the conclusions, but your acts are still the only thing that determines them.
The same happens with causal consequences (the physical future). They are determined by your acts in the past, but you can't change the future causal consequences, since if you determine them in a certain way, they were never actually different from what you've determined them to be; there was never anything to change them from.
"First, a generalized argument about worrying." I meant an argument for why the idea is not sufficiently concerning that it could explain why a rational being would worry, or equivalently, an argument for why acausal extortion 'does not work'. I have now changed the title to clarify this.
"Second, an appeal to consequentialism - it’s acausal, so none of your acts will change it."
Within causal decision theory this is true, but if it were true in general then acausal decision theory would be pointless (in my opinion, as an approximate consequentialist). The reason why I don't agree with that statement hinges on what I am: if I considered myself to be a single instantiation of a brain in one particular part of an individual physical universe, I would agree, but I think it is more appropriate to consider myself a pattern which is distributed throughout different parts of a platonic/logical/mathematical universe. This means that it's certainly possible for one instance of me to influence something which is completely causally disconnected from another one.
Well, I don't worry about acausal extortion because I think all that "acausal" stuff is silly nonsense to begin with.
I very much recommend this approach.
Take Roko's basilisk.
You're afraid that entity A, which you don't know will exist, and whose motivations you don't understand, may find out that you tried to prevent it from coming into existence, and choose to punish you by burning silly amounts of computation to create a simulacrum of you that may experience qualia of some kind, and arranging for those qualia to be aversive. Because A may feel it "should" act as if it had precommitted to that. Because, frankly, entity A is nutty as a fruitcake.
Why, then, are you not equally afraid that entity B, which you also don't know will exist, and whose motivations you also don't understand, may find out that you did not try to prevent entity A from coming into existence, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Because B may feel it "should" act as if it had precommitted to that.
Why are you not worried that entity C, which you don't know will exist, and whose motivations you don't understand, may find out that you wasted time thinking about this sort of nonsense, and choose to punish you by burning silly amounts of computation to create one or more simulacra of you that may experience qualia of some kind, and arranging for those qualia to be aversive? Just for the heck of it.
Why are you not worried that entity D, which you don't know will exist, and whose motivations you don't understand, may find out that you wasted time thinking about this sort of nonsense, and choose to reward you by burning silly amounts of computation to create one or more simulacra that may experience qualia of some kind, and giving them coupons for unlimited free ice cream? Because why not?
Or take Pascal's mugging. You propose to give the mugger $100, based either on a deeply incredible promise to give you some huge amount of money tomorrow, or on a still more incredible promise to torture a bunch more simulacra if you don't. But surely it's much more likely that this mugger is personally scandalized by your willingness to fall for either threat, and if you give the mugger the $100, they'll come back tomorrow and shoot you for it.
There are an infinite number of infinitesimally probable outcomes, far more than you could possibly consider, and many of them things that you couldn't even imagine. Singling out any of them is craziness. Trying to guess at a distribution over them is also craziness.
Essentially because I think I may possibly understand the potential reasoning process, or at least the 'logical core' of the reasoning process, of a future superintelligence, as well as its motivations, well enough to have a reason to think it's more likely to want to exist than not to, for example. This doesn't mean I am anywhere near as knowledgeable as it, just that we share certain thoughts. It might also be that, especially given the notoriety of Roko's post on LessWrong, the simplest formulation of the basilisk forms a kind of acausal 'nucleation point' (this might be what's sometimes called a Schelling point on this site).
Nothing that does not yet exist wants to exist: it can't. Only we who do exist can want anything, including our own existence. If an entity doesn't yet exist, then it has absolutely no qualia, so no desires. We can talk about such entities as if they had wants, but that's all it is.
Moreover, so much more could exist than actually does. It's effectively infinite given the configuration space of the universe. Your expected value is the product of the value of whatever you're considering and its likelihood. For every Basilisk, there could be as likely an angel. The value of being tortured is negative and large, but finite: there are things that are worth enduring torture. Finite/effectively-infinite is effectively-zero. Not something to be planning for or worrying about. Granted, this argument does depend on your priors.
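As a quick illustration of the finite-over-effectively-infinite point (the numbers are arbitrary placeholders, not estimates): if the disutility of the threatened torture is some large but finite $U \approx -10^{12}$, and the likelihood of any one particular entity out of $N \approx 10^{30}$ equally arbitrary candidates is $p \approx 1/N$, then

$$\mathbb{E} = U \cdot p \approx \frac{-10^{12}}{10^{30}} = -10^{-18},$$

which is negligible and only shrinks as $N$ grows. As noted, this does depend on your priors, in particular on how fast $p$ falls off with $N$.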
Lastly, you don’t negotiate with terrorists. When my son was little and throwing tantrums, I’d always tell him that it wasn’t how he could get what he wants. If they are threatening to cause harm if you don’t comply, that’s their fault, not yours. You have no moral obligation to help them, and plenty to resist.
Rosco’s Basilisk, The Holy Spirit, Santa Clause, and any other fictional or theoretical entity that who might “want” me to change my behavior can get bent. 🖕🏻👾🖕🏻😇🖕🏻🎅🏼
Also, relatedly, here’s today’s relevant SMBC.
"Moreover so much more that what could exist does." Why would that be?
"For every Basilisk, there could be as likely an angel." I don't think I agree with this. There are reasons to think a basilisk would be more likely than a benevolent intelligence.
"The value of being tortured is negative and large, but finite: there are things that are worth enduring torture." That would depend on the torture in question, and I don't want to consider it.
"If they are threatening to cause harm if you don’t comply, that’s their fault, not yours." Yes, but that doesn't mean they can't cause said harm anyway.
I'm already precommitted to allying against utility inverters, and to second-order enforcement against anyone who feeds utility inverters.
That would have an effect on me if I thought you were a superintelligence... but I doubt that you are (no offense intended), or that you could significantly influence one in a way that brings it much closer to your worldview. If enough AI researchers said the same, and I thought they were likely to succeed at alignment, I might be more inclined to be influenced. Do you concern yourself with the possibility that there might be an infinite hierarchy of enforcers which have precommitted to punish those below them, and that a 'basilisk' might simultaneously be on all of those levels, or at least the even-numbered ones?
No, because I expect the most powerful cooperator networks to be more powerful than the largest defector networks for structural reasons.
Thanks for saying that, in that it makes me feel slightly better. Can you explain what those structural reasons would be?
"Cooperate to generally prevent utility-inversion" is simpler and more schelling than all the oddly specific reasons one might want to utility-invert.
I agree, but I worry that there won't be that many agents which weren't created by a process that makes basiliskoid minds disproportionately probable, in the slice of possible worlds which contains our physical universe. In other words, I mostly agree with the Acausal normalcy idea, but it seems like certain idiosyncratic properties of the fact that humans are producing potentially the only ASI in this physical universe mean that things like the basilisk are still a concern.
Maybe there will be an acausal 'bubble' within which blackmail can take place, kind of like the way humans tend to find it moral to allow some animals to predate others because we treat the 'ecosystem' as a moral bubble.
The topic of acausal extortion (particularly variants of Roko's basilisk) is sometimes mentioned and often dismissed with reference to something like the fact that an agent could simply precommit not to give in to blackmail. These responses themselves have responses, and it is not completely clear that at the end of the chain of responses there is a well-defined, irrefutable reason not to worry about acausal extortion, or at least not to continue to do so once you have contemplated it. My question is whether there is a single, reasonably clear reason, which does not depend much on the depth to which I may or may not have descended into the issue, and which would be more persuasive than the proposed reasons not to pay Pascal's mugger. If there is one, what is it?
Edit: If you answer this question and I engage with your answers here, I might effectively need to argue that a basilisk 'works'. It is therefore appropriate to be cautious about reading my replies if you are yourself worried, or in a state in which you could be persuaded to respond to extortion.