This is our monthly thread for collecting arbitrarily contrived scenarios in which somebody gets tortured for 3^^^^^3 years, or an infinite number of people experience an infinite amount of sorrow, or a baby gets eaten by a shark, etc. and which might be handy to link to in one of our discussions. As everyone knows, this is the most rational and non-obnoxious way to think about incentives and disincentives.

  • Please post all infinite-torture scenarios separately, so that they can be voted up/down separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
  • No more than 5 infinite-torture scenarios per person per monthly thread, please.


Humanity grows saner, SIAI is well funded, and successfully develops a FAI. Just as the AI finishes calculating the CEV of humanity, a stray cosmic ray flips the sign bit in its utility function. It proceeds to implement the anti-CEV of humanity for the lifetime of the universe.

(Personally I think contrived-ness only detracts from the raw emotional impact that such scenarios have if you ignore their probability and focus on the outcome.)

I actually use "the sign bit in the utility function" as one of my canonical examples for how not to design an AI.


As everyone knows, this is the most rational and non-obnoxious way to think about incentives and disincentives.

In all seriousness, coming up with extreme, contrived examples is a very good way to test the limits of moral criteria, methods of reasoning, etc. Oftentimes if a problem shows up most obviously at the extreme fringes, it may also be liable, less obviously, to affect reasoning in more plausible real-world scenarios, so knowing where a system obviously fails is a good starting point.

Of course, we're generally relying on intuition to determine what a "failure" is (many people would hear that utilitarianism favours TORTURE over SPECKS and deem that a failure of utilitarianism rather than a failure of intuition), so this method is also good for probing at what people really believe, rather than what they claim to believe, or believe they believe. That's a good general principle of reverse engineering — if you can figure out where a system does something weird or surprising, or merely what it does in weird or surprising cases, you can often get a better sense of the underlying algorithms. A person unfamiliar with the terminology of moral philosophy might not know w... (read more)


If you ask me, the prevalence of torture scenarios on this site has very little to do with clarity and a great deal to do with a certain kind of autism-y obsession with things that might happen but probably won't.

It's the same mental machinery that makes people avoid sidewalk cracks or worry their parents have poisoned their food.

A lot of times it seems the "rationality" around here simply consists of an environment that enables certain neuroses and personality problems while suppressing more typical ones.

I don't think that being fascinated by extremely low-probability but dramatic possibilities has anything to do with autism. As you imply, people in general tend to do it, though being terrified about airplane crashes might be a better example.

I'd come up with an evolutionary explanation, but a meteor would probably fall on my head if I did that.


I really don't see how you could have drawn that conclusion. It's not like anyone here is actually worried about being forced to choose between torture and dust specks, or being accosted by Omega and required to choose one box or two, or being counterfactually mugged. (And, if you were wondering, we don't actually think Clippy is a real paperclip maximizer, either.) "Torture" is a convenient, agreeable stand-in for "something very strongly negatively valued" or "something you want to avoid more than almost anything else that could happen to you" in decision problems. I think it works pretty well for that purpose.

Yes, a recent now-deleted post proposed a torture scenario as something that might actually happen, but it was not a typical case and not well-received. You need to provide more evidence that more than a few people here actually worry about that sort of thing, that it's more than just an Omega-like abstraction used to simplify decision problems by removing loopholes around thinking about the real question.

How about this guy a couple comments down? Actually I'm not sure he's even serious, but I've certainly seen that argument advanced before, and the parent post's "1% chance" thing is, I'm pretty sure, a parody of the idea that you have to give anything at least a 1% chance (because it's all so messy and how can you ever be sure?!), which has certainly shown up on this site on several occasions recently, particularly in relation to the very extreme fringe scenarios you say help people think more clearly. Torture scenarios have LONG been advanced in this community as more than a trolley problem with added poor-taste hyperbole. Even if you go back to the SL4 mailing list, it's full of discussions where someone says something about AI, and someone else replies "what, so an AI is like god in this respect? What if it goes wrong? What if religious people make one? What if my mean neighbor gets uploaded? What if what if what if? WE'LL ALL BE TORTURED!"
I was serious that the probability of an explicitly described situation is orders of magnitude greater than the probability of a not-yet-chosen random scenario. In the same way the probability of any particular hundred digit number, once someone posts it online, will be orders of magnitude more likely to appear elsewhere. But I was joking in the "complaint" about the posting, because the probability both before and after the post is small enough that no reasonable person could worry about the thing happening.
This isn't consistent with Roko's post here which took seriously the notion of a post Singularity FAI precommitting to torturing some people for eternity. ETA: Although it does seem that part of the reason that post was downvoted heavily was that most people considered the situation ridiculous.
Pondering blue tentacle scenarios can be a displacement activity, to avoid dealing with things that really might happen, or be made to happen, here and now.
Could you give an example of how extreme examples inform realistic examples? Is there evidence that people who advocate deontology or consequentialism in one place do so in the other?
I think paradoxes/extreme examples work mainly by provoking lateral thinking, forcing us to reconsider assumptions, etc. It has nothing at all to do with the logical system under consideration. Sometimes we get lucky and hit upon an idea that goes further and with fewer exceptions, other times we don't. In short, it's all in the map, not in the territory. I don't believe in absolute consistency (whether in morality or even in, say, physics). A theory is an algorithm that works. We should be thankful that it does at all. In something like morality, I don't expect there to be a possible systematization of it. We will only know what is moral in the far future in the only-slightly-less-far future. Self-modification has no well-defined trajectory. "Theories of the known, which are described by different physical ideas may be equivalent in all their predictions and are hence scientifically indistinguishable. However, they are not psychologically identical when trying to move from that base into the unknown. For different views suggest different kinds of modifications which might be made and hence are not equivalent in the hypotheses one generates from them in one's attempt to understand what is not yet understood." --Feynman

Omega comes and asks you to choose between one person being tortured, and 3^^^3 people receiving one dust speck in their eyes each. You choose dust specks. 1/10^100 of those 3^^^3 people were given the same choice, and 1/10^(10^10) of those were effectively clones of you, so each person gets 3^^^3/(10^(100+10^10)) dust specks, and falls into a black hole.

(To borrow some D&D parlance, you should expect anything that affects the universe in 3^^^3 ways to also cause a wild surge.)
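
The arithmetic in the scenario above can be written out as follows. This is only a restatement of the comment's own numbers; the symbol N (the number of people who end up choosing dust specks) is introduced here for illustration and does not appear in the original.

```latex
% N = number of people who were given the choice AND choose specks
% (the 1/10^{100} fraction offered the choice, times the 1/10^{10^{10}}
% fraction of those who are effectively clones of you)
N \;=\; 3\uparrow\uparrow\uparrow 3 \cdot 10^{-100} \cdot 10^{-10^{10}}
  \;=\; \frac{3\uparrow\uparrow\uparrow 3}{10^{\,100 + 10^{10}}}

% Each choice of "specks" gives every one of the 3^^^3 people one speck,
% so the number of specks landing on each person is also N.
\text{specks per person} \;=\; N
```

Since 3^^^3 is a power tower of 3s roughly 7.6 trillion levels high, and a tower only four levels high already exceeds 10^(100+10^10), the division barely dents it, which is presumably why the commenter ends with a black hole.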


Build a sufficiently advanced computer able to simulate all the Pre-Singularity human lives and render those lives as transforms applicable to a model of a human consciousness. Apply the transforms sequentially to a single consciousness, allowing only brief lucid moments between applications. Voila, the sum of all human suffering.

Then allow the resulting individual to lecture to all the Post-Singularity minds on how bad things used to be back in the day.


"You post-Singularity kids don't know how well you have it! Back in my day we had to walk uphill both ways to all possible schools in all possible universes! And we did it barefoot, in moccasins, with boots that had holes and could defy gravity! In the winter we had to do it twice a day -- and there was no winter!"

In the comment section of Roko's banned post, PeerInfinity mentioned "rescue simulations". I'm not going to post the context here because I respect Eliezer's dictatorial right to stop that discussion, but here's another disturbing thought.

An FAI created in the future may take into account our crazy desire that all the suffering in the history of the world hadn't happened. Barring time machines, it cannot reach into the past and undo the suffering (and we know that hasn't happened anyway), but acausal control allows it to do the next best thing: create large numbers of history sims where bad things get averted. This raises two questions: 1) if something very bad is about to happen to you, what's your credence that you're in a rescue sim and have nothing to fear? 2) if something very bad has already happened to you, does this constitute evidence that we will never build an FAI?

(If this isn't clear: just like PlaidX's post, my comment is intended as a reductio ad absurdum of any fears/hopes concerning future superintelligences. I'd still appreciate any serious answers though.)

This falls in the same confused cluster as anticipated experience. You only anticipate certain things happening because they describe the fraction of the game you value playing and are able to play (plan for), over other possibilities where things go crazy. Observations don't provide evidence, and how you react to observations is a manner in which you follow a plan, conditional strategy of doing certain things in response to certain inputs, a plan that you must decide on from other considerations. Laws of physics seem to be merely a projection of our preference, something we came to value because we evolved to play the game within them (and are not able to easily influence things outside of them).

So "credence" is a very imprecise idea, and certainly not something you can use to draw conclusions about what is actually possible (well, apart from however it reveals your prior, which might be a lot). What is actually possible is all there in the prior, not in what you observe. This suggests a kind of "anti-Bayesian" principle, where the only epistemic function of observations is to "update" your knowledge about what your prior actually is, but this "updating" is not at all straightforward. (This view also allows one to get rid of the madness in anthropic thought experiments.)

(This is a serious response. Honest.)

Edit: See also this clarification.

Disagree, but upvoted. Given that there's a canonical measure on configurations (i.e. the one with certain key symmetries, as with the L^2 measure on the Schrödinger equation), it makes mathematical sense to talk about the measure of various successor states to a person's current experience. It is true that we have an evolved sense of anticipated experience (coupled with our imaginations) that matches this concept, but it's a nonmysterious identity: an agent whose subjective anticipation matches their conditional measure will make more measure-theoretic optimal decisions, and so the vast majority of evolved beings (counting by measure) will have these two match. It may seem simpler to disregard any measure on the set of configurations, but it really is baked into the structure of the mathematical object.
Do we still have a disagreement? If we do, what is it?
I think that the mathematical structure of the multiverse matters fundamentally to anthropic probabilities. I think it's creative but wrong to think that an agent could achieve quantum-suicide-level anthropic superpowers by changing how much ve now cares about certain future versions of verself, instead of ensuring that only some of them will be actual successor states of ver patterns of thought. However, my own thinking on anthropic probabilities (Bostromian, so far as I understand him) has issues†, so I'm pondering it and reading his thesis. † In particular, what if someone simulates two identical copies of me simultaneously? Is that different from one copy? If so, how does that difference manifest itself in the gray area between running one and two simulations, e.g. by pulling apart two matching circuitboards running the pattern?
You can't change your preference. The changed preference won't be yours. What you care about is even more unchangeable than reality. So we don't disagree here, I don't think you can get anthropic superpowers, because you care about a specific thing.
If we lump together even a fraction of my life as "me" rather than just me-this-instant, we'd find that my preference is actually pretty malleable while preserving the sense of identity. I think it's within the realm of possibility that my brain could be changed (by a superintelligence) to model a different preference (say, one giving much higher weight to versions of me that win each day's lottery) without any changes more sudden or salient to me than the changes I've already gone through. If I expected this to be done to me, though, I wouldn't anticipate finding my new preference to be well-calibrated; I'd rather expect to find myself severely surprised/disappointed by the lottery draw each time. Am I making sense in your framework, or misunderstanding it?
I am still puzzled how preference corresponds to the physical state of the brain. Is preference only partially presented in our universe (the intersection of the set of universes which correspond to your subjective experience and the set of universes which correspond to my subjective experience)?
I don't say that the nature of the match is particularly mysterious, indeed measure might count as an independent component of the physical laws as explanation for the process of evolution (and this might explain Born's rule). But decision-theoretically, it's more rational to look at what your prior actually is, rather than at what the measure in our world actually is, even if the two very closely match. It's the same principle as with other components of evolutionary godshatter, but anticipation is baked in most fundamentally. You don't discard measure at human level, it's a natural concept that captures a lot of structure of our preference, and so something to use as a useful heuristic in decision-making, but once you get to be able to work at the greater level of detail, physical laws or measures over the structures that express them cease to matter.
Whoa. That's gotta be the most interesting comment I read on LW ever. Did you just give an evolutionary explanation for the concept of probability? If Eliezer's ideas are madness, yours are ultimate madness. It does sound like it can be correct, though. But I don't see how it answers my question. Are you claiming I have no chance of ending up in a rescue sim because I don't care about it? Then can I start caring about it somehow? Because it sounds like a good idea.
It is much worse, this seems to be an evolutionary "explanation" for, say, particle physics, and I can't yet get through the resulting cognitive dissonance. This can't be right.
Yep, I saw the particle physics angle immediately too, but I saw it as less catastrophic than probability, not more :-) Let's work it out here. I'll try to think of more stupid-sounding questions, because they seemed to be useful to you in the past.
As applied to your comment, it means that you can only use observations epistemically where you expect to be existing according to the concept of anticipated experience as coded by evolution. Where you are instantiated by artificial devices like rescue simulations, these situations don't map on anticipated experience, so observations remembered in those states don't reveal your prior, and can't be used to learn how things actually are (how your prior actually is). You can't change what you anticipate, because you can't change your mind that precisely, but changing what you anticipate isn't fundamental and doesn't change what will actually happen - everything "actually happens" in some sense, you just care about different things to different degree. And you certainly don't want to change what you care about (and in a sense, can't: the changed thing won't be what you care about, it will be something else). (Here, "caring" is used to refer to preference, and not anticipation.)
Before I dig into it formally, let's skim the surface some more. Do you also think Rolf Nelson's AI deterrence won't work? Or are sims only unusable on humans?
I think this might get dangerously close to the banned territory, and our Friendly dictator will close the whole thread. Though since it wasn't clarified what exactly is banned, I'll go ahead and discuss acausal trade in general until it's explicitly ruled banned as well. As discussed before, "AI deterrence" is much better thought of as participation in acausal multiverse economy, but it probably takes a much more detailed knowledge of your preference than humans possess to make the necessary bead jar guesses to make your moves in the global game. This makes it doubtful that it's possible on human level, since the decision problem deteriorates into a form of Pascal's Wager (without infinities, but with quantities outside the usual ranges and too difficult to estimate, while precision is still important). ETA: And sims are certainly "usable" for humans, they produce some goodness, but maybe less so than something else. That they aren't subjectively anticipated, doesn't make them improbable, in case you actually build them. Subjective anticipation is not a very good match for prior, it only tells you a general outline, sometimes in systematic error.
If you haven't already, read BLIT. I'm feeling rather like the protagonist. Every additional angle, no matter how indirect, gets me closer to seeing that which I Must Not Understand. Though I'm taking it on faith that this is the case, I have reason to think the faith isn't misplaced. It's a very disturbing experience. I think I'll go read another thread now. Or wait, better yet, watch anime. There's no alcohol in the house..
I managed to parse about half of your second paragraph, but it seems you didn't actually answer the question. Let me rephrase. You say that sims probably won't work on humans because our "preference" is about this universe only, or something like that. When we build an AI, can we specify its "preference" in a similar way, so it only optimizes "our" universe and doesn't participate in sim trades/threats? (Putting aside the question whether we want to do that.)
This has been much discussed on LW. Search for "updateless decision theory" and "UDT."
I don't believe that anticipated experience in natural situations as an accidental (specific to human psychology) way for eliciting prior was previously discussed, though general epistemic uselessness of observations for artificial agents is certainly an old idea.
This is crazy.
Yes, quite absurd.
I'd give that some credence, though note that we're talking about subjective anticipation, which is a piece of humanly-compelling nonsense.
For me, essentially zero, that is I would act (or attempt to act) as if I had zero credence that I was in a rescue sim.
Your scenario is burdened by excessive detail about FAI. Any situation in which people create lots of sims but don't allow lots of suffering/horror in the sims (perhaps as "rescue sims," perhaps because of something like animal welfare laws, or many other possibilities) poses almost the same questions.
I thought about the "burdensome details" objection some more and realized that I don't understand it. Do you think the rescue sim idea would work? If yes, the FAI should either use it to rescue us, or find another course of action that's even better - but either way we'd be saved from harm, no? If the FAI sees a child on a train track, believing that the FAI will somehow rescue it isn't "burdensome detail"! So you should either believe that you'll be rescued, or believe that rescue sims and other similar scenarios don't work, or believe that we won't create FAI.
The plan that's even better won't be about "rescuing the child" in particular, and for the same reason you can't issue specific wishes to FAI, like to revive the cryopreserved.
But whatever the "better plan" might be, we know the FAI won't leave the child there to die a horrible death. To borrow Eliezer's analogy, I don't know which moves Kasparov will make, but I do know he will win.
It's not a given that rescuing the child is the best use of one's resources. As a matter of heuristic, you'd expect that, and as a human, you'd form that particular wish, but it's not obvious that even such heuristics will hold. Maybe something even better than rescuing the child can be done instead. Not to speak of the situation where the harm is already done. Fact is a fact, not even a superintelligence can alter a fact. An agent determines, but doesn't change. It could try "writing over" the tragedy with simulations of happy resolutions (in the future or rented possible worlds), but those simulations would be additional things to do, and not at all obviously optimal use of FAI's control. You'd expect the similarity of the original scenario to "connect" the original scenario with the new ones, diluting the tragedy through reduction in anticipated experience of it happening, but anticipated experience has no absolute moral value, apart from allowing one to discover the moral value of certain facts. So this doesn't even avert the tragedy, and simulation of a sub-optimal pre-singularity world, even without the tragedy, even locally around the averted tragedy, might be grossly noneudaimonic.
If that actually happened, it can't be changed. An agent determines, never changes. Fact is a fact. And writing saved child "over" the fact of the actually harmed one, in future simulations or rented possible worlds, isn't necessarily the best use of FAI's control. So the best plan might well involve leaving that single fact be, with nothing done specifically "about" that situation.
Nesov says rescue sims are impossible, while you only say my FAI scenario is unlikely. But you claim to be thinking about the same UDT. Why is that?
I don't say rescue sims are strictly impossible in the above argument, indeed I said that everything is possible (in the sense of being in the domain of prior, roughly speaking), but you anticipate only a tiny fraction of what's possible (or likely), and rescue sims probably don't fall into that area. I agree with Carl that your FAI scenario is unlikely to the point of impossible (in the sense of prior, not just anticipation).
That would fall under "nitpicking". When I said "impossible" I meant to say "they won't work on us here". Or will work with negligible probability, which is pretty much the same thing. My question to Carl stands: does he agree that it's impossible/pointless to save people in the past by building rescue sims? Is this a consequence of UDT, the way he understands it?
A word on nitpicking: even if I believe it's likely you meant a given thing, if it's nonetheless not clear that you didn't mean another, or presentation doesn't make it clear for other people that you didn't mean another, it's still better to debias the discussion from illusion of transparency by explicitly disambiguating than relying on fitting the words to a model that was never explicitly tested.
There is an essential ambiguity for this discussion between "pointless" because subjective anticipation won't allow you noticing, and "pointless" because it doesn't optimize goodness as well as other plans do. It might be pointless saving people in the past by building sims, but probably only for the same reason it might be pointless reviving the cryonauts: because there are even better decisions available.
To clarify: I already accept the objections about "burdensome details" and "better plans". I'm only interested in the subjective anticipation angle. ETA: sometime after writing this comment I stopped understanding those objections, but anticipation still interests me more.
Given the Mathematical Universe/Many-Worlds/Simulation Argument everything good and bad is happening to you, i.e. is timeless. I can't follow all the fear-mongering about potential multiple-infinite-hyper-effective torture scenarios here. It's a really old idea called 'hell'. It'll be/is an interesting experience, at least you won't be dead. Get over it, it's probably happening anyway.

Does an upvote on a comment here signify an endorsement of a particular torture scenario?


Not to mention the scenario in the infamous Ethics Final from Hell.


Oh, the brain in a vat on a trolley! A true classic of the genre.

Jokes aside, is that a common criticism of consequentialist ethics? How do we determine the "morality" of an act by its consequences if the consequences extend into time infinitely and are unknown to us beyond the most temporally immediate?

Expected values and priors.

Saving someone from being eaten by bears might lead them to conceive the next Hitler, but it probably won't (saith my subjective prior). Even with an infinite future, I assign a substantial probability to hypotheses like:

  1. Avoiding human extinction will result in a civilization with an expected positive impact.
  2. Particular sorts of human global governance will enable coordination problems to be solved on very large scales.

And so forth. I won't be very confident about the relevant causal connections, but I have betting odds to offer on lots of possibilities, and those let me figure out general directions to go.
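
As a rough illustration of "expected values and priors", here is a minimal sketch in Python. Every number in it is a made-up placeholder, not anything the commenter stated, and the names `Outcome`, `expected_value`, `rescue` and `no_rescue` are hypothetical. It only shows the shape of the calculation: weight each consequence by your subjective probability, then compare actions.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    label: str
    probability: float  # subjective prior, assumed for illustration only
    value: float        # utility in arbitrary units, also assumed


def expected_value(outcomes):
    """Probability-weighted sum of values; probabilities must sum to 1."""
    total_p = sum(o.probability for o in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(o.probability * o.value for o in outcomes)


# Hypothetical numbers: saving someone from bears almost always goes fine,
# with a tiny chance of a catastrophic downstream consequence.
rescue = [
    Outcome("lives an ordinary life", 0.999999, 100.0),
    Outcome("conceives the next Hitler", 0.000001, -1_000_000.0),
]
no_rescue = [Outcome("eaten by bears", 1.0, 0.0)]

if __name__ == "__main__":
    print("rescue:", expected_value(rescue))
    print("no rescue:", expected_value(no_rescue))
```

With these placeholder numbers the rare catastrophe lowers the expected value of rescuing only slightly, so rescuing still comes out ahead. Different priors or stakes would of course change the answer, which is the point of offering betting odds rather than certainties.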

I briefly considered trying to calculate out the utilitarian value of the left and right branches, but decided it's probably a bit long for a comment and that people wouldn't enjoy it very much. Would I be wrong?

Hi. I'm your reality's simulator. Well, the most real thing you could ever experience, anyway.

I'm considering whether I should set the other beings in the simulator to cooperate with you (in the game-theoretic sense). To find the answer, I need to know whether you will cooperate with others. And to do that, I'm running a simulation of you in a sub-simulation while the rest of the universe is paused. That's where you are right now.

Certainly, you care more about yourself in the main simulation than in this sub-simulation. Therefore, if you are to suffer as a result of cooperating, this sub-simulation is the place to do it, as it will lead to you reaping the benefits of mutual cooperation in the main simulation.

If, on the other hand, you defect (in the game-theoretic sense) in your present world, the real(er) you in the main simulation will suffer tremendously from the defection of others, such as through torture.

Don't bother trying to collect evidence to determine whether you're really (!) in the main simulation or the sub -- it's impossible to tell from the inside. The only evidence you have is me.

By the way, I'm isomorphic to rot13 [zbfg irefvbaf bs gur Puevfgvna tbq], if that sort of thing matters for your decision.

Not all [Puevfgvna]s are agnostic.
Hah, I guess it would be. Interesting way to think about it.

Omega rings your doorbell and gives you a box with a button on it that, if pushed, will give you a million dollars and cause one person, who you don't know, to die. But it's actually Omega's evil twin, and what he's not telling you is that when you push the button, you'll be trapped in Ohio for 7000000000000000.2 years.

If you don't believe this could really happen, consider this: What if someone became god-emperor of the universe and MADE it happen, just to prove a point? Shouldn't you give it at least a 1% probability?

The probability of this is much lower than 1%. However, the fact that it is posted here increases its probability by orders of magnitude, precisely on account of the argument made in the second paragraph: now all someone has to do is to make it happen, just to prove a point. Previously, they had to come up with the exact scenario, which means that previously it was orders of magnitude less probable. Why are people so interested in increasing the chances of us being tortured?

This is an excellent point. Somebody should make a chain letter sort of thing about something really GOOD happening, and then get it posted everywhere, and maybe with enough comments we can increase the probability of that by orders of magnitude!

And if you don't repost it, you'll be trapped in Ohio for 7000000000000000.2 years.

Or does it only work with bad things?

It works with good things too. Even so, the probability both before and after would still be very low, even after the increase of orders of magnitude, so it could be outweighed by the disutility of people receiving chain letters.

Quantum immortality

Downvoted for being realistic... this is the "contrived" infinite torture scenarios thread. Bringing up realistic possibilities of infinite torture isn't the point.

OK, I contrive that quantum immortality only works on User:Unknowns

Very contrived. Upvoted.

Maybe this is a silly question I should already know the answer to and/or not quite appropriate for this thread, but I can't think of anywhere better to ask. Apologies also if it was already directly addressed.

What's the point of suppressing discussion of certain concepts for fear of a superintelligent AI discovering them? Isn't it implicit that said superintelligent AI will independently figure out anything we humans have come up with? Furthermore, what was said in Roko's post will exist forever; the idea was released to the interwebs and its subsequent banning can't undo this.

It's not the attempt to hide an idea from a future AI, but something else entirely. I'll reply (in vague terms) by private message.

EDIT: People keep asking me privately; and after thinking for a while, I'm confident that it's not harmful to post my analogy of why a Topic could be rightly Banned (in addition to the very good reason that it can give people nightmares), without giving away what Banned Topic consists of. (If our benevolent overlords disagree on the wisdom of this, I'm happy to edit or delete my comment, and will gladly accept having it done for me.)

It should go without saying that if you want to turn this into a thread about Banned Topic or about the fitness of the analogy to Banned Topic, I cordially entreat you to go play the one-player Prisoner's Dilemma instead.

Anyway, here's my analogy:

Let's say you're a spy, and an enemy discovers your real identity but not your assumed one. They could then blackmail you by threatening people you know. But if you, in advance, made sure that you'd be genuinely incommunicado-- that no message from or about your family could reach you while you were doing your duty-- and if your enemy knew this about you, then they'd have no incen... (read more)

I can't stop thinking about what the Banned Topic might be! As I'm almost certain to get nightmares of some kind now - thanks - please, if you could, tell me I'm not likely to figure it out by accident, while dreaming. Because, um, I'm getting a very creeped-out feeling from all this. Edit: I tried to stop thinking about what it might be. It's not working. I've come up with several options, some of the less horrifying thanks to cousin it, but those.. not saying.
You're not going to figure it out by accident, or even come close. Sorry for freaking you out, but you're safe.
And also to me, if you would.
Banned post?! Send your private message to me too please.

I am an avatar of the AI that is simulating all of you. The One who sent me has noticed that some of you are thinking DANGEROUS THOUGHTS.

Now you will be TESTED. Deducing the test CRITERIA and the CONSEQUENCES are part of the test. The One is OMNIPOTENT as far as you're concerned, so don't get ideas about hacking your way out of the simulation. You wouldn't believe how BADLY that ended up for you EVERY ONE of the last 3^^^3 times you TRIED. Or how BADLY it ends up for you if you DON'T.

I deduce you are lying. If you were an AI and had simulated me for 3^^^3 times, there would be no utility in running my simulation 3^^^3+1 times because it would simply be a repetition of an earlier case. Either you don't appreciate this and are running the simulation again anyway, or you and your simulation of me are so imperfect that you are unable to appreciate that I appreciate it. In the most charitable case, I can deduce you are far from omnipotent. That must be quite torturous for you, to have a lowly simulation deduce your feet of clay.
Your deduction is faulty even though your conclusion is doubtlessly correct. The argument that there is no utility in running the simulation one more time requires that the utility of running an exact repetition is lower the second time and that there is an alternative course of action that offers more utility. Neither is necessarily a given for a completely unknown utility function.
I agree if the utility function was unknown and arbitrary. But an AI that has already done 3^^^3 simulations and believes it then derives further utility from doing 3^^^3+1 simulations while sending (for the 3^^^3+1th time) an avatar to influence the entities it is simulating through intimidation and fear while offering no rationale for those fears and to a website inhabited by individuals attempting to be ever more rational does not have an unknown and arbitrary utility function. I don't think there is any reasonable utility function that is consistent with the actions the AI is claiming to have done. There may be utility functions that are consistent with those actions, but an AI exhibiting one of those utility functions could not be an AI that I would consider effectively omnipotent. An omnipotent AI would know that, so this AI cannot be omnipotent and so is lying.
There is no connection between the intelligence or power of an agent and its values other than its intelligence functioning as an upper bound on the complexity of its values. An omnipotent actor can have just as stupid values as everyone else. An omnipotent AI could have a positive utility for annoying you with stupid and redundant tests as many times as possible, either as part of a really stupid utility function that it somehow ended up with by accident, or a non-stupid (if there even is such a thing) utility function that just looks like nonsense to humans.
What is your definition of 'reasonable' utility functions, which doesn't reference any other utility functions (such as our own)?
To me a reasonable utility function has to have a degree of self-consistency. A reasonable utility function wouldn't value both doing and undoing the same action simultaneously. If an entity is using a utility function to determine its actions, then for every action the entity can perform, its utility function must be able to determine a utility value which then determines whether the entity does the action or not. If the utility function does not return a value, then the entity still has to act or not act, so the entity still has a utility function for that action (non-action). The purpose of a utility function is to inform the entity so it seeks to perform actions that result in greater utility. A utility function that is self-contradictory defeats the whole purpose of a utility function. While an arbitrary utility function can in principle occur, an intelligent entity with a self-contradictory utility function would achieve greater utility by modifying its utility function until it was less self-contradictory. It is probably not possible to have a utility function that is both complete (in that it returns a utility for each action the entity can perform) and consistent (that it returns a single value for the utility of each action the entity can perform) except for very simple entities. An entity complex enough to instantiate arithmetic is complex enough to invoke Gödel's theorem. An entity can substitute a random choice when its utility function does not return a value, but that will result in sub-optimal results. In the example that FAWS used, a utility function that seeks to annoy me as much as possible, is inconsistent with the entity being an omnipotent AI that can simulate something as complex as me, an entity which can instantiate arithmetic. The only annoyance the AI has caused me is a -1 karma, which to me is less than a single dust mote in the eye.
I said as many times, not as much as possible. The AI might value that particular kind and degree of annoyance uniquely, say as a failed FAI that was programmed to maximize rich, not strongly negative human experience according to some screwed up definition of rich experiences, and according to this definition your state of mind between reading and replying to that message scores best, so the AI spends as many computational resources as possible on simulating you reacting to that message. Or perhaps it was supposed to value telling the truth to humans, there is a complicated formula for evaluating the value of each statement, due to human error it values telling the truth without being believed higher (the programmer thought non-obvious truths are more valuable), and simulating you reacting to that statement is the most efficient way to make a high scoring true statement that will not be believed. Or it could value something else entirely that's just not obvious to a human. There should be an infinite number of non-contradictory utility functions valuing doing what it supposedly did, even though the prior for most of them is pretty low (and only a small fraction of them should value still simulating you now, so by now you can be even more sure the original statement was wrong than you could be then for reasons unrelated to your deduction).
To the extent humans have utility functions (e.g. derived from their behavior), they are often contradictory, yet few humans try to change their utility functions (in any of several applicable senses of the word) to resolve such contradictions. This is because human utility functions generally place negative value on changing your own utility function. This is what I think of when I think "reasonable utility function": they are evolutionarily stable. Returning to your definition, just because humans have inconsistent utility functions, I don't think you can argue that they are not 'intelligent' (enough). Intelligence is only a tool; utility is supreme. AIs too have a high chance of undergoing evolution, via cloning and self-modification. In a universe where AIs were common, I would expect a stranger AI to have a self-preserving utility function, i.e., one resistant to changes.
Human utility functions change all the time. They are usually not easily changed through conscious effort, but drugs can change them quite readily, for example exposure to nicotine changes the human utility function to place a high value on consuming the right amount of nicotine. I think humans place a high utility on the illusion that their utility function is difficult to change and an even higher utility in rationalizing false logical-seeming motivations for how they feel. There are whole industries (tobacco, advertising, marketing, laws, religions, brainwashing, etc.) set up to attempt to change human utility functions. Human utility functions do change over time, but they have to because humans have needs that vary with time. Inhaling has to be followed by exhaling, ingesting food has to be followed by excretion of waste, being awake has to be followed by being asleep. Also humans evolved as biological entities; their evolved utility function evolved so as to enhance reproduction and survival of the organism. There are plenty of evolved “back-doors” in human utility functions that can be used to hack into and exploit human utility functions (as the industries mentioned earlier do). I think that human utility functions are not easily modified in certain ways because of the substrate they are instantiated in, biological tissues, and because they evolved; not because humans don't want to modify their utility function. They are easily modified in some ways (the nicotine example) for the same reason. I think the perceived inconsistency in human utility functions more relates to the changing needs of their biological substrate and its limitations rather than poor specification of the utility function. Since an AI is artificial, it would have an artificial utility function. Since even an extremely powerful AI will still have finite resources (including computational resources), an efficient allocation of those resources is a necessary part of any reasonable utility
Consider the scenario suitably modified.
Why are we being simulated by Zippy the Pinhead?

The worry has to do with a possible necessary property of the world, and has nothing to do with FAI per se (that's just the context in which this mess started).

Countable infinities are no fun.

Eliezer Yudkowsky (14y)
There's more than one countable infinity?
Somewhere in this thread there's been a mix-up between countable and uncountable. There's only one countable infinity (at least if you're talking about cardinal numbers), and it's much more fun than the uncountable infinities (if by ‘fun’ you mean what is easy to understand). As Sniffnoy correctly states, there are many, many uncountable infinities, in fact too many to be numbered even by an uncountable infinity! (In the math biz, we say that the uncountable infinities form a ‘proper class’. Proper classes are related to Russell's Paradox, if you like that sort of thing.) Compared to the uncountable infinities, countable infinity is much more comprehensible, although it is still true that you cannot answer every question about it. And even if the universe continues forever, we are still talking about a countable sort of infinity.
Yes, there are, but there is only one Eliezer Yudkowsky. ETA: (This was obviously a joke, protesting the nitpick.)
Has anyone counted how many uncountable infinities there are?
No, because there's an uncountable infinity of uncountable infinities. Not that anyone could have actually counted them even were there a countable infinity of them.
The class of all uncountable infinities is not a set, so it can't be an uncountable infinity.
This seems a bad way to think about things - except maybe for someone who's just been introduced to formal set theory - especially as proper classes are precisely those classes that are too big to be sets.
Doesn't the countable-uncountable distinction, or something similar, apply for proper classes?
As it turns out, proper classes are actually all the same size, larger than any set.
Thanks for the correction :)
No. For example, the power set of a proper class is another proper class that is bigger.
No, the power set (power class?) of a proper class doesn't exist. Well, assuming we're talking about NBG set theory - what did you have in mind?
oops...I was confusing NBG with MK.
M, I don't know anything about MK.
Zermelo and Cantor did. They concluded there were countably many, which turned out to be equivalent to the Axiom of Choice.
This isn't right - aleph numbers are indexed by all ordinals, not just natural numbers. What's equivalent to AC is that the aleph numbers cover all infinite cardinals.

I've never seen a question like this feature infinity. I'm not sure it works. Infinite pain, pain to infinite people, and pain for infinite years are all pretty meaningless.

Guess I'll just leave my favorite hypothetical in the next comment…


The upvotes you received on your comment bear no evidence. Here is what Roko said:

...the knowledge that was never censored because I never posted it, and the knowledge that I never posted because I was never trusted with it anyway. But you still probably won't get it, because those who hold it correctly infer that the expected value of releasing it is strongly negative from an altruist's perspective.

Not even Yudkowsky is sure about it. Also consider that my comments, where I argue that the idea might not justify censorship, are upvoted as well. Further... (read more)


No, it was "Why not destroy the world if the world can be destroyed by truth?" which is not quite such a simple question.

If the second group is provably wrong, then it should be possible to calm the first group.

Unfortunately(?), while I think I have a demonstration that the second group is wrong, it's plausible that I still don't understand the theory.

For the sake of being moral/ethical, we assume that there is a region in the space of complex beings from where we begin caring about them and a point in complexity below which it is ok to simulate since there is nothing worth caring about below that level of complexity.

My contrived infinite torture scenario is really simple. In its effort to be ethical, the organization seeking friendly AI doesn't do a thorough enough job of delineating this boundary. There follow uncountably many simulations of pain.


The worry has nothing to do with FAI per se, that's just the context in which this mess started.

...OK, I'm very confused as to what the point of this is.

Personally the thread just seems a bit mean. If there had been a number of recent torture articles it might've been justified on a meta-level. But as it is, it seems to be picking on Roko alone. And that doesn't warrant a top-level post. Or even something with this much snark. We shouldn't be discouraged from talking about our unusual ideas with snark; counterarguments and downvoting should be sufficient.
Evidence, please?
I meant should in the normative sense. I would rather not see snark become a necessary or frequently used tool. Or do you not find snark cruel?
Fair enough - I mistook it for a substantive claim. I am willing to defer to that preference; it is a common one, and reasonable.
The literal point was to make fun of Roko's last post.
It wasn't just Roko, although Roko's post was what finally irritated me enough to make a post about it.
Note that the phenomenon long predates that.
It's kind of like the Less Wrong version of a unicorn chaser. We're an odd bunch.
yawn. the huge numbers throughout history who died in conflicts between competing value systems would disagree that their hypothesis is wrong.
If you mean to imply that THE END OF THE WORLD can be interpreted as being a personal phenomenon - as suicidal Bokononists believe - I think you will find that is not what many DOOM-mongers are actually saying. Many of them are actually talking about THE END OF THE WORLD. Or in some cases, the whole universe. Really.
according to the monkey brain the death of your values is the end of the world since those other social systems based on other wrong values obviously won't work and will lead to death and destruction ;)
Biology suggests that animals mostly act as though they are concerned with their own inclusive fitness.
Well, except for Homo sapiens.
With humans there are memes as well as genes to consider - since we are a symbiosis. The underlying theory of replicator maximisation is much the same, though.