I don't like to be the bearer of bad news here, but it ought to be stated. This whole leverage ratio idea is very obviously an intelligent kludge / patch / workaround, because you have two base-level theories that either don't work together or don't work individually.
You already know that something doesn't work. That's what the original post was about and that's what this post tries to address. But this is a clunky, inelegant patch. That's fine for a project or a website, but given belief in the rest of your writings on AI, this is high stakes. At those stakes, saying "we know it doesn't work, but we patched the bugs we found" is not acceptable.
The combination of your best guess at picking the right decision theory and your best guess at epistemology produces absurd conclusions. Note that you already know this. This knowledge, which you already have, motivated this post.
The next step is to identify which is wrong, the decision theory or the epistemology. After that you need to find something that's not wrong to replace it. That sucks, it's probably extremely hard, and it probably sets you back to square one on multiple points. But you can't know that one of your foundations is wrong and just keep going. Once you know you are wrong, you need to act consistently with that.
shminux,
It's just a fact that you endorse a very different theory of "reality" than Eliezer. Why disguise your reasonable disagreement with him by claiming that you don't understand him?
You talk like you don't notice when highly-qualified-physicist shminux is talking and when average-armchair-philosopher shminux is talking.
Which is annoying to me in particular because physicist shminux knows a lot more than I, and I should pay attention to what he says in order to be less wrong, while philosopher shminux is not entitled to the same weight. So I'd like some markers of which one is talking.
I thought I was pretty clear re the "markers of which one is talking". But let me recap.
Eliezer has thought about metaethics, decision theories and AI design for a much, much longer time and much, much more seriously than I have. I can see that when I read what he writes about the issues I have not even thought of. While I cannot tell if it is correct, I can certainly tell that there is a fair amount of learning I still have to do if I wanted to be interesting. This is the same feeling I used to get (and still get on occasion) when talking with an expert in, say, General Relativity, before I learned the subject in sufficient depth. Now that I have some expertise in the area, I see the situation from the other side, as well. I can often recognize a standard amateurish argument before the person making it has finished. I often know exactly what implicit false premises lead to this argument, because I had been there myself. If I am lucky, I can successfully point out the problematic assumptions to the amateur in question, provided I can simplify it to the proper level. If so, the reaction I get is "that's so cool... so deep... I'll go and ponder it, Thank you, Master!"...
Even if the field X is confused, to confidently dismiss subtheory Y you must know something confidently about Y from within this confusion, such as that Y is inconsistent or nonreductionist or something. I often occupy this mental state myself but I'm aware that it's 'arrogant' and setting myself above everyone in field X who does think Y is plausible - for example, I am arrogant with respect to respected but elderly physicists who think single-world interpretations of QM are plausible, or anyone who thinks our confusion about the ultimate nature of reality can keep the God subtheory in the running. Our admitted confusion does not permit that particular answer to remain plausible.
I don't think anyone I take seriously would deny that the field of anthropics / magical-reality-fluid is confused. What do you think you know about all computable processes, or all logical theories with models, existing, which makes that obviously impermitted? In case it's not clear, I wasn't endorsing Tegmark Level IV as the obvious truth the way I consider MWI obvious, nor yet endorsing it at all, rather I was pointing out that with some further specification a version of T4 could provide a model in ...
Maybe that's why I can't see the relevance of an untestable theory to AI design.
It seems to be the problem that is relevant to AI design. How does an expected utility maximising agent handle edge cases and infinitesimals given logical uncertainty and bounded capabilities? If you get that wrong then Rocks Fall and Everyone Dies. The relevance of any given theory of how such things can be modelled is then based either on its suitability for use in an AI design or, conceivably, on the implications if an AI constructed and used said model.
Mugger: Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers.
Me: I'm not sure about that.
Mugger: So then, you think the probability I'm telling the truth is on the order of 1/3↑↑↑3?
Me: Actually no. I'm just not sure I care about your 3↑↑↑3 simulated people as much as you think I do.
Mugger: "This should be good."
Me: There are only something like n=10^10 neurons in a human brain, and the number of possible states of a human brain is exponential in n. This is stupidly tiny compared to 3↑↑↑3, so most of the lives you're saving will be heavily duplicated. I'm not really sure that I care about duplicates that much.
Mugger: Well I didn't say they would all be humans. Haven't you read enough Sci-Fi to know that you should care about all possible sentient life?
Me: Of course. But the same sort of reasoning implies that, either there are a lot of duplicates, or else most of the people you are talking about are incomprehensibly large, since there aren't that many small Turing machines to go around. And it's not at all obvious to me that you can describe arbitrarily large minds whose existence I should care about without using up a lot of complexity. More generally, I can't see any way to describe worlds which I care about to a degree that vastly outgrows their complexity. My values are complicated.
I'm not really sure that I care about duplicates that much.
Bostrom would probably try to argue that you do. See Bostrom (2006).
I think it not unlikely that if we have a successful intelligence explosion and subsequently discover a way to build something 4^^^^4-sized, then we will figure out a way to grow into it, one step at a time. This 4^^^^4-sized supertranshuman mind then should be able to discriminate "interesting" from "boring" 3^^^3-sized things. If you could convince the 4^^^^4-sized thing to write down a list of all nonboring 3^^^3-sized things in its spare time, then you would have a formal way to say what an "interesting 3^^^3-sized thing" is, with description length (the description length of humanity = the description length of our actual universe) + (the additional description length to give humanity access to a 4^^^^4-sized computer -- which isn't much because access to a universal Turing machine would do the job and more).
Thus, I don't think that it needs a 3^^^3-sized description length to pick out interesting 3^^^3-sized minds.
This post has not at all misunderstood my suggestion from long ago, though I don't think I thought about it very much at the time. I agree with the thrust of the post that a leverage factor seems to deal with the basic problem, though of course I'm also somewhat expecting more scenarios to be proposed to upset the apparent resolution soon.
Hm, a linear "leverage penalty" sounds an awful lot like adding the complexity of locating you of the pool of possibilities to the total complexity.
Thing 2: consider the case of the other people on that street when the Pascal's Muggle-ing happens. Suppose they could overhear what is being said. Since they have no leverage of their own, are they free to assign a high probability to the muggle helping 3^^^3 people? Do a few of them start forward to interfere, only to be held back by the cooler heads who realize that all who interfere will suddenly have the probability of success reduced by a factor of 3^^^3?
Suppose we had a planet of 3^^^3 people (their universe has novel physical laws). There is a planet-wide lottery. Catherine wins. There was a 1/3^^^3 chance of this happening. The lotto representative comes up to her and asks her to hand over her ID card for verification.
All over the planet, as a fun prank, a small proportion of people have been dressing up as lotto representatives and running away with peoples' ID cards. This is very rare - only one person in 3^^3 does this today.
If the lottery prize is 3^^3 times better than getting your ID card stolen, should Catherine trust the lotto official? No, because there are 3^^^3/3^^3 pranksters, and only 1 real official, and 3^^^3/3^^3 is 3^^(3^^3 - 3), which is a whole lot of pranksters. She hangs on to her card, and doesn't get the prize. Maybe if the reward were 3^^^3 times greater than the penalty, we could finally get some lottery winners to actually collect their winnings.
All of which is to say, I don't think there's any locational penalty - the crowd near the muggle should have exactly the same probability assignments as her, just as the crowd near Catherine has the same probability assignments as her about whether this i...
A simplified version of the argument here:
My response to this is that the probability distribution is even less up for grabs. The utility, at least, is explicitly there to reflect our preferences. If we see that a utility function is causing our agent to take the wrong actions, then it makes sense to change it to better reflect the actions we wish our agent to take.
The probability distribution, on the other hand, is a map that should reflect the territory as well as possible! It should not be modified on account of badly-behaved utility computations.
This may be taken as an argument in favor of modifying the utility function; Sniffnoy makes a case for bounded utility in another comment.
It could alternatively be taken as a case for modifying the decision procedure. Perhaps neither the probability nor the utility are "up for grabs", but how we use them should be modified.
One (somewhat ...
I have a problem with calling this a "semi-open FAI problem", because even if Eliezer's proposed solution turns out to be correct, it's still a wide open problem to develop arguments that can allow us to be confident enough in it to incorporate it into an FAI design. This would be true even if nobody can see any holes in it or have any better ideas, and doubly true given that some FAI researchers consider a different approach (which assumes that there is no such thing as "reality-fluid", that everything in the multiverse just exists and as a matter of preference we do not / can not care about all parts of it in equal measure, #4 in this post) to be at least as plausible as Eliezer's current approach.
(As always, the term "magical reality fluid" reflects an attempt to demarcate a philosophical area where I feel quite confused, and try to use correspondingly blatantly wrong terminology so that I do not mistake my reasoning about my confusion for a solution.)
This seems like a really useful strategy!
Agreed - placeholders and kludges should look like placeholders and kludges. I became a happier programmer when I realised this, because up until then I was always conflicted about how much time I should spend making some unsatisfying piece of code look beautiful.
Just thought of something:
How sure are we that P(there are N people) is not at least as small as 1/N for sufficiently large N, even without a leverage penalty? The OP seems to be arguing that the complexity penalty on the prior is insufficient to generate this low probability, since it doesn't take much additional complexity to generate scenarios with arbitrarily more people. Yet it seems to me that after some sufficiently large number, P(there are N people) must drop faster than 1/N. This is because our prior must be normalized. That is:
Sum(all non-negative integers N) of P(there are N people) = 1.
If there was some integer M such that for all n > M, P(there are n people) >= 1/n, the above sum would not converge. If we are to have a normalized prior, there must be a faster-than-1/N falloff to the function P(there are N people).
In fact, if one demands that my priors indicate that my expected number of people in the universe/multiverse is finite, then my priors must diminish faster than 1/N^2 (so that the sum of N*P(there are N people) converges).
TL;DR: If your priors are such that the probability of there being 3^^^3 people is not smaller than 1/(3^^^3), then you don't have a normalized distribution of priors. If your priors are such that the probability of there being 3^^^3 people is not smaller than 1/((3^^^3)^2), then your expected number of people in the multiverse is divergent/infinite.
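A minimal symbolic check of both claims, assuming for illustration tails of exactly 1/N and 1/N^2 (the real requirement is only that the prior eventually be bounded by such tails):

```python
# Sketch: why P(there are N people) must fall off faster than 1/N to normalize,
# and faster than 1/N^2 to give a finite expected number of people.
# The exact tails 1/n and 1/n^2 are illustrative stand-ins.
from sympy import Sum, oo, symbols

n = symbols('n', positive=True, integer=True)

print(Sum(1 / n, (n, 1, oo)).doit())           # oo: a 1/N tail cannot be normalized
print(Sum(1 / n**2, (n, 1, oo)).doit())        # pi**2/6: a 1/N^2 tail can be normalized
print(Sum(n * (1 / n**2), (n, 1, oo)).doit())  # oo: but its expected N still diverges
```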
How does this style of reasoning work on something more like the original Pascal's Wager problem?
Suppose a (to all appearances) perfectly ordinary person goes on TV and says "I am an avatar of the Dark Lords of the Matrix. Please send me $5. When I shut down the simulation in a few months, I will subject those who send me the money to [LARGE NUMBER] years of happiness, and those who do not to [LARGE NUMBER] years of pain".
Here you can't solve the problem by pointing out the very large numbers of people involved, because there aren't very high numbers of people involved. Your probability should depend only on your probability that this is a simulation, your probability that the simulators would make a weird request like this, and your probability that this person's specific weird request is likely to be it. None of these numbers help you get down to a 1/[LARGE NUMBER] level.
I've avoided saying 3^^^3, because maybe there's some fundamental constraint on computing power that makes it impossible for simulators to simulate 3^^^3 years of happiness in any amount of time they might conceivably be willing to dedicate to the problem. But they might be able to simulate some number of years large enough to outweigh our prior against any given weird request coming from the Dark Lords of the Matrix.
(also, it seems less than 3^^^3-level certain that there's no clever trick to get effectively infinite computing power or effectively infinite computing time, like the substrateless computation in Permutation City)
Imagine someone makes the following claims:
Then they threaten, unless you give them $5, to kidnap you, give you the immortality drug, stick you in the spaceship, launch it at near-light speed, and have you stuck (presumably bound in an uncomfortable position) in the spaceship for the 3^^^3 years the universe will last.
(okay, there are lots of contingent features of the universe that will make this not work, but imagine something better. Pocket dimension, maybe?)
If their claims are true, then their threat seems credible even though it involves a large amount of suffering. Can you explain what you mean by life-centuries being instantiated by causal nodes, and how that makes the madman's threat less credible?
This is an awful lot of words to expend to notice that
(1) Social interactions need to be modeled in a game-theoretic setting, not straightforward expected payoff
(2) Distributions of expected values matter. (Hint: p(N) = 1/N is a really bad model as it doesn't converge).
(3) Utility functions are neither linear nor symmetric. (Hint: extinction is not symmetric with doubling the population.)
(4) We don't actually have an agreed-upon utility function anyway; big numbers plus a not-well-agreed-on fuzzy notion is a great way to produce counterintuitive results. The details don't really matter; as fuzzy approaches infinity, you get nonintuitiveness.
It's much more valuable to address some of these imperfections in the setup of the problem than continuing to wade through the logic with bad assumptions in hand.
Friendly neighborhood Matrix Lord checking in!
I'd like to apologize for the behavior of my friend in the hypothetical. He likes to make illusory promises. You should realize that regardless of what he may tell you, his choice of whether to hit the green button is independent of your choice of what to do with your $5. He may hit the green button and save 3↑↑↑3 lives, or he may not, at his whim. Your $5 can not be reliably expected to influence his decision in any way you can predict.
You are no doubt accustomed to thinking about enforceable contracts between parties, since those are a staple of your game theoretic literature as well as your storytelling traditions. Often, your literature omits the requisite preconditions for a binding contract since they are implicit or taken for granted in typical cases. Matrix Lords are highly atypical counterparties, however, and it would be a mistake to carry over those assumptions merely because his statements resemble the syntactic form of an offer between humans.
Did my Matrix Lord friend (who you just met a few minutes ago!) volunteer to have his green save-the-multitudes button and your $5 placed under the control of a mutually trustworthy th...
I don't at all think that this is central to the problem, but I do think you're equating "bits" of sensory data with "bits" of evidence far too easily. There is no law of probability theory that forbids you from assigning probability 1/3^^^3 to the next bit in your input stream being a zero -- so as far as probability theory is concerned, there is nothing wrong with receiving only one input bit and as a result ending up believing a hypothesis that you assigned probability 1/3^^^3 before.
Similarly, probability theory allows you to assign prior probability 1/3^^^3 to seeing the blue hole in the sky, and therefore believing the mugger after seeing it happen anyway. This may not be a good thing to do on other principles, but probability theory does not forbid it. ETA: In particular, if you feel between a rock and a bad place in terms of possible solutions to Pascal's Muggle, then you can at least consider assigning probabilities this way even if it doesn't normally seem like a good idea.
Two quick thoughts:
Any two theories can be made compatible by allowing for some additional correction factor (e.g. a "leverage penalty") designed to make them compatible. As such, all the work rests on "is the leverage penalty justified?"
For said justification, there has to be some sort of justifiable territory-level reasoning, including "does it carve reality at its joints?" and "is this the world we live in?".
The problem I see with the leverage penalty is that there is no Bayesian updating way that will get you to such a low prior. It's the mirror image of "can never process enough bits to get away from such a low prior", namely "can never process enough bits to get to assigning such low priors" (the blade cuts both ways).
The reason for that is in part that the entire level of confidence you have in the governing laws of physics, and in the causal structure and dependency graphs and such, is predicated on the sensory bitstream of your previous life - no more; it's a strictly upper bound. You can gain confidence that a prior to affect a googleplex people is that low only by using that lifetime bitstream you have accu...
As near as I can figure, the corresponding state of affairs to a complexity+leverage prior improbability would be a Tegmark Level IV multiverse in which each reality got an amount of magical-reality-fluid corresponding to the complexity of its program (1/2 to the power of its Kolmogorov complexity) and then this magical-reality-fluid had to be divided among all the causal elements within that universe - if you contain 3↑↑↑3 causal nodes, then each node can only get 1/3↑↑↑3 of the total realness of that universe.
This reminds me a lot of Levin's universal...
You probably shouldn't let super-exponentials into your probability assignments, but you also shouldn't let super-exponentials into the range of your utility function. I'm really not a fan of having a discontinuous bound anywhere, but I think it's important to acknowledge that when you throw a trip-up (^^^) into the mix, important assumptions start breaking down all over the place. The VNM independence assumption no longer looks convincing, or straightforward. Normally my preferences in a Tegmark-style multiverse would reflect a linear combination of my pr...
Is it just me, or is everyone here overly concerned with coming up with patches for this specific case and not the more general problem? If utilities can grow vastly larger than the prior probability of the situation that contains them, then an expected utility system becomes almost useless: it will act on situations with probabilities as tiny as can possibly be represented in that system, since the math makes them vastly outweigh the expected utility of acting on anything else.
I've heard people come up with apparent resolutions to this problem. Like counter b...
Nick Beckstead's finished but as-yet unpublished dissertation has much to say on this topic. Here is Beckstead's summary of chapters 6 and 7 of his dissertation:
...[My argument for the overwhelming importance of shaping the far future] asks us to be happy with having a very small probability of averting an existential catastrophe [or bringing about some other large, positive "trajectory change"], on the grounds that the expected value of doing so is extremely enormous, even though there are more conventional ways of doing good which have a high pr
If an AI's overall architecture is such as to enable it to carry out the "You turned into a cat" effect - where if the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified - then at the moment I can't think of anything else to add in.
Ex ante, when the AI assigns infinitesimal probability to the real thing, and meaningful probability t...
Just gonna jot down some thoughts here. First a layout of the problem.
I think the simpler solution is just to use a bounded utility function. There are several things suggesting we do this, and I really don't see any reason to not do so, instead of going through contortions to make unbounded utility work.
Consider the paper of Peter de Blanc that you link -- it doesn't say a computable utility function won't have convergent utilities, but rather that it will iff said function is bounded. (At least, in the restricted context defined there, though it seems fairly general.) You could try to escape the conditions of the theore...
I get the sense you're starting from the position that rejecting the Mugging is correct, and then looking for reasons to support that predetermined conclusion. Doesn't this attitude seem dangerous? I mean, in the hypothetical world where accepting the Mugging is actually the right thing to do, wouldn't this sort of analysis reject it anyway? (This is a feature of debates about Pascal's Mugging in general, not just this post in particular.)
It seems to me like the whistler is saying that the probability of saving knuth people for $5 is exactly 1/knuth after updating for the Matrix Lord's claim, not before the claim, which seems surprising. Also, it's not clear that we need to make an FAI resistant to very very unlikely scenarios.
I'm a lot more worried about making an FAI behave correctly if it encounters a scenario which we thought was very very unlikely.
I enjoyed this really a lot, and while I don't have anything insightful to add, I gave five bucks to MIRI to encourage more of this sort of thing.
(By "this sort of thing" I mean detailed descriptions of the actual problems you are working on as regards FAI research. I gather that you consider a lot of it too dangerous to describe in public, but then I don't get to enjoy reading about it. So I would like to encourage you sharing some of the fun problems sometimes. This one was fun.)
If the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified.
I would love to have a conversation about this. Is the "tad" here hyperbole or do you actually have something mostly worked out that you just don't want to post? On a first reading (and admittedly without much serious thought -- it's been a long day), it seems to me that this...
If someone suggests to me that they have the ability to save 3^^^3 lives, and I assign this a 1/3^^^3 probability, and then they open a gap in the sky at billions to one odds, I would conclude that it is still extremely unlikely that they can save 3^^^3 lives. However, it is possible that their original statement is false and yet it would be worth giving them five dollars because they would save a billion lives. Of course, this would require further assumptions on whether people are likely to do things that they have not said they would do, but are weake...
1) It's been applied to cryonic preservation, fer crying out loud. It's reasonable to suspect that the probability of that working is low, but anyone who says with current evidence that the probability is beyond astronomically low is being too silly to take seriously.
Has the following reply to Pascal's Mugging been discussed on LessWrong?
One point I don't see mentioned here that may be important is that someone is saying this to you.
I encounter lots of people. Each of them has lots of thoughts. Most of those thoughts, they do not express to me (for which I am grateful). How do they decide which thoughts to express? To a first approximation, they express thoughts which are likely, important and/or amusing. Therefore, when I hear a thought that is highly important or amusing, I expect it had less of a likelihood barrier to being expressed, and assign it a proportionally lower probability.
Note that this doesn't apply to arguments in general -- only to ones that other people say to me.
This is probably obvious, but if this problem persisted, a Pascal-Mugging-vulnerable AI would immediately get mugged even without external offers or influence. The possibility alone, however remote, of a certain sequence of characters unlocking a hypothetical control console which could potentially access an above Turing computing model which could influence (insert sufficiently high number) amounts of matter/energy, would suffice. If an AI had to decide "until what length do I utter strange tentative passcodes in the hope of unlocking some higher level of physics", it would get mugged by the shadow of a matrix lord every time.
It sounds like what you're describing is something that Iain Banks calls an "Outside Context Problem" - it doesn't seem like a 'leverage penalty' is the proper way to conceptualize what you're applying, as much as a 'privilege penalty'.
In other words, when the sky suddenly opens up and blue fire pours out, the entire context for your previous set of priors needs to be re-evaluated - and the very question of "should I give this man $5" exists on a foundation of those now-devalued priors.
Is there a formalized tree or mesh model for Bayesian probabilities? Because I think that might be fruitful.
There's something very counterintuitive about the notion that Pascal's Muggle is perfectly rational. But I think we need to do a lot more intuition-pump research before we'll have finished picking apart where that counterintuitiveness comes from. I take it your suggestion is that Pascal's Muggle seems unreasonable because he's overly confident in his own logical consistency and ability to construct priors that accurately reflect his credence levels. But he also seems unreasonable because he doesn't take into account that the likeliest explanations for the ...
One scheme with the properties you want is Wei Dai's UDASSA, e.g. see here. I think UDASSA is by far the best formal theory we have to date, although I'm under no delusions about how well it captures all of our intuitions (I'm also under no delusions about how consistent our intuitions are, so I'm resigned to accepting a scheme that doesn't capture them).
I think it would be more fair to call this allocation of measure part of my preferences, instead of "magical reality fluid." Thinking that your preferences are objective facts about the world see...
What if the mugger says he will give you a single moment of pleasure that is 3^^^3 times more intense than a standard good experience? Wouldn't the leverage penalty not apply and thus make the probability of the mugger telling the truth much higher?
I think the real reason the mugger shouldn't be given money is that people are more likely to be able to attain 3^^^3 utils by donating the five dollars to an existential risk-reducing charity. Even though the current universe presumably couldn't support 3^^^3 utils, there is a chance of being able to create or ...
Indeed, you can't ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one - evidence I'm a googolplex times more likely to encounter if the hypothesis is true, than if it's false - because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex. You know the old saying about how once you assign something probability one, or probability zero, you can never change your mind regardless of what evidence you see? Well, odds of a googolple...
Random thoughts here, not highly confident in their correctness.
Why is the leverage penalty seen as something that needs to be added? Isn't it just the obviously correct way to do probability?
Suppose I want to calculate the probability that a race of aliens will descend from the skies and randomly declare me Overlord of Earth some time in the next year. To do this, I naturally go to Delphi to talk to the Oracle of Perfect Priors, and she tells me that the chance of aliens descending from the skies and declaring an Overlord of Earth in the next year is 0.00...
Okay, that makes sense. In that case, though, where's the problem? Claims in the form of "not only is X a true event, with details A, B, C, ..., but also it's the greatest event by metric M that has ever happened" should have low enough probability that a human writing it down specifically in advance as a hypothesis to consider, without being prompted by some specific evidence, is doing really badly epistemologically.
Also, I'm confused about the relationship to MWI.
Many of the conspiracy theories generated have some significant overlap (i.e. are not mutually exclusive), so one shouldn't expect the sum of their probabilities to be less than 1. It's permitted for P(Cube A is red) + P(Sphere X is blue) to be greater than 1.
Edit: formatting fixed. Thanks, wedrifid.
My response to the mugger:
My response to the scientist:
This system does seem to lead to the odd effect that you would probably be more willing to pay Pascal's Mugger to save 10^10^100 people than you would be willing to pay to save 10^10^101 people, since the leverage penalties make them about equal, but the latter has a higher complexity cost. In fact the leverage penalty effectively means that you cannot distinguish between events providing more utility than you can provide an appropriate amount of evidence to match.
Is there any particular reason an AI wouldn't be able to self-modify with regards to its prior/algorithm for deciding prior probabilities? A basic Solomonoff prior should include a non-negligible chance that it itself isn't perfect for finding priors, if I'm not mistaken. That doesn't answer the question as such, but it isn't obvious to me that it's necessary to answer this one to develop a Friendly AI.
As near as I can figure, the corresponding state of affairs to a complexity+leverage prior improbability would be a Tegmark Level IV multiverse in which each reality got an amount of magical-reality-fluid corresponding to the complexity of its program (1/2 to the power of its Kolmogorov complexity) and then this magical-reality-fluid had to be divided among all the causal elements within that universe - if you contain 3↑↑↑3 causal nodes, then each node can only get 1/3↑↑↑3 of the total realness of that universe.
The difference between this and average ut...
There is likely a broader-scoped discussion on this topic that I haven't read, so please point me to such a thread if my comment is addressed -- but it seems to me that there is a simpler resolution to this issue (as well as an obvious limitation to this way of thinking), namely that there's an almost immediate stage (in the context of highly-abstract hypotheticals) where probability assessment breaks down completely.
For example, there are an uncountably-infinite number of different parent universes we could have. There are even an uncountably-infinite ...
A few thoughts:
I haven't strongly considered my prior on being able to save 3^^^3 people (more on this to follow). But regardless of what that prior is, if approached by somebody claiming to be a Matrix Lord who claims he can save 3^^^3 people, I'm not only faced with the problem of whether I ought to pay him the $5 - I'm also faced with the question of whether I ought to walk over to the next beggar on the street, and pay him $0.01 to save 3^^^3 people. Is this person 500 times more likely to be able to save 3^^^3 people? From the outset, not really. And ...
Is it reasonable to take this as evidence that we shouldn't use expected utility computations, or not only expected utility computations, to guide our decisions?
If I understand the context, the reason we believed an entity, either a human or an AI, ought to use expected utility as a practical decision making strategy, is because it would yield good results (a simple, general architecture for decision making). If there are fully general attacks (muggings) on all entities that use expected utility as a practical decision making strategy, then perhaps we shou...
Followup to: Pascal's Mugging: Tiny Probabilities of Vast Utilities, The Pascal's Wager Fallacy Fallacy, Being Half-Rational About Pascal's Wager Is Even Worse
Short form: Pascal's Muggle
tl;dr: If you assign superexponentially infinitesimal probability to claims of large impacts, then apparently you should ignore the possibility of a large impact even after seeing huge amounts of evidence. If a poorly-dressed street person offers to save 10^(10^100) lives (a googolplex lives) for $5 using their Matrix Lord powers, and you claim to assign this scenario less than 10^-(10^100) probability, then apparently you should continue to believe absolutely that their offer is bogus even after they snap their fingers and cause a giant silhouette of themselves to appear in the sky. For the same reason, any evidence you encounter showing that the human species could create a sufficiently large number of descendants - no matter how normal the corresponding laws of physics appear to be, or how well-designed the experiments which told you about them - must be rejected out of hand. There is a possible reply to this objection using Robin Hanson's anthropic adjustment against the probability of large impacts, and in this case you will treat a Pascal's Mugger as having decision-theoretic importance exactly proportional to the Bayesian strength of evidence they present you, without quantitative dependence on the number of lives they claim to save. This however corresponds to an odd mental state which some, such as myself, would find unsatisfactory. In the end, however, I cannot see any better candidate for a prior than having a leverage penalty plus a complexity penalty on the prior probability of scenarios.
In late 2007 I coined the term "Pascal's Mugging" to describe a problem which seemed to me to arise when combining conventional decision theory and conventional epistemology in the obvious way. On conventional epistemology, the prior probability of hypotheses diminishes exponentially with their complexity; if it would take 20 bits to specify a hypothesis, then its prior probability receives a 2^-20 penalty factor and it will require evidence with a likelihood ratio of 1,048,576:1 - evidence which we are 1,048,576 times more likely to see if the theory is true, than if it is false - to make us assign it around 50-50 credibility. (This isn't as hard as it sounds. Flip a coin 20 times and note down the exact sequence of heads and tails. You now believe in a state of affairs you would have assigned a million-to-one probability beforehand - namely, that the coin would produce the exact sequence HTHHHHTHTTH... or whatever - after experiencing sensory data which are more than a million times more probable if that fact is true than if it is false.) The problem is that although this kind of prior probability penalty may seem very strict at first, it's easy to construct physical scenarios that grow in size vastly faster than they grow in complexity.
I originally illustrated this using Pascal's Mugger: A poorly dressed street person says "I'm actually a Matrix Lord running this world as a computer simulation, along with many others - the universe above this one has laws of physics which allow me easy access to vast amounts of computing power. Just for fun, I'll make you an offer - you give me five dollars, and I'll use my Matrix Lord powers to save 3↑↑↑↑3 people inside my simulations from dying and let them live long and happy lives" where ↑ is Knuth's up-arrow notation. This was originally posted in 2007, when I was a bit more naive about what kind of mathematical notation you can throw into a random blog post without creating a stumbling block. (E.g.: On several occasions now, I've seen someone on the Internet approximate the number of dust specks from this scenario as being a "billion", since any incomprehensibly large number equals a billion.) Let's try an easier (and way smaller) number instead, and suppose that Pascal's Mugger offers to save a googolplex lives, where a googol is 10^100 (a 1 followed by a hundred zeroes) and a googolplex is 10 to the googol power, so 10^(10^100) or 10^10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 lives saved if you pay Pascal's Mugger five dollars, if the offer is honest.
If Pascal's Mugger had only offered to save a mere googol lives (10^100), we could perhaps reply that although the notion of a Matrix Lord may sound simple to say in English, if we actually try to imagine all the machinery involved, it works out to a substantial amount of computational complexity. (Similarly, Thor is a worse explanation for lightning bolts than the laws of physics because, among other points, an anthropomorphic deity is more complex than calculus in formal terms - it would take a larger computer program to simulate Thor as a complete mind, than to simulate Maxwell's Equations - even though in mere human words Thor sounds much easier to explain.) To imagine this scenario in formal detail, we might have to write out the laws of the higher universe the Mugger supposedly comes from, the Matrix Lord's state of mind leading them to make that offer, and so on. And so (we reply) when mere verbal English has been translated into a formal hypothesis, the Kolmogorov complexity of this hypothesis is more than 332 bits - it would take more than 332 ones and zeroes to specify - where 2^-332 ≈ 10^-100. Therefore (we conclude) the net expected value of the Mugger's offer is still tiny, once its prior improbability is taken into account.
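As a quick arithmetic check of the conversions used above (20 bits of complexity corresponding to a roughly million-to-one prior, and a one-over-googol prior corresponding to a bit more than 332 bits), here is a minimal sketch; nothing in it goes beyond the numbers already stated in the text:

```python
# Sketch: converting between bits of complexity and prior probability.
import math

print(2**20)                  # 1048576: the likelihood ratio a 20-bit hypothesis needs
print(100 * math.log2(10))    # ~332.19: bits corresponding to a prior of 10^-100
```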
But once Pascal's Mugger offers to save a googolplex lives - offers us a scenario whose value is constructed by twice-repeated exponentiation - we seem to run into some difficulty using this answer. Can we really claim that the complexity of this scenario is on the order of a googol bits - that to formally write out the hypothesis would take one hundred billion billion times more bits than there are atoms in the observable universe?
And a tiny, paltry number like a googolplex is only the beginning of computationally simple numbers that are unimaginably huge. Exponentiation is defined as repeated multiplication: If you see a number like 3^5, it tells you to multiply five 3s together: 3×3×3×3×3 = 243. Suppose we write 3^5 as 3↑5, so that a single arrow ↑ stands for exponentiation, and let the double arrow ↑↑ stand for repeated exponentiation, or tetration. Thus 3↑↑3 would stand for 3↑(3↑3) or 3^(3^3) = 3^27 = 7,625,597,484,987. Tetration is also written as a left superscript: ³3 = 3↑↑3. Thus ⁴2 = 2^(2^(2^2)) = 2^(2^4) = 2^16 = 65,536. Then pentation, or repeated tetration, would be written with 3↑↑↑3 = 3↑↑(3↑↑3) = 3↑↑7,625,597,484,987 = 3^(3^(3^...^3)) where the ... summarizes an exponential tower of 3s seven trillion layers high.
But 3↑↑↑3 is still quite simple computationally - we could describe a small Turing machine which computes it - so a hypothesis involving 3↑↑↑3 should not therefore get a large complexity penalty, if we're penalizing hypotheses by algorithmic complexity.
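To make "a small Turing machine which computes it" concrete, here is a minimal Python sketch of Knuth's up-arrow recursion (illustrative only; the definition is tiny even though the values it names are astronomically large, and calling it with arguments as small as (3, 3, 3) will never terminate in practice):

```python
# Sketch: Knuth's up-arrow notation as a short recursive function.
# The program is a few lines long (low algorithmic complexity), yet it
# denotes numbers like 3↑↑↑3 that dwarf anything physically countable.
def up_arrow(a, n, b):
    """Compute a ↑^n b, i.e. a followed by n up-arrows followed by b."""
    if n == 1:
        return a ** b          # one arrow is ordinary exponentiation
    if b == 0:
        return 1               # base case of the recursion
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 5))   # 3↑5  = 3^5 = 243
print(up_arrow(3, 2, 3))   # 3↑↑3 = 7,625,597,484,987
# up_arrow(3, 3, 3) would be 3↑↑↑3: a tower of 3s seven trillion layers high.
```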
I had originally intended the scenario of Pascal's Mugging to point up what seemed like a basic problem with combining conventional epistemology with conventional decision theory: Conventional epistemology says to penalize hypotheses by an exponential factor of computational complexity. This seems pretty strict in everyday life: "What? for a mere 20 bits I am to be called a million times less probable?" But for stranger hypotheses about things like Matrix Lords, the size of the hypothetical universe can blow up enormously faster than the exponential of its complexity. This would mean that all our decisions were dominated by tiny-seeming probabilities (on the order of 2^-100 and less) of scenarios where our lightest action affected 3↑↑4 people... which would in turn be dominated by even more remote probabilities of affecting 3↑↑5 people...
This problem is worse than just giving five dollars to Pascal's Mugger - our expected utilities don't converge at all! Conventional epistemology tells us to sum over the predictions of all hypotheses weighted by their computational complexity and evidential fit. This works fine with epistemic probabilities and sensory predictions because no hypothesis can predict more than probability 1 or less than probability 0 for a sensory experience. As hypotheses get more and more complex, their contributed predictions have tinier and tinier weights, and the sum converges quickly. But decision theory tells us to calculate expected utility by summing the utility of each possible outcome, times the probability of that outcome conditional on our action. If hypothetical utilities can grow faster than hypothetical probability diminishes, the contribution of an average term in the series will keep increasing, and this sum will never converge - not if we try to do it the same way we got our epistemic predictions, by summing over complexity-weighted possibilities. (See also this similar-but-different paper by Peter de Blanc.)
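A toy numerical illustration of this non-convergence follows. The growth rates are made-up stand-ins chosen only to show the qualitative behaviour: a prior weight of 2^-n for a hypothesis of complexity n, and a doubly-exponential payoff 10^(2^n) standing in for the fast-growing utilities.

```python
# Sketch: complexity-weighted sums behave very differently for bounded
# sensory predictions than for utilities that outgrow the complexity penalty.
# The specific growth rates here are illustrative, not canonical.
import math

# Sensory predictions are bounded by probability 1, so each hypothesis of
# complexity n contributes at most 2^-n and the sum converges:
print(sum(2.0**-n for n in range(1, 60)))   # ~1.0

# But if a hypothesis of complexity n promises utility ~10^(2^n), the terms
# of the expected-utility sum grow without bound (shown in log10 to avoid
# overflow: log10(term) = 2^n - n*log10(2)):
for n in (5, 10, 20, 30):
    print(n, 2**n - n * math.log10(2))
```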
Unfortunately I failed to make it clear in my original writeup that this was where the problem came from, and that it was general to situations beyond the Mugger. Nick Bostrom's writeup of Pascal's Mugging for a philosophy journal used a Mugger offering a quintillion days of happiness, where a quintillion is merely 1,000,000,000,000,000,000 = 10^18. It takes at least two exponentiations to outrun a singly-exponential complexity penalty. I would be willing to assign a probability of less than 1 in 10^18 to a random person being a Matrix Lord. You may not have to invoke 3↑↑↑3 to cause problems, but you've got to use something like 10^(10^100) - double exponentiation or better. Manipulating ordinary hypotheses about the ordinary physical universe taken at face value, which just contains 10^80 atoms within range of our telescopes, should not lead us into such difficulties.
(And then the phrase "Pascal's Mugging" got completely bastardized to refer to an emotional feeling of being mugged that some people apparently get when a high-stakes charitable proposition is presented to them, regardless of whether it's supposed to have a low probability. This is enough to make me regret having ever invented the term "Pascal's Mugging" in the first place; and for further thoughts on this see The Pascal's Wager Fallacy Fallacy (just because the stakes are high does not mean the probabilities are low, and Pascal's Wager is fallacious because of the low probability, not the high stakes!) and Being Half-Rational About Pascal's Wager Is Even Worse. Again, when dealing with issues the mere size of the apparent universe, on the order of 1080 - for small large numbers - we do not run into the sort of decision-theoretic problems I originally meant to single out by the concept of "Pascal's Mugging". My rough intuitive stance on x-risk charity is that if you are one of the tiny fraction of all sentient beings who happened to be born here on Earth before the intelligence explosion, when the existence of the whole vast intergalactic future depends on what we do now, you should expect to find yourself surrounded by a smorgasbord of opportunities to affect small large numbers of sentient beings. There is then no reason to worry about tiny probabilities of having a large impact when we can expect to find medium-sized opportunities of having a large impact, so long as we restrict ourselves to impacts no larger than the size of the known universe.)
One proposal which has been floated for dealing with Pascal's Mugger in the decision-theoretic sense is to penalize hypotheses that let you affect a large number of people, in proportion to the number of people affected - what we could call perhaps a "leverage penalty" instead of a "complexity penalty".
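A minimal sketch of what such a penalty would do to the expected-value arithmetic (the function names and numbers are hypothetical illustrations, not a canonical formulation): dividing the prior by the number of people N affected cancels the factor of N in the payoff, so the claimed stakes drop out and the complexity prior is left in charge.

```python
# Sketch of a leverage penalty: scale the complexity-based prior of a
# hypothesis down by the number of people N it claims you can affect.
# Illustrative only; exact arithmetic via Fraction to avoid underflow.
from fractions import Fraction

def leverage_penalized_prior(complexity_prior, people_affected):
    return complexity_prior / people_affected

def expected_lives_saved(complexity_prior, people_affected):
    prior = leverage_penalized_prior(complexity_prior, people_affected)
    return prior * people_affected   # the factor of N cancels exactly

complexity_prior = Fraction(1, 2**100)
for n in (10**100, 10**1000):        # a googol, and something far larger
    print(expected_lives_saved(complexity_prior, n) == complexity_prior)  # True
```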
Unfortunately this potentially leads us into a different problem, that of Pascal's Muggle.
Suppose a poorly-dressed street person asks you for five dollars in exchange for doing a googolplex's worth of good using his Matrix Lord powers.
"Well," you reply, "I think it very improbable that I would be able to affect so many people through my own, personal actions - who am I to have such a great impact upon events? Indeed, I think the probability is somewhere around one over googolplex, maybe a bit less. So no, I won't pay five dollars - it is unthinkably improbable that I could do so much good!"
"I see," says the Mugger.
A wind begins to blow about the alley, whipping the Mugger's loose clothes about him as they shift from ill-fitting shirt and jeans into robes of infinite blackness, within whose depths tiny galaxies and stranger things seem to twinkle. In the sky above, a gap edged by blue fire opens with a horrendous tearing sound - you can hear people on the nearby street yelling in sudden shock and terror, implying that they can see it too - and displays the image of the Mugger himself, wearing the same robes that now adorn his body, seated before a keyboard and a monitor.
"That's not actually me," the Mugger says, "just a conceptual representation, but I don't want to drive you insane. Now give me those five dollars, and I'll save a googolplex lives, just as promised. It's easy enough for me, given the computing power my home universe offers. As for why I'm doing this, there's an ancient debate in philosophy among my people - something about how we ought to sum our expected utilities - and I mean to use the video of this event to make a point at the next decision theory conference I attend. Now will you give me the five dollars, or not?"
"Mm... no," you reply.
"No?" says the Mugger. "I understood earlier when you didn't want to give a random street person five dollars based on a wild story with no evidence behind it. But now I've offered you evidence."
"Unfortunately, you haven't offered me enough evidence," you explain.
"Really?" says the Mugger. "I've opened up a fiery portal in the sky, and that's not enough to persuade you? What do I have to do, then? Rearrange the planets in your solar system, and wait for the observatories to confirm the fact? I suppose I could also explain the true laws of physics in the higher universe in more detail, and let you play around a bit with the computer program that encodes all the universes containing the googolplex people I would save if you gave me the five dollars -"
"Sorry," you say, shaking your head firmly, "there's just no way you can convince me that I'm in a position to affect a googolplex people, because the prior probability of that is one over googolplex. If you wanted to convince me of some fact of merely 2-100 prior probability, a mere decillion to one - like that a coin would come up heads and tails in some particular pattern of a hundred coinflips - then you could just show me 100 bits of evidence, which is within easy reach of my brain's sensory bandwidth. I mean, you could just flip the coin a hundred times, and my eyes, which send my brain a hundred megabits a second or so - though that gets processed down to one megabit or so by the time it goes through the lateral geniculate nucleus - would easily give me enough data to conclude that this decillion-to-one possibility was true. But to conclude something whose prior probability is on the order of one over googolplex, I need on the order of a googol bits of evidence, and you can't present me with a sensory experience containing a googol bits. Indeed, you can't ever present a mortal like me with evidence that has a likelihood ratio of a googolplex to one - evidence I'm a googolplex times more likely to encounter if the hypothesis is true, than if it's false - because the chance of all my neurons spontaneously rearranging themselves to fake the same evidence would always be higher than one over googolplex. You know the old saying about how once you assign something probability one, or probability zero, you can never change your mind regardless of what evidence you see? Well, odds of a googolplex to one, or one to a googolplex, work pretty much the same way."
"So no matter what evidence I show you," the Mugger says - as the blue fire goes on crackling in the torn sky above, and screams and desperate prayers continue from the street beyond - "you can't ever notice that you're in a position to help a googolplex people."
"Right!" you say. "I can believe that you're a Matrix Lord. I mean, I'm not a total Muggle, I'm psychologically capable of responding in some fashion to that giant hole in the sky. But it's just completely forbidden for me to assign any significant probability whatsoever that you will actually save a googolplex people after I give you five dollars. You're lying, and I am absolutely, absolutely, absolutely confident of that."
"So you weren't just invoking the leverage penalty as a plausible-sounding way of getting out of paying me the five dollars earlier," the Mugger says thoughtfully. "I mean, I'd understand if that was just a rationalization of your discomfort at forking over five dollars for what seemed like a tiny probability, when I hadn't done my duty to present you with a corresponding amount of evidence before demanding payment. But you... you're acting like an AI would if it was actually programmed with a leverage penalty on hypotheses!"
"Exactly," you say. "I'm forbidden a priori to believe I can ever do that much good."
"Why?" the Mugger says curiously. "I mean, all I have to do is press this button here and a googolplex lives will be saved." The figure within the blazing portal above points to a green button on the console before it.
"Like I said," you explain again, "the prior probability is just too infinitesimal for the massive evidence you're showing me to overcome it -"
The Mugger shrugs, and vanishes in a puff of purple mist.
The portal in the sky above closes, taking the console and the green button with it.
(The screams go on from the street outside.)
A few days later, you're sitting in your office at the physics institute where you work, when one of your colleagues bursts in through your door, seeming highly excited. "I've got it!" she cries. "I've figured out that whole dark energy thing! Look, these simple equations retrodict it exactly, there's no way that could be a coincidence!"
At first you're also excited, but as you pore over the equations, your face configures itself into a frown. "No..." you say slowly. "These equations may look extremely simple so far as computational complexity goes - and they do exactly fit the petabytes of evidence our telescopes have gathered so far - but I'm afraid they're far too improbable to ever believe."
"What?" she says. "Why?"
"Well," you say reasonably, "if these equations are actually true, then our descendants will be able to exploit dark energy to do computations, and according to my back-of-the-envelope calculations here, we'd be able to create around a googolplex people that way. But that would mean that we, here on Earth, are in a position to affect a googolplex people - since, if we blow ourselves up via a nanotechnological war or (cough) make certain other errors, those googolplex people will never come into existence. The prior probability of us being in a position to impact a googolplex people is on the order of one over googolplex, so your equations must be wrong."
"Hmm..." she says. "I hadn't thought of that. But what if these equations are right, and yet somehow, everything I do is exactly balanced, down to the googolth decimal point or so, with respect to how it impacts the chance of modern-day Earth participating in a chain of events that leads to creating an intergalactic civilization?"
"How would that work?" you say. "There's only seven billion people on today's Earth - there's probably been only a hundred billion people who ever existed total, or will exist before we go through the intelligence explosion or whatever - so even before analyzing your exact position, it seems like your leverage on future affairs couldn't reasonably be less than a one in ten trillion part of the future or so."
"But then given this physical theory which seems obviously true, my acts might imply expected utility differentials on the order of 1010100-13," she explains, "and I'm not allowed to believe that no matter how much evidence you show me."
This problem may not be as bad as it looks; with some further reasoning, the leverage penalty may lead to more sensible behavior than depicted above.
Robin Hanson has suggested that the logic of a leverage penalty should stem from the general improbability of individuals being in a unique position to affect many others (which is why I called it a leverage penalty). At most 10 out of 3↑↑↑3 people can ever be in a position to be "solely responsible" for the fate of 3↑↑↑3 people if "solely responsible" is taken to imply a causal chain that goes through no more than 10 people's decisions; i.e. at most 10 people can ever be solely_10 responsible for any given event. Or if "fate" is taken to be a sufficiently ultimate fate that there's at most 10 other decisions of similar magnitude that could cumulate to determine someone's outcome utility to within ±50%, then any given person could have their fate_10 determined on at most 10 occasions. We would surely agree, while assigning priors at the dawn of reasoning, that an agent randomly selected from the pool of all agents in Reality has at most a 100/X chance of being able to be solely_10 responsible for the fate_10 of X people. Any reasoning we do about universes, their complexity, sensory experiences, and so on, should maintain this net balance. You can even strip out the part about agents and carry out the reasoning on pure causal nodes; the chance of a randomly selected causal node being in a unique_100 position on a causal graph with respect to 3↑↑↑3 other nodes ought to be at most 100/3↑↑↑3 for finite causal graphs. (As for infinite causal graphs, well, if problems arise only when introducing infinity, maybe it's infinity that has the problem.)
Suppose we apply the Hansonian leverage penalty to the face-value scenario of our own universe, in which there are apparently no aliens and the galaxies we can reach in the future contain on the order of 10^80 atoms; which, if the intelligence explosion goes well, might be transformed into on the very loose order of... let's ignore a lot of intermediate calculations and just call it the equivalent of 10^80 centuries of life. (The neurons in your brain perform lots of operations; you don't get only one computing operation per element, because you're powered by the Sun over time. The universe contains a lot more negentropy than just 10^80 bits due to things like the gravitational potential energy that can be extracted from mass. Plus we should take into account reversible computing. But of course it also takes more than one computing operation to implement a century of life. So I'm just going to xerox the number 10^80 for use in these calculations, since it's not supposed to be the main focus.)
Wouldn't it be terribly odd to find ourselves - where by 'ourselves' I mean the hundred billion humans who have ever lived on Earth, for no more than a century or so apiece - solely_100,000,000,000 responsible for the fate_10 of around 10^80 units of life? Isn't the prior probability of this somewhere around 10^-68?
Yes, according to the leverage penalty. But a prior probability of 10^-68 is not an insurmountable epistemological barrier. If you're taking things at face value, 10^-68 is just 226 bits of evidence or thereabouts, and your eyes are sending you a megabit per second. Becoming convinced that you, yes you, are an Earthling is epistemically doable; you just need to see a stream of sensory experiences which is 10^68 times more probable if you are an Earthling than if you are someone else. If we take everything at face value, then there could be around 10^80 centuries of life over the history of the universe, and only 10^11 of those centuries will be lived by creatures who discover themselves occupying organic bodies. Taking everything at face value, the sensory experiences of your life are unique to Earthlings and should immediately convince you that you're an Earthling - just looking around the room you occupy will provide you with sensory experiences that plausibly belong to only 10^11 out of 10^80 life-centuries.
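Spelling out that arithmetic in a few lines of Python - nothing new, just the numbers above converted into bits and compared against the stated sensory bandwidth:

```python
import math

log10_prior = -68                        # prior improbability of the leverage-privileged position
bits_needed = -log10_prior * math.log2(10)
print(round(bits_needed))                # 226 -- the "226 bits of evidence or thereabouts"

bits_per_second = 1e6                    # "your eyes are sending you a megabit per second"
print(bits_needed / bits_per_second)     # ~0.0002 -- seconds of perfectly informative sensory data

# Face-value likelihood ratio on offer: experiences shared by ~10^11 Earthling
# life-centuries out of ~10^80 total life-centuries.
log10_available_ratio = 80 - 11          # = 69, comfortably more than the 68 needed
```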
If we don't take everything at face value, then there might be such things as ancestor simulations, and it might be that your experience of looking around the room is something that happens in 10^20 ancestor simulations for every time that it happens in 'base level' reality. In this case your probable leverage on the future is diluted (though it may be large even post-dilution). But this is not something that the Hansonian leverage penalty forces you to believe - not when the putative stakes are still as small as 10^80. Conceptually, the Hansonian leverage penalty doesn't interact much with the Simulation Hypothesis (SH) at all. If you don't believe SH, then you think that experiences like yours are rare in the universe and hence present strong, convincing evidence for you occupying the leverage-privileged position of an Earthling - much stronger evidence than its prior improbability. (There are some separate anthropic issues here about whether or not this is itself evidence for SH, but I don't think that question is intrinsic to leverage penalties per se.)
A key point here is that even if you accept a Hanson-style leverage penalty, it doesn't have to manifest as an inescapable commandment of modesty. You need not refuse to believe (in your deep and irrevocable humility) that you could be someone as special as an Ancient Earthling. Even if Earthlings matter in the universe - even if we occupy a unique position to affect the future of galaxies - it is still possible to encounter pretty convincing evidence that you're an Earthling. Universes the size of 10^80 do not pose problems to conventional decision-theoretic reasoning, or to conventional epistemology.
Things play out similarly if - still taking everything at face value - you're wondering about the chance that you could be special even for an Earthling, because you might be one of, say, 10^4 people in the history of the universe who contribute a major amount to an x-risk reduction project which ends up actually saving the galaxies. The vast majority of the improbability here is just in being an Earthling in the first place! Thus most of the clever arguments for not taking this high-impact possibility at face value would also tell you not to take being an Earthling at face value, since Earthlings as a whole are much rarer within the total temporal history of the universe than you are supposing yourself to be rare among Earthlings. But given ¬SH, the prior improbability of being an Earthling can be overcome by a few megabits of sensory experience from looking around the room and querying your memories - it's not as if 10^80 is enough future beings for the number of agents randomly hallucinating similar experiences to outweigh the number of real Earthlings. Similarly, if you don't think lots of Earthlings are hallucinating the experience of going to a donation page and clicking on the Paypal button for an x-risk charity, that sensory experience can easily serve to distinguish you as one of the 10^4 people donating to an x-risk philanthropy.
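The same point in numbers - a rough sketch using the round figures above, with variable names that are just labels for those figures:

```python
log10_total_life_centuries = 80      # face-value size of the future, as above
log10_earthlings = 11                # roughly 10^11 humans ever
log10_big_contributors = 4           # the hypothetical ~10^4 major x-risk contributors

# Leverage-style prior improbability of "I am one of the 10^4":
log10_prior_special = log10_big_contributors - log10_total_life_centuries        # -76

# ...decomposes into two factors:
log10_prior_earthling = log10_earthlings - log10_total_life_centuries            # -69
log10_special_given_earthling = log10_big_contributors - log10_earthlings        # -7

# 69 of the 76 orders of magnitude are in being an Earthling at all, which
# ordinary sensory evidence already covers given not-SH.
assert log10_prior_special == log10_prior_earthling + log10_special_given_earthling
```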
Yes, there are various clever-sounding lines of argument which involve not taking things at face value - "Ah, but maybe you should consider yourself as an indistinguishable part of this here large reference class of deluded people who think they're important." I consider this a bad idea because it renders you a permanent Muggle: it puts you into an inescapable reference class of self-deluded people and then dismisses all your further thoughts as insufficient evidence, since you could just be deluding yourself further about whether these are good arguments. Nor do I believe the world can only be saved by good people who are incapable of distinguishing themselves from a large class of crackpots, all of whom have no choice but to continue based on the tiny probability that they are not crackpots. (For more on this see Being Half-Rational About Pascal's Wager Is Even Worse.) In this case you are a Pascal's Muggle not because you've explicitly assigned a probability like one over googolplex, but because you took an improbability like 10^-6 at unquestioning face value and then cleverly questioned all the evidence which could've overcome that prior improbability, and so, in practice, you can never climb out of the epistemological sinkhole. By the same token, this reasoning would have you conclude that you are just self-deluded about being an Earthling, since real Earthlings are so rare and privileged in their leverage.
In general, leverage penalties don't translate into advice to be modest or to conclude that you're just deluding yourself - they just say that, to be rationally coherent, your picture of the universe has to imply that your sensory experiences are at least as rare as the corresponding magnitude of your leverage.
Which brings us back to Pascal's Mugger, in the original alleyway version. The Hansonian leverage penalty seems to imply that to be coherent, either you believe that your sensory experiences are really actually 1 in a googolplex - that only 1 in a googolplex beings experiences what you're experiencing - or else you really can't take the situation at face value.
Suppose the Mugger is telling the truth, and a googolplex other people are being simulated. Then there are at least a googolplex people in the universe. Perhaps some of them are hallucinating a situation similar to this one by sheer chance? Rather than telling you flatly that you can't have a large impact, the Hansonian leverage penalty implies a coherence requirement on how uniquely you think your sensory experiences identify the position you believe yourself to occupy. When it comes to believing you're one of 10^11 Earthlings who can impact 10^80 other life-centuries, you need to think your sensory experiences are unique to Earthlings - that they identify Earthlings with a likelihood ratio on the order of 10^69. This is quite achievable, if we take the evidence at face value. But when it comes to improbability on the order of 1/3↑↑↑3, the prior improbability is inescapable - your sensory experiences can't possibly be that unique - which is assumed to be appropriate because almost everyone who ever believes they'll be in a position to help 3↑↑↑3 people will in fact be hallucinating. Boltzmann brains should be much more common than people in a unique position to affect 3↑↑↑3 others, at least if the causal graphs are finite.
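Put as a toy comparison, the coherence requirement asks whether the likelihood ratio you'd need is one that any physically possible stream of evidence could supply. Everything below is in log10; the 1e26 is just the log10 of the 10^(10^26) bound I quote in the dialogue further down, and the variable names are mine:

```python
# Log10 throughout, using the round figures from this post.
log10_ratio_needed_earthling = 69     # to pin down "I'm an Earthling" among 10^80 life-centuries
log10_max_century_of_evidence = 1e26  # loose upper bound on a century of sensory evidence

print(log10_ratio_needed_earthling < log10_max_century_of_evidence)   # True: achievable

# For the Mugger's claim, the ratio needed is on the order of 3^^^3, and even
# log10(3^^^3) dwarfs 1e26 (and any representable number), so no possible
# evidence stream closes the gap: the prior improbability is inescapable.
```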
Furthermore - although I didn't realize this part until recently - applying Bayesian updates from that starting point may partially avert the Pascal's Muggle effect:
Mugger: "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."
You: "Nope."
Mugger: "Why not? It's a really large impact."
You: "Yes, and I assign a probability on the order of 1 in 3↑↑↑3 that I would be in a unique position to affect 3↑↑↑3 people."
Mugger: "Oh, is that really the probability that you assign? Behold!"
(A gap opens in the sky, edged with blue fire.)
Mugger: "Now what do you think, eh?"
You: "Well... I can't actually say this observation has a likelihood ratio of 3↑↑↑3 to 1. No stream of evidence that can enter a human brain over the course of a century is ever going to have a likelihood ratio larger than, say, 101026 to 1 at the absurdly most, assuming one megabit per second of sensory data, for a century, each bit of which has at least a 1-in-a-trillion error probability. I'd probably start to be dominated by Boltzmann brains or other exotic minds well before then."
Mugger: "So you're not convinced."
You: "Indeed not. The probability that you're telling the truth is so tiny that God couldn't find it with an electron microscope. Here's the five dollars."
Mugger: "Done! You've saved 3↑↑↑3 lives! Congratulations, you're never going to top that, your peak life accomplishment will now always lie in your past. But why'd you give me the five dollars if you think I'm lying?"
You: "Well, because the evidence you did present me with had a likelihood ratio of at least a billion to one - I would've assigned less than 10-9 prior probability of seeing this when I woke up this morning - so in accordance with Bayes's Theorem I promoted the probability from 1/3↑↑↑3 to at least 109/3↑↑↑3, which when multiplied by an impact of 3↑↑↑3, yields an expected value of at least a billion lives saved for giving you five dollars."
I confess that I find this line of reasoning a bit suspicious - it seems overly clever. But on the level of intuitive virtues of rationality, it does seem less stupid than the original Pascal's Muggle; this muggee is at least behaviorally reacting to the evidence. In fact, they're reacting in a way exactly proportional to the evidence - they would've assigned the same net importance to handing over the five dollars if the Mugger had offered 3↑↑↑4 lives, so long as the strength of the evidence seemed the same.
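The arithmetic of that exchange, sketched: under the leverage prior the claimed stake cancels out of the expected value, so only the likelihood ratio matters - which is exactly the 'proportional to the evidence' behavior. The helper below is my own illustration, working in log10 since 3↑↑↑3 itself can't be represented:

```python
def log10_expected_lives(log10_evidence, log10_stake):
    """Muggee with a leverage prior of 1/stake: posterior ~ evidence/stake
    (fine while this is far below 1), and expected lives = posterior * stake,
    so the stake cancels and only the evidence term survives."""
    log10_posterior = log10_evidence - log10_stake
    return log10_posterior + log10_stake

# Sky-splitting evidence at ~10^9 : 1, for any claimed stake whatsoever
# (3^^^3, 3^^^4, ... -- represented here by an arbitrary placeholder exponent):
print(log10_expected_lives(9, log10_stake=10**6))   # -> 9, i.e. ~10^9 expected lives per $5
```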
(To anyone who tries to apply the lessons here to actual x-risk reduction charities (which I think is probably a bad idea): keep in mind that the vast majority of the improbable-position-of-leverage in any x-risk reduction effort comes from being an Earthling in a position to affect the future of a hundred billion galaxies, and that sensory evidence for being an Earthling is what gives you most of your belief that your actions can have an outsized impact.)
So why not just run with this - why not just declare the decision-theoretic problem resolved, if we have a rule that seems to give reasonable behavioral answers in practice? Why not just go ahead and program that rule into an AI?
Well... I still feel a bit nervous about the idea that Pascal's Muggee, after the sky splits open, is handing over five dollars while claiming to assign probability on the order of 10^9/3↑↑↑3 that it's doing any good.
I think that my own reaction in a similar situation would be along these lines instead:
Mugger: "Give me five dollars, and I'll save 3↑↑↑3 lives using my Matrix Powers."
Me: "Nope."
Mugger: "So then, you think the probability I'm telling the truth is on the order of 1/3↑↑↑3?"
Me: "Yeah... that probably has to follow. I don't see any way around that revealed belief, given that I'm not actually giving you the five dollars. I've heard some people try to claim silly things like, the probability that you're telling the truth is counterbalanced by the probability that you'll kill 3↑↑↑3 people instead, or something else with a conveniently equal and opposite utility. But there's no way that things would balance out exactly in practice, if there was no a priori mathematical requirement that they balance. Even if the prior probability of your saving 3↑↑↑3 people and killing 3↑↑↑3 people, conditional on my giving you five dollars, exactly balanced down to the log(3↑↑↑3) decimal place, the likelihood ratio for your telling me that you would "save" 3↑↑↑3 people would not be exactly 1:1 for the two hypotheses down to the log(3↑↑↑3) decimal place. So if I assigned probabilities much greater than 1/3↑↑↑3 to your doing something that affected 3↑↑↑3 people, my actions would be overwhelmingly dominated by even a tiny difference in likelihood ratio elevating the probability that you saved 3↑↑↑3 people over the probability that you did something bad to them. The only way this hypothesis can't dominate my actions - really, the only way my expected utility sums can converge at all - is if I assign probability on the order of 1/3↑↑↑3 or less. I don't see any way of escaping that part."
Mugger: "But can you, in your mortal uncertainty, truly assign a probability as low as 1 in 3↑↑↑3 to any proposition whatever? Can you truly believe, with your error-prone neural brain, that you could make 3↑↑↑3 statements of any kind one after another, and be wrong, on average, about once?"
Me: "Nope."
Mugger: "So give me five dollars!"
Me: "Nope."
Mugger: "Why not?"
Me: "Because even though I, in my mortal uncertainty, will eventually be wrong about all sorts of things if I make enough statements one after another, this fact can't be used to increase the probability of arbitrary statements beyond what my prior says they should be, because then my prior would sum to more than 1. There must be some kind of required condition for taking a hypothesis seriously enough to worry that I might be overconfident about it -"
Mugger: "Then behold!"
(A gap opens in the sky, edged with blue fire.)
Mugger: "Now what do you think, eh?"
Me (staring up at the sky): "...whoa." (Pause.) "You turned into a cat."
Mugger: "What?"
Me: "Private joke. Okay, I think I'm going to have to rethink a lot of things. But if you want to tell me about how I was wrong to assign a prior probability on the order of 1/3↑↑↑3 to your scenario, I will shut up and listen very carefully to what you have to say about it. Oh, and here's the five dollars, can I pay an extra twenty and make some other requests?"
(The thought bubble pops, and we return to two people standing in an alley, the sky above perfectly normal.)
Mugger: "Now, in this scenario we've just imagined, you were taking my case seriously, right? But the evidence there couldn't have had a likelihood ratio of more than 101026 to 1, and probably much less. So by the method of imaginary updates, you must assign probability at least 10-1026 to my scenario, which when multiplied by a benefit on the order of 3↑↑↑3, yields an unimaginable bonanza in exchange for just five dollars -"
Me: "Nope."
Mugger: "How can you possibly say that? You're not being logically coherent!"
Me: "I agree that I'm not being logically coherent, but I think that's acceptable in this case."
Mugger: "This ought to be good. Since when are rationalists allowed to deliberately be logically incoherent?"
Me: "Since we don't have infinite computing power -"
Mugger: "That sounds like a fully general excuse if I ever heard one."
Me: "No, this is a specific consequence of bounded computing power. Let me start with a simpler example. Suppose I believe in a set of mathematical axioms. Since I don't have infinite computing power, I won't be able to know all the deductive consequences of those axioms. And that means I will necessarily fall prey to the conjunction fallacy, in the sense that you'll present me with a theorem X that is a deductive consequence of my axioms, but which I don't know to be a deductive consequence of my axioms, and you'll ask me to assign a probability to X, and I'll assign it 50% probability or something. Then you present me with a brilliant lemma Y, which clearly seems like a likely consequence of my mathematical axioms, and which also seems to imply X - once I see Y, the connection from my axioms to X, via Y, becomes obvious. So I assign P(X&Y) = 90%, or something like that. Well, that's the conjunction fallacy - I assigned P(X&Y) > P(X). The thing is, if you then ask me P(X), after I've seen Y, I'll reply that P(X) is 91% or at any rate something higher than P(X&Y). I'll have changed my mind about what my prior beliefs logically imply, because I'm not logically omniscient, even if that looks like assigning probabilities over time which are incoherent in the Bayesian sense."
Mugger: "And how does this work out to my not getting five dollars?"
Me: "In the scenario you're asking me to imagine, you present me with evidence which I currently think Just Plain Shouldn't Happen. And if that actually does happen, the sensible way for me to react is by questioning my prior assumptions and the reasoning which led me assign such low probability. One way that I handle my lack of logical omniscience - my finite, error-prone reasoning capabilities - is by being willing to assign infinitesimal probabilities to non-privileged hypotheses so that my prior over all possibilities can sum to 1. But if I actually see strong evidence for something I previously thought was super-improbable, I don't just do a Bayesian update, I should also question whether I was right to assign such a tiny probability in the first place - whether it was really as complex, or unnatural, as I thought. In real life, you are not ever supposed to have a prior improbability of 10-100 for some fact distinguished enough to be written down in advance, and yet encounter strong evidence, say 1010 to 1, that the thing has actually happened. If something like that happens, you don't do a Bayesian update to a posterior of 10-90. Instead you question both whether the evidence might be weaker than it seems, and whether your estimate of prior improbability might have been poorly calibrated, because rational agents who actually have well-calibrated priors should not encounter situations like that until they are ten billion days old. Now, this may mean that I end up doing some non-Bayesian updates: I say some hypothesis has a prior probability of a quadrillion to one, you show me evidence with a likelihood ratio of a billion to one, and I say 'Guess I was wrong about that quadrillion to one thing' rather than being a Muggle about it. And then I shut up and listen to what you have to say about how to estimate probabilities, because on my worldview, I wasn't expecting to see you turn into a cat. But for me to make a super-update like that - reflecting a posterior belief that I was logically incorrect about the prior probability - you have to really actually show me the evidence, you can't just ask me to imagine it. This is something that only logically incoherent agents ever say, but that's all right because I'm not logically omniscient."
At some point, we're going to have to build some sort of actual prior into, you know, some sort of actual self-improving AI.
(Scary thought, right?)
So far as I can presently see, the logic requiring some sort of leverage penalty - not just so that we don't pay $5 to Pascal's Mugger, but also so that our expected utility sums converge at all - seems clear enough that I can't yet see a good alternative to it (feel welcome to suggest one), and Robin Hanson's rationale is by far the best I've heard.
In fact, what we actually need is more like a combined leverage-and-complexity penalty, to avoid scenarios like this:
Mugger: "Give me $5 and I'll save 3↑↑↑3 people."
You: "I assign probability exactly 1/3↑↑↑3 to that."
Mugger: "So that's one life saved for $5, on average. That's a pretty good bargain, right?"
You: "Not by comparison with x-risk reduction charities. But I also like to do good on a smaller scale now and then. How about a penny? Would you be willing to save 3↑↑↑3/500 lives for a penny?"
Mugger: "Eh, fine."
You: "Well, the probability of that is 500/3↑↑↑3, so here's a penny!" (Goes on way, whistling cheerfully.)
Adding a complexity penalty and a leverage penalty is necessary, not just to avert this exact scenario, but so that we don't get an infinite expected utility sum over a 1/3↑↑↑3 probability of saving 3↑↑↑3 lives, 1/(3↑↑↑3 + 1) probability of saving 3↑↑↑3 + 1 lives, and so on. If we combine the standard complexity penalty with a leverage penalty, the whole thing should converge.
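A toy version of that convergence claim - the complexity weights and stakes below are invented purely to show the shape of the sum, and everything is kept in log10 so nothing overflows:

```python
import math

def log10_complexity_weight(n):   # toy complexity penalty: weight 2^-n on the n-th hypothesis
    return -n * math.log10(2)

def log10_stake(n):               # toy explosively growing claimed stakes: 10^(2^n) lives
    return float(2 ** n)

# Complexity penalty alone: log10 of each term P(n)*U(n) is -n*log10(2) + 2^n,
# which grows without bound, so the expected-utility sum diverges.
print([round(log10_complexity_weight(n) + log10_stake(n), 1) for n in range(1, 8)])

# Complexity penalty plus a 1/stake leverage penalty: the stake cancels out of
# each term, leaving 2^-n, and the whole sum converges (to just under 1 here).
print(sum(10 ** (log10_complexity_weight(n) + log10_stake(n) - log10_stake(n))
          for n in range(1, 60)))
```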
Probability penalties are epistemic features - they affect what we believe, not just what we do. Maps, ideally, correspond to territories. Is there any territory that this complexity+leverage penalty can correspond to - any state of a single reality which would make these the true frequencies? Or is it only interpretable as pure uncertainty over realities, with there being no single reality that could correspond to it? To put it another way, the complexity penalty and the leverage penalty seem unrelated, so perhaps they're mutually inconsistent; can we show that the union of these two theories has a model?
As near as I can figure, the corresponding state of affairs to a complexity+leverage prior improbability would be a Tegmark Level IV multiverse in which each reality got an amount of magical-reality-fluid corresponding to the complexity of its program (1/2 to the power of its Kolmogorov complexity) and then this magical-reality-fluid had to be divided among all the causal elements within that universe - if a universe contains 3↑↑↑3 causal nodes, then each node can only get 1/3↑↑↑3 of the total realness of that universe. (As always, the term "magical reality fluid" reflects an attempt to demarcate a philosophical area where I feel quite confused, and try to use correspondingly blatantly wrong terminology so that I do not mistake my reasoning about my confusion for a solution.) This setup is not entirely implausible because the Born probabilities in our own universe look like they might behave like this sort of magical-reality-fluid - quantum amplitude flowing between configurations in a way that preserves the total amount of realness while dividing it between worlds - and perhaps every other part of the multiverse must necessarily work the same way for some reason. It seems worth noting that part of what's motivating this version of the 'territory' is that our sum over all real things, weighted by reality-fluid, can then converge. In other words, the reason why complexity+leverage works in decision theory is that the union of the two theories has a model in which the total multiverse contains an amount of reality-fluid that can sum to 1 rather than being infinite. (Though we need to suppose that either (a) only programs with a finite number of causal nodes exist, or (b) programs can divide finite reality-fluid among an infinite number of nodes via some measure that gives every experience-moment a well-defined relative amount of reality-fluid. Again see caveats about basic philosophical confusion - perhaps our map needs this property over its uncertainty but the territory doesn't have to work the same way, etcetera.)
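The allocation rule itself is easy to sketch - a toy version under the finite-node supposition (a), with the function name mine and all the reality-fluid caveats above still applying:

```python
import math

def log10_reality_fluid_per_node(complexity_bits, num_causal_nodes):
    """Toy complexity+leverage prior: a universe-program of Kolmogorov
    complexity K gets weight 2^-K, divided evenly among its causal nodes.
    Summed over programs the 2^-K weights total at most 1, and dividing a
    program's weight among its nodes never increases it, so the multiverse's
    total reality-fluid stays bounded by 1 and the sums can converge."""
    return -complexity_bits * math.log10(2) - math.log10(num_causal_nodes)

# A universe-program of 1000 bits with ~10^80 causal nodes:
print(log10_reality_fluid_per_node(1000, 1e80))   # ~ -381
```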
If an AI's overall architecture is also such as to enable it to carry out the "You turned into a cat" effect - where if the AI actually ends up with strong evidence for a scenario it assigned super-exponential improbability, the AI reconsiders its priors and the apparent strength of evidence rather than executing a blind Bayesian update, though this part is formally a tad underspecified - then at the moment I can't think of anything else to add in.
In other words: This is my best current idea for how a prior, e.g. as used in an AI, could yield decision-theoretic convergence over explosively large possible worlds.
However, I would still call this a semi-open FAI problem (edit: wide-open) because it seems quite plausible that somebody is going to kick holes in the overall view I've just presented, or come up with a better solution, possibly within an hour of my posting this - the proposal is both recent and weak even by my standards. I'm also worried about whether it turns out to imply anything crazy on anthropic problems. Over to you, readers.