The Generalized Anti-Pascal Principle: Utility Convergence of Infinitesimal Probabilities
Edit: Added clarification of the limit in response to gwern's comment.
I encountered this issue again while reading through a fascinating discussion thread on John Baez's blog from earlier this year where Greg Egan jumped in with a "Yudkowsky/Bostrom" criticism:
The Yudkowsky/Bostrom strategy is to contrive probabilities for immensely unlikely scenarios, and adjust the figures until the expectation value for the benefits of working on — or donating to — their particular pet projects exceed the benefits of doing anything else. Combined with the appeal to vanity of “saving the universe”, some people apparently find this irresistible, but frankly, their attempt to prescribe what rational altruists should be doing with their time and money is just laughable, and it’s a shame you’ve given it so much air time.
In short, Egan is indirectly accusing SIAI and FHI of Pascal's Mugging (among other things): something serious indeed. In particular, Egan presents the following quote (presumably from Yudkowsky) as evidence:
Anyway: In terms of expected utility maximization, even large probabilities of jumping the interval between a universe-history in which 95% of existing biological species survive Earth’s 21st century, versus a universe-history where 80% of species survive, are just about impossible to trade off against tiny probabilities of jumping the interval between interesting universe-histories, versus boring ones where intelligent life goes extinct, or the wrong sort of AI self-improves.
Yudkowsky responds with his Pascal's Wager Fallacy Fallacy, and points out that in fact he agrees there is no case for investing in defense against highly improbable existential risks:
And I don’t think the odds of us being wiped out by badly done AI are small. I think they’re easily larger than 10%. And if you can carry a qualitative argument that the probability is under, say, 1%, then that means AI is probably the wrong use of marginal resources – not because global warming is more important, of course, but because other ignored existential risks like nanotech would be more important. I am not trying to play burden-of-proof tennis. If the chances are under 1%, that’s low enough, we’ll drop the AI business from consideration until everything more realistic has been handled.
The rest of the thread makes for an entertaining read, but the takeaway I'd like to focus on is the original source of Egan's criticism: the apparent domination of immensely unlikely scenarios of immensely high utility.
It occurred to me that the expected value of any action, properly summed over subsets of integrated futures, necessarily converges to zero as the probability of those considered subsets goes to zero. Critically, this convergence occurs for *all* utility functions, as it does not depend on any particular utility assignments. Alas, LW is vast enough that there may be little new left under the sun: in researching this idea, I encountered an earlier form of it in a post by SilasBart here, as well as some earlier attempts by RichardKennaway, Komponisto, and jimrandomh.
Now that we've covered the background, I'll jump to the principle:
The Infinitesimal Probability Utility Convergence Principle (IPUP): For any action A, utility function U, and a subset of possible post-action futures F, EU(F) -> 0 as p(F) -> 0.
In Pascal's Mugging scenarios we are considering possible scenarios (futures) that have some low probability. It is important to remember that rational agents compute expected reward over all possible futures, not just the one scenario we may be focusing on.
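One way to read the notation in the principle is sketched below. It assumes (as an interpretation, not something stated explicitly above) that the futures following action A are partitioned into disjoint subsets F_1, F_2, ..., and that EU(F) denotes the contribution of a subset F to the total expected utility of A:

```latex
\begin{align}
EU(A)   &= \sum_i EU(F_i), \\
EU(F_i) &= p(F_i \mid A)\, \mathbb{E}\left[\, U \mid F_i, A \,\right], \\
\text{IPUP:} \quad EU(F) &\to 0 \quad \text{as} \quad p(F) \to 0 .
\end{align}
```

Under this reading, the mugging scenario is a single term of the sum, and the principle is a claim about how that term behaves as its probability weight shrinks.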
The principle can be formalized in the theoretical context of perfect omniscience-approaching agents running on computers approaching infinite power.
The AIXI formalization provides a simple mathematical model of such agents. Its single-line equation has a concise English summary:
If the environment is modeled by a deterministic program q, then the future perceptions ...o_k r_k...o_m r_m = U(q, a_1..a_m) can be computed, where U is a universal (monotone Turing) machine executing q given a_1..a_m. Since q is unknown, AIXI has to maximize its expected reward, i.e. average r_k+...+r_m over all possible future perceptions created by all possible environments q that are consistent with past perceptions. The simpler an environment, the higher is its a-priori contribution 2^-l(q), where simplicity is measured by the length l of program q. AIXI effectively learns by eliminating Turing machines q once they become inconsistent with the progressing history. Since noisy environments are just mixtures of deterministic environments, they are automatically included.
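For reference, the one-line equation that this summary describes is usually written roughly as follows, where k is the current cycle, m is the horizon, and a, o, r are actions, observations, and rewards:

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
\left[ r_k + \cdots + r_m \right]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-l(q)}
```

The innermost sum is the complexity-weighted vote of every program q that reproduces the perception history. The outer alternation of max and sum is the expectation over future perceptions under the agent's own best future actions.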
AIXI is just a mathematical equation. We must be very careful in mapping it to abstract scenarios lest we lose much in translation. It is best viewed as a family of agent-models: the reward observations it seeks to maximize could be anything.
When one ponders "What would AIXI/Omega do?", there are a few key points to keep in mind:
1. AIXI-like models (probably) simulate the entire infinitely branching multiverse from the beginning of time to infinity (as particular simulation programs). This is often lost in translation.
2. AIXI-like models compute 1 (the infinite totality of existence) not once, but for each of an infinite number of programs (corresponding to what we would call universal physics: theories of everything) in parallel. Thus AIXI computes (in parallel) the entire Tegmark multiverse: every possible universe that could exist in principle.
3. AIXI 'learns' by eliminating sub-universes (and theories) that do not perfectly agree with its observation history to date. Of course this is only ever a finite reduction: it never collapses the multiverse from an infinite set into a finite set.
4. AIXI finally picks an action A that maximizes expected reward. It computes this measure by summing, over each observation-valid universe (computed by a particular theory-program, as in 1) in the multiverse ensemble (2), the total accumulated reward in the sub-universes branching off from that action, weighted by a scoring term for each valid universe that decreases exponentially with the theory's program length.
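The elimination and weighting steps in the list above can be sketched with a finite toy model. This is only an illustration under strong assumptions: real AIXI sums over all programs and is uncomputable, the `Env` class and its fields are hypothetical stand-ins for environment programs, each environment's whole future is collapsed into one reward number per action, and the example lengths and rewards are made up.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Env:
    """Hypothetical stand-in for one environment program q."""
    name: str
    length_bits: int                    # l(q), the program length in bits
    consistent: Callable[[str], bool]   # does q agree with the history so far?
    reward: Callable[[str], float]      # total future reward after an action


def expected_reward(action, history, envs):
    """Sum reward over history-consistent environments, weighted by 2^-l(q).

    The weights are left unnormalized, which does not change the argmax.
    """
    total = 0.0
    for q in envs:
        if not q.consistent(history):
            continue  # eliminated: q disagrees with the observations to date
        total += 2.0 ** (-q.length_bits) * q.reward(action)
    return total


def best_action(actions, history, envs):
    """Pick the action with the highest complexity-weighted reward."""
    return max(actions, key=lambda a: expected_reward(a, history, envs))


# Made-up example environments (all numbers are illustrative assumptions).
envs = [
    Env("mugger is lying", 10, lambda h: True,
        lambda a: -5.0 if a == "pay" else 0.0),
    Env("threat is real", 60, lambda h: True,
        lambda a: 0.0 if a == "pay" else -1e6),
    Env("no mugger exists", 5, lambda h: "mugger" not in h,
        lambda a: 100.0),
]
history = "a mugger asks for $5"
choice = best_action(["pay", "refuse"], history, envs)
print(choice)
```

The third environment is short (and so heavily weighted) but contradicts the history, so it contributes nothing. The other two are compared purely through their 2^-l(q) weights.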
In other words, the perfectly rational agent considers everything that could possibly happen as a consequence of its action in every possible universe it could be in, weighted by an exponential penalty against high-complexity universes.
Here is a sketch of how the limit convergence (IPUP above) can be derived: When considering a possible action A, such as giving $5 to a Pascal Mugger, an optimal agent considers all possible dependent futures for all possible physics-universes. As we advance into scenarios of infinitesimal probability, we are advancing up the complexity ladder into increasingly chaotic universes which feature completely random rewards which approach positive/negative infinity. As we advance into this regime of infinitesimal probability, causality itself breaks down completely and expected reward of any action goes to zero.
The convergence principle can be derived from the program length prior 2^-l(q). An agent which has accumulated P perception bits so far can fully explain those perceptions by completely random programs of length about P, so 2^-P forms a probability limit below which the agent's perceptions start becoming irrelevant and chaotic non-causal physics dominates. Chaos should dominate the expected reward for futures F where p(F) << 2^-P.
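The threshold comparison in this paragraph can be sketched numerically. The sketch assumes the limit is read as 2^-P for P accumulated perception bits, and that "<<" means "smaller by at least some margin of bits". The function names and the `margin_bits` parameter are hypothetical. Everything is done in log2 space because 2^-P underflows a float for realistic P.

```python
def log2_chaos_threshold(perception_bits: int) -> float:
    """log2 of the prior weight 2^-P of a program that hard-codes P bits."""
    return -float(perception_bits)


def chaos_dominates(log2_p_future: float, perception_bits: int,
                    margin_bits: float = 10.0) -> bool:
    """True when p(F) << 2^-P, read here as 'below it by margin_bits bits'."""
    return log2_p_future < log2_chaos_threshold(perception_bits) - margin_bits


# An agent with 1000 bits of history (an arbitrary illustrative figure).
print(chaos_dominates(-20.0, 1000))    # a 2^-20 scenario
print(chaos_dominates(-5000.0, 1000))  # a 2^-5000 scenario
```

The point of the comparison is that the threshold moves with the agent's history: the more perception bits accumulated, the further down the probability scale the agent's observations remain informative.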
Thinking as a limited human, we impose abstractions and collapse all extremely similar (to us) futures. All the tiny random quantum-dependent variations of a particular future that correspond to "giving the Mugger $5" we collapse into a single set of futures, to which we assign a probability by counting the subinstances in that set as a fraction of the whole.
AIXI does not do this: it actually computes each individual future path.
But as we can't hope to think that way, we have to think in terms of probability categorizations. Fine. Imagine collapsing any futures that are sufficiently indistinguishable that humans would consider them identical: described by the same natural-language sentence. We then get subsets of futures to which we assign probabilities as relative size measures.
Now consider ranking all of those future-sets in decreasing probability order. Most of the early list is dominated by scenarios where the Mugger is (joking/lying/crazy/etc). Farther down the list you get into scenarios where we do live in a multi-level Simulation (AIXI only ever considers itself in some simulation), but the Mugger is still (joking/lying/crazy/etc).
By the time you get down the list to scenarios where the Mugger says "Or else I will use my magic powers from outside the Matrix to run a Turing machine that simulates and kills 3^^^^3 people" and what the Mugger says actually happens, we are almost certainly down in infinitesimal probability land.
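The collapse-and-rank procedure described above can be sketched as grouping individual futures by their natural-language description and sorting the groups by total measure. The descriptions and weights below are invented purely for illustration, and `rank_future_sets` is a hypothetical helper name.

```python
from collections import defaultdict


def rank_future_sets(futures):
    """Group (description, weight) pairs and rank the groups by probability.

    Each group's probability is its share of the total weight, i.e. its
    relative size within the whole set of futures.
    """
    totals = defaultdict(float)
    for description, weight in futures:
        totals[description] += weight
    norm = sum(totals.values())
    ranked = [(d, w / norm) for d, w in totals.items()]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


# Individual futures with made-up weights; duplicates get collapsed.
futures = [
    ("mugger is lying", 0.5),
    ("mugger is lying", 0.25),
    ("mugger is crazy", 0.125),
    ("simulation, mugger is lying", 0.0625),
    ("threat is real", 2.0 ** -40),
]
ranked = rank_future_sets(futures)
for description, p in ranked:
    print(f"{p:.3e}  {description}")
```

With these numbers the "threat is real" set lands at the bottom of the list, many orders of magnitude below the mundane explanations.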
Infinitesimal probability land is a weird place. It is a regime where the physics that we commonly accept is wrong, which is to say simply that the exponential complexity penalty no longer rules out ultra-complex universes. It is dominated by chaos: universes of every possible fancy, where nothing is what it seems, where everything you possibly thought is completely wrong, where there is no causality, etc. etc.
At the complete limit of improbability, we just get universes where our entire observation history is completely random - generated by programs more complex than our observations. You give the mugger $5 and the universe simply dissolves in white noise and nothing happens (or god appears and gives you infinite heaven, or infinite hell, or the speed of light goes to zero, or a black hole forms near your nose, or the Mugger turns into jellybeans, etc. etc., an infinite number of stories, over which the net reward summation necessarily collapses to zero.)
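The cancellation in the reward summation can be illustrated with a toy model. The key assumption, which belongs to this sketch rather than being proven here, is symmetry: for every chaotic story with reward r there is an equally complex story with reward -r (infinite heaven versus infinite hell). The `mirrored` and `net_reward` helpers and all the numbers are hypothetical.

```python
def net_reward(stories):
    """Sum of 2^-length_bits * reward over (length_bits, reward) stories."""
    return sum(2.0 ** (-bits) * reward for bits, reward in stories)


def mirrored(stories):
    """Assumption: each story has an equally complex twin with opposite reward."""
    return [s for bits, r in stories for s in ((bits, r), (bits, -r))]


# Made-up chaotic stories: (program length in bits, reward).
stories = [(40, 1e9), (41, -3.0), (45, 2.5e12)]

print(net_reward(stories))            # one-sided selection: nonzero
print(net_reward(mirrored(stories)))  # symmetric ensemble: cancels
```

Focusing on a single story (the Mugger's) corresponds to the one-sided sum. The argument in the text is about the full ensemble, where under the symmetry assumption each large reward is offset by its twin regardless of magnitude.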
Remember, AIXI doesn't consider the mugger's words as 'evidence'; they are simply observations. In the more complex universes they are completely devoid of meaning, as causality itself collapses.