In classical game theory, rational agents always choose options which are optimal according to their preferences, even when making such a choice requires evaluating functions that are provably uncomputable. In other words, rationality implies hypercomputation.

We can turn any rational agent into an oracle for any decision problem by asking it to choose between YES and NO, and giving the agent a higher payoff for choosing the correct answer.[1] If we choose the halting problem, a properly motivated rational agent will act as a halting oracle.[2] If we could look at the gears of a rational agent, we'd be able to find some subsystem performing hypercomputation.
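To make the reduction concrete, here is a minimal Python sketch. It is a sketch only, not an implementation: `halts` is the very oracle being extracted, so no Turing machine can evaluate the payoff, and `agent.choose` is a hypothetical interface to an ideally rational agent.

```python
def halting_bet(program, agent) -> bool:
    """Pose 'does `program` halt?' to the agent as a bet on YES vs. NO."""
    def payoff(answer: str) -> int:
        truth = "YES" if halts(program) else "NO"  # uncomputable to evaluate
        return 1 if answer == truth else 0         # pay 1 only for the truth

    # A rational agent maximizes payoff by definition, so its choice *is*
    # the correct answer to the halting problem for `program`.
    return agent.choose(["YES", "NO"], payoff) == "YES"
```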

Any computable implementation of any decision theory will necessarily fail to choose rationally in some contexts: for any undecidable problem, it is provably impossible for an algorithm to choose the correct answer in all cases.
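The classic diagonal argument shows why. Here is a small runnable Python sketch: given any computable function `claimed_halts` (a hypothetical name for a purported halting decider), we can construct a program it answers incorrectly, so an agent that delegated its bets to `claimed_halts` would bet wrong on that program.

```python
def make_adversary(claimed_halts):
    """Return a program that does the opposite of whatever the decider predicts."""
    def adversary():
        if claimed_halts(adversary):  # decider predicts "halts"...
            while True:               # ...so loop forever, refuting it
                pass
        return                        # decider predicts "loops", so halt
    return adversary

# For any computable `claimed_halts`, it answers incorrectly on
# make_adversary(claimed_halts): whichever way it bets, it loses.
```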

This raises the question: if a rational agent had to delegate their decision to a computer program, what sort of program would they choose?

1. ^ Or at least we could if rational agents weren't hypercomputational horrors from beyond spacetime.

2. ^ Aligning the behavior of an infinitely intelligent alien creature, using externally supplied payoffs, is left as an exercise to the reader.

Comments (4)

The levels of computability of various notions of optimal decision making are discussed in Jan Leike's PhD thesis: https://jan.leike.name/publications/Nonparametric%20General%20Reinforcement%20Learning%20-%20Leike%202016.pdf

This is a much more nuanced take! At the beginning of Chapter 6, Jan proposes restricting our attention to agents which are limit computable:

Our agents are useless if they cannot be approximated in practice, i.e., by a regular Turing machine. Therefore we posit that any ideal for a ‘perfect agent’ needs to be limit computable (Δ⁰₂).

This seems like a very reasonable restriction! Any implementation needs to be computable, but it makes sense to look for theoretical ideals which can be approximated.
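For intuition, halting itself is limit computable: there is a computable guess that converges to the right answer, even though no computable procedure can announce when convergence has happened. A minimal runnable sketch, using my own convention of modeling a program as a Python generator that yields once per step (this is not notation from the thesis):

```python
def halts_within(program, n: int) -> bool:
    """Run `program` for at most n steps and guess whether it halts."""
    steps = program()
    try:
        for _ in range(n):
            next(steps)
    except StopIteration:
        return True    # it halted within the budget: guess YES from now on
    return False       # no halt seen yet: provisionally guess NO

def halting_program():
    for _ in range(5):  # halts after five steps
        yield

def looping_program():
    while True:         # never halts
        yield

# As n grows, halts_within(p, n) converges to the true answer for every p,
# but we never know at which n it has converged -- exactly the
# limit-computable (Δ⁰₂) condition.
```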

This rather reminds me of a discussion (which alas I cannot recall a reference for) of the idea that a system having "free will" is an illusion (or at least a heuristic viewpoint) induced by not having sufficient computational resources and data to fully model and predict the system's behavior. Something that looks to us like it has free will (including ourselves) might look entirely deterministic or probabilistic to an agent with much greater computational and data resources. If you can predict the behavior of a computationally bounded (and thus not fully rational) agent well enough to successfully Dutch-book them, because you have significantly more computational resources than they do, then they probably don't look to you as if they entirely have "free will".

Yes! I'm a fan of Yudkowsky's view that the sensation of free will is the sensation of "couldness" among multiple actions. When it feels like I could do one thing or another, it feels like I have free will. When it feels like I could have chosen differently, it feels like I chose freely.

I suspect that an important ingredient of the One True Decision Theory is being shaped in such a way that other agents, modelling how you'll respond to different policies they might implement, find it in their interest to implement policies which treat you fairly.