Personal Blog

Outline: Constructing utility functions that can be evaluated on any possible universe is known to be a confusing problem, since it is not obvious what sort of mathematical object the domain should be or what properties the function should obey. In a sequence of posts, I intend to break down the question with respect to Tegmark's multiverse levels and explain the answer on each level, starting with level IV in the current post.

# Background

An intelligent agent is often described as an entity whose actions drive the universe towards higher expectation values of a certain function, known as the agent's utility function. Such a description is very useful in contexts such as AGI, FAI, decision theory and more generally any abstract study of intelligence.

Applying the concept of a utility function to agents in the real world requires utility functions with a very broad domain. Indeed, since the agent is normally assumed to have only finite information about the universe in which it exists, it should allow for a very large variety of possible realities. If the agent is to make decisions using some sort of utility calculus, it has to be able to evaluate its utility function on each of the realities it can conceive.

Tegmark has conveniently arranged the space of possible realities ("universes") into 4 levels, 3 of which are based on our current understanding of physics. Tegmark's universes are usually presented as co-existing but it is also possible to think of them as the "potential" universes in which our agent can find itself. I am going to traverse Tegmark's multiverse from top to bottom, studying the space of utility functions on each level (which, except for level IV, is always derived from the higher level). The current post addresses Tegmark level IV, leaving the lower levels for follow-ups.

Some of the ideas in this post previously appeared in a post about intelligence metrics, where I explained them much more tersely.

# Tegmark Level IV

Tegmark defined this level as the collection of all mathematical models. Since it is not even remotely clear how to define such a beast, I am going to use a different space which (I claim) is conceptually very close. Namely, I am going to consider universes to be infinite binary sequences $\lbrace x_i \rbrace_{i \in \mathbb{N}}$. I denote by $X$ the space of all such sequences, equipped with the product topology. As will become clearer in the following, this space embodies "all possible realities", since any imaginable reality can be encoded in such a sequence1.

The natural a priori probability measure on this space is the Solomonoff measure $\mu$. Thus, a priori utility expectation values take the form

[1] $E[U]=\int_X U(x) d\mu(x)$
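As a toy illustration of [1], one can replace $\mu$ with a finite, computable stand-in: a mixture over a handful of "programs" weighted by $2^{-\text{description length}}$. Everything in this sketch (the generators, their assumed description lengths, the particular bounded $U$) is invented for illustration:

```python
# Toy stand-in for the Solomonoff measure: a finite mixture over a few
# "programs" (simple generator functions), each weighted by
# 2^(-description length). All generators and lengths are illustrative.

def all_zeros(i):   return 0
def all_ones(i):    return 1
def alternating(i): return i % 2

# (generator, assumed description length in bits)
programs = [(all_zeros, 3), (all_ones, 3), (alternating, 5)]
weights = [2.0 ** -length for _, length in programs]
total = sum(weights)  # normalize, since this finite mixture is not a full semimeasure

def U(prefix, beta=0.5):
    """A bounded utility function: discounted count of 1-bits in a finite prefix."""
    return sum(beta ** t * bit for t, bit in enumerate(prefix))

def expected_utility(horizon=30):
    # E[U] ~ sum over programs of (normalized weight) * U(generated prefix)
    return sum((w / total) * U([gen(i) for i in range(horizon)])
               for (gen, _), w in zip(programs, weights))

print(expected_utility())
```

The true $\mu$ sums over all programs for a universal machine, so this finite mixture only gestures at the structure of the integral.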

From the point of view of Updateless Decision Theory, a priori expectation values are the only sort that matters: conditional expectation values with respect to logical uncertainty replace the need to update the measure.

In order to guarantee the convergence of expectation values, we are going to assume $U$ is a bounded function.

## A Simple Example

So far, we know little about the form of the function $U$. To illustrate the sort of constructions that are relevant for realistic or semi-realistic agents, I am going to consider a simple example: the glider maximizer.

The glider maximizer $G$ is an agent living inside the Game of Life. Fix $V$ a forward light cone within the Game of Life spacetime, representing the volume $G$ is able to influence. $G$ maximizes the following utility function:

$U_G(h)=\sum_{t=0}^\infty \beta^t N_t(h;V)$

Here, $h$ is a history of the Game of Life, $\beta$ is a constant in $(0,1)$ and $N_t(h;V)$ is the number of gliders at time $t$ inside $V$.
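For concreteness, here is a minimal Python sketch of $U_G$ on a finite toy history, where $N_t$ only counts exact occurrences of a single glider phase and orientation (a real count would have to handle all four phases, all orientations, and gliders adjacent to other live cells); all names here are mine:

```python
# One fixed phase/orientation of a glider (an exact-match template).
GLIDER = ((0, 1, 0),
          (0, 0, 1),
          (1, 1, 1))

def count_pattern(board, pat=GLIDER):
    """Count windows of `board` exactly equal to `pat` (including its 0s)."""
    ph, pw = len(pat), len(pat[0])
    h, w = len(board), len(board[0])
    return sum(
        all(board[r + i][c + j] == pat[i][j]
            for i in range(ph) for j in range(pw))
        for r in range(h - ph + 1) for c in range(w - pw + 1)
    )

def U_G(history, beta=0.9):
    # Discounted glider count over a finite history (truncation of the sum)
    return sum(beta ** t * count_pattern(board)
               for t, board in enumerate(history))

# A 5x5 board containing one glider, held fixed for 3 steps:
board = tuple(
    tuple(1 if (r, c) in {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)} else 0
          for c in range(5))
    for r in range(5)
)
print(U_G([board, board, board]))  # 1 + 0.9 + 0.81 = 2.71
```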

We wish to "release" $G$ from the Game of Life universe into the broader multiverse. In other words, we want an agent that doesn't dogmatically assume itself to exist within the Game of Life, instead searching for appearances of the Game of Life in the physical universe and maximizing gliders there.

To accomplish this, fix a way $f$ to bijectively encode histories of $V$ as binary sequences, allowing arbitrary histories: don't impose the Game of Life rules. We can then define the "multiversal" utility function

$U^M_G(x)=\sum_{t=0}^\infty \beta^t [N_t(f^{-1}(x);W(f^{-1}(x)))-\gamma n_t(f^{-1}(x))]$

Here $W(h)$ is the set of cells in which $h$ satisfies Game of Life rules, $\gamma$ is a positive constant and $n_t(h)$ is the number of cells in $V \setminus W(h)$ at time $t$.

In other words, the "liberated" $G$ prefers for many cells to satisfy Game of Life rules and for many cells out of these to contain gliders.
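To make $W(h)$ concrete, here is a hedged sketch for finite toy histories: a spacetime cell $(t, r, c)$ counts as lawful if its transition to step $t+1$ obeys the Game of Life rule. The boundary handling and the exact representation are my own choices for the sketch:

```python
def life_rule(alive, live_neighbors):
    # Conway's rule: birth on 3 neighbors, survival on 2 or 3
    return 1 if (live_neighbors == 3 or (alive and live_neighbors == 2)) else 0

def neighbors(board, r, c):
    h, w = len(board), len(board[0])
    return sum(board[i][j]
               for i in range(max(0, r - 1), min(h, r + 2))
               for j in range(max(0, c - 1), min(w, c + 2))
               if (i, j) != (r, c))

def lawful_cells(history):
    """Spacetime cells (t, r, c) whose step t -> t+1 transition obeys the rule."""
    W = set()
    for t in range(len(history) - 1):
        cur, nxt = history[t], history[t + 1]
        for r in range(len(cur)):
            for c in range(len(cur[0])):
                if nxt[r][c] == life_rule(cur[r][c], neighbors(cur, r, c)):
                    W.add((t, r, c))
    return W

# A blinker oscillator obeys the rules in every cell:
b1 = ((0,0,0,0,0), (0,0,0,0,0), (0,1,1,1,0), (0,0,0,0,0), (0,0,0,0,0))
b2 = ((0,0,0,0,0), (0,0,1,0,0), (0,0,1,0,0), (0,0,1,0,0), (0,0,0,0,0))
print(len(lawful_cells([b1, b2])))  # 25: all cells transition lawfully
```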

Superficially, it seems that the construction of $U^M_G$ strongly depends on the choice of $f$. However, the dependence only marginally affects $\mu$-expectation values. This is because replacing $f$ with $g$ is equivalent to adjusting probabilities by a bounded factor. The bound is roughly $2^K$ where $K$ is the Kolmogorov complexity of $f \circ g^{-1}$.
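In slightly more detail (a standard Solomonoff-style argument, with $O(1)$ constants suppressed): if $p$ generates a history under encoding $g$, then composing $p$ with a program for $f \circ g^{-1}$ generates the same history under encoding $f$, so

$K_f(x) \le K_g(x) + K(f \circ g^{-1}) + O(1) \implies 2^{-K_f(x)} \ge 2^{-K(f \circ g^{-1}) - O(1)} \cdot 2^{-K_g(x)}$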

## Human Preferences and Dust Theory

Human preferences revolve around concepts which belong to an "innate" model of reality: a model which is either genetic or acquired by brains at early stages of childhood. This model describes the world mostly in terms of humans, their emotions and interactions (but might include other elements as well e.g. elements related to wild nature).

Therefore, utility functions which are good descriptions of human preferences ("friendly" utility functions) are probably of similar form to $U^M_G$ from the Game of Life example, with Game of Life replaced by the "innate human model".

Applying UDT to the $\mu$-expectation values of such utility functions leads to agents which care about anything that has a low-complexity decoding into an "innate concept" e.g. biological humans and whole brain emulations. The $\mu$-integral assigns importance to all possible "decodings" of the universe weighted by their Kolmogorov complexity which is slightly reminiscent of Egan's dust theory.

## The Procrastination Paradox

Consider an agent $P$ living in a universe I call the "buttonverse". $P$ can press a button at any moment of time $t \in \mathbb{N}$. $P$'s utility function $U_P$ assigns 1 to histories in which the button was pressed at least once and 0 to histories in which the button was never pressed. At each moment of time, it seems rational for $P$ to decide not to press the button, since it will have the chance to do so at a later time without losing utility. As a result, if $P$ never presses the button, its behavior seems rational at any particular moment but overall leads to losing. This problem (which has important ramifications for tiling agents) is known as the procrastination paradox.

My point of view on the paradox is that it is the result of a topological pathology of $U_P$. Thus, if we restrict ourselves to reasonable utility functions (in the precise sense I explain below), the paradox disappears.

Buttonverse histories are naturally described as binary sequences $\lbrace x_i \rbrace_{i \in \mathbb{N}}$ where $x_i$ is 0 when the button is not pressed at time $i$ and 1 when the button is pressed at time $i$. Define $z$ to be the buttonverse history in which the button is never pressed:

$z_i=0$

Consider the following sequence of buttonverse histories: $x^i$ is the history in which the button gets pressed at time $i$ only. That is

$x^i_j=\delta_{ij}$

Now, with respect to the product topology on $X$, the $x^i$ converge to $z$:

$\lim_{i \rightarrow \infty} x^i = z$

However the utilities don't behave correspondingly:

$\lim_{i \rightarrow \infty} U_P(x^i) > U_P(z)$

Therefore, it seems natural to require any utility function to be an upper semicontinuous function on $X$2. I claim that this condition resolves the paradox in the precise mathematical sense considered in Yudkowsky 2013. Presenting the detailed proof would take us too far afield and is hence out of scope for this post.
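The discontinuity is easy to see in code. In this toy sketch, histories are functions from time to $\lbrace 0, 1 \rbrace$ and $U_P$ is evaluated over a finite horizon, which suffices for these particular histories (though not in general):

```python
# Toy sketch of the buttonverse. A history is a function from time to {0, 1};
# U_P checks whether the button is ever pressed within a finite horizon.

HORIZON = 10_000

def U_P(history):
    return 1 if any(history(t) for t in range(HORIZON)) else 0

def z(t):           # the button is never pressed
    return 0

def x(i):           # the button is pressed at time i only
    return lambda t: 1 if t == i else 0

# x^i agrees with z on the prefix [0, i): this is product-topology convergence
assert all(x(50)(t) == z(t) for t in range(50))

# ...yet the utilities stay at 1 and never approach U_P(z) = 0:
print([U_P(x(i)) for i in (1, 10, 100, 1000)], U_P(z))  # [1, 1, 1, 1] 0
```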

## Time Discount

Bounded utility functions typically contain some kind of temporal discount. In the Game of Life example, the discount manifests as the factor $\beta^t$. It is often assumed that the discount has to take an exponential form in order to preserve time translation symmetry. However, the present formalism has no place for time translation symmetry on the fundamental level: our binary sequences have well-defined beginnings. Obviously this doesn't rule out exponential discount but the motivation for sticking to this particular form is weakened.

Note that any sequence $x$ contributes to the $\mu$-integral in [1] together with its backward translated versions $T_t(x)$:

$T_t(x)_i = x_{i+t}$

As a result, the temporal discount function effectively undergoes convolution with the function $2^{-K(t)}$, where $K(t)$ is the Kolmogorov complexity of the number $t$. Thus, whatever the form of the "bare" temporal discount, the effective temporal discount falls very slowly3.

In other words, if a utility function $U$ assigns little or no importance to the distant future, a UDT agent maximizing the expectation value of $U$ would still care a lot about the distant future, because what is distant future in one universe in the ensemble is the beginning of the sequence in another universe in the ensemble.
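As a rough numerical illustration, here is a sketch using the crude upper bound $K(t) \lesssim 2\log_2 t$ (from a self-delimiting encoding of $t$; this stand-in for $K$, like the choice of bare discount, is my assumption for the sketch):

```python
import math

# Crude upper bound on the Kolmogorov complexity of an integer: a
# self-delimiting encoding of t takes roughly 2*log2(t) bits.
def K_hat(t):
    return 1 if t == 0 else 2 * math.floor(math.log2(t)) + 2

def bare(t, beta=0.5):
    # "bare" exponential temporal discount
    return beta ** t

def effective(t):
    # An event at time t in x also occurs at time t - s in the backward
    # translation T_s(x), whose contribution is suppressed by ~ 2^(-K(s)):
    # the bare discount gets convolved with 2^(-K(t)).
    return sum(2.0 ** -K_hat(s) * bare(t - s) for s in range(t + 1))

for t in (10, 50, 100):
    print(t, bare(t), effective(t))
```

Even with this crude bound, the effective discount at $t = 100$ exceeds the bare discount by many orders of magnitude, since the $s \approx t$ translations contribute with only polynomially small weight.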

Next in sequence: The Role of Physics in UDT, Part I

1 It might seem that there are "realities" of higher set theoretic cardinality which cannot be encoded. However, if we assume our agent's perceptions during a finite span of subjective time can be encoded as a finite number of bits, then we can safely ignore the "larger" realities. They can still exist as models the agent uses to explain its observations but it is unnecessary to assume them to exist on the "fundamental" level.

2 In particular, all computable functions are admissible since they are continuous.

3 I think that $2^{-K(t)}$ falls slower than any computable function with convergent integral.
