Followup to: Intelligence Metrics with Naturalized Induction using UDT

In the previous post I defined an intelligence metric that solves the duality (aka naturalized induction) and ontology problems in AIXI. That model formalized UDT using Benja's model of logical uncertainty. In the current post I am going to:

- Explain some problems with my previous model (*that section can be skipped if you don't care about the previous model and only want to understand the new one*).
- Formulate a new model solving these problems. Incidentally, the new model is much closer to the usual way UDT is represented. It is also based on a different model of logical uncertainty.
- Show how to define intelligence without specifying the utility function a priori.
- Outline a method for constructing utility functions formulated with abstract ontology, i.e. well-defined on the entire Tegmark level IV multiverse. The new model requires such utility functions, and they are generally difficult to construct (i.e. the ontology problem resurfaces in a different form).

# Problems with UIM 1.0

The previous model postulated that naturalized induction uses a version of Solomonoff induction updated in the direction of an innate model **N** with a temporal confidence parameter **t**. This entails several problems:

- The dependence on the parameter **t**, whose relevant value is not easy to determine.
- Conceptual divergence from the UDT philosophy that we should not update *at all*.
- Difficulties with counterfactual mugging and acausal trade scenarios in which **G** doesn't exist in the "other universe".
- Once **G** discovers even a small violation of **N** at a very early time, it loses all ground for trusting its own mind. Effectively, **G** would find itself in the position of a Boltzmann brain. This is especially dangerous when **N** over-specifies the hardware running **G**'s mind. For example, assume **N** specifies **G** to be a human brain modeled on the level of quantum field theory (particle physics). If **G** discovers that in truth it is a computer simulation on the merely molecular level, it loses its epistemic footing completely.

# UIM 2.0

I now propose the following intelligence metric (the formula goes first and then I explain the notation):

**I**_{U}(**q**) := E_{T}[E_{D}[E_{L}[**U**(**Y**(**D**)) | **Q**(**X**(**T**)) = **q**]] | **N**]

- **N** is the "ideal" model of the mind of the agent **G**. For example, it can be a universal Turing machine **M** with special "sensory" registers **e** whose values can change arbitrarily after each step of **M**. **N** is specified as a system of constraints on an infinite sequence of natural numbers **X**, which should be thought of as the "Platonic ideal" realization of **G**, i.e. an imaginary realization which cannot be tampered with by external forces such as anvils. As we shall see, this "ideal" serves as a template for "physical" realizations of **G** which *are* prone to violations of **N**.
- **Q** is a function that decodes **G**'s code from **X**, e.g. the program loaded in **M** at time 0.
- **q** is a particular value of this code whose (utility-specific) intelligence **I**_{U}(**q**) we are evaluating.
- **T** is a random (as in random variable) computable hypothesis about the "physics" of **X**, i.e. a program computing **X** implemented on some fixed universal computing model (e.g. universal Turing machine) **C**. **T** is distributed according to the Solomonoff measure; however, the expectation value in the definition of **I**_{U}(**q**) is conditional on **N**, i.e. we restrict to programs which are compatible with **N**. From the UDT standpoint, **T** is the decision algorithm itself and the uncertainty in **T** is "introspective" uncertainty, i.e. the uncertainty of the putative precursor agent **PG** (the agent creating **G**, e.g. an AI programmer) regarding her own decision algorithm. Note that we don't actually *need* to postulate a **PG** which is "agenty" (i.e. use for **N** a model of AI hardware together with a model of the AI programmer programming this hardware); we can be content to remain in a more abstract framework.
- **D** is a random computable hypothesis about the physics of **Y**, where **Y** is an infinite sequence of natural numbers representing the physical (as opposed to "ideal") universe. **D** is distributed according to the Solomonoff measure and the respective expectation value is unconditional (i.e. we use the raw Solomonoff prior for **Y**, which makes the model truly updateless). In UDT terms, **D** is indexical uncertainty.
- **U** is a computable function from infinite sequences of natural numbers to [0, 1] representing **G**'s utility function.
- **L** represents logical uncertainty. It can be defined by the model explained by cousin_it here, together with my previous construction for computing logical expectation values of random variables in [0, 1]. That is, we define E_{L}(**d**_{k}) to be the probability that a random string of bits **p** encodes a proof of the sentence "**Q**(**X**(**T**)) = **q** implies that the k-th digit of **U**(**Y**(**D**)) is 1" in some prefix-free encoding of proofs, *conditional* on **p** encoding the proof of either that sentence or the sentence "**Q**(**X**(**T**)) = **q** implies that the k-th digit of **U**(**Y**(**D**)) is 0". We then define

E_{L}[**U**(**Y**(**D**)) | **Q**(**X**(**T**)) = **q**] := Σ_{k} 2^{-k} E_{L}(**d**_{k})

Here, the sentences and the proofs belong to some fixed formal logic **F**, e.g. Peano arithmetic or ZFC.
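The digit-wise definition above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the construction itself: I assume we are simply handed the bit-lengths of all known proof encodings for each of the two sentences, that a prefix-free codeword of length n is hit by a random bit string with probability 2^{-n}, and that the sum over k starts at 1 and is truncated. The function names `digit_probability` and `logical_expectation` are hypothetical.

```python
from fractions import Fraction


def digit_probability(proof_lengths_one, proof_lengths_zero):
    """Toy E_L(d_k).

    proof_lengths_one / proof_lengths_zero: bit-lengths of the (assumed known)
    prefix-free encodings of proofs that the k-th digit is 1 / is 0.
    A random bit string starts with a given codeword of length n with
    probability 2^-n, so we compare the total weight of each side.
    """
    w_one = sum(Fraction(1, 2 ** n) for n in proof_lengths_one)
    w_zero = sum(Fraction(1, 2 ** n) for n in proof_lengths_zero)
    if w_one + w_zero == 0:
        raise ValueError("no proof of either sentence is known")
    # Condition on the string encoding a proof of one of the two sentences.
    return w_one / (w_one + w_zero)


def logical_expectation(digit_probs):
    """Truncated sum_k 2^-k E_L(d_k), with k starting at 1 (an assumption)."""
    return sum(p / 2 ** k for k, p in enumerate(digit_probs, start=1))


# Example: digit 1 has a shorter proof of "1" than of "0"; digit 2 is symmetric.
probs = [digit_probability([2], [3]), digit_probability([4], [4])]
value = logical_expectation(probs)
```

Exact rationals are used so that the conditional probabilities are not blurred by floating-point rounding. Enumerating all proofs is of course not feasible in general; the sketch only shows how the pieces combine.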

## Discussion

- **G**'s mental architecture **N** is defined in the "ideal" universe **X** where it is inviolable. However, **G**'s utility function **U** inhabits the physical universe **Y**. This means that a highly intelligent **q** is designed so that imperfect realizations of **G** inside **Y** generate as many utilons as possible. A typical **T** is a low Kolmogorov complexity universe which contains a perfect realization of **G**. **Q**(**X**(**T**)) is **L**-correlated to the programming of imperfect realizations of **G** inside **Y** because **T** serves as an effective (approximate) model of the formation of these realizations. For abstract **N**, this means **q** is highly intelligent when a Solomonoff-random "**M**-programming process" producing **q** entails a high expected value of **U**.
- Solving the Loebian obstacle requires a more sophisticated model of logical uncertainty. *I think I can formulate such a model. I will explain it in another post after more contemplation.*
- It is desirable that the encoding of proofs **p** satisfies a universality property so that the length of the encoding can only change by an additive constant, analogously to the weak dependence of Kolmogorov complexity on **C**. It is in fact not difficult to formulate this property and show the existence of appropriate encodings. I will discuss this point in more detail in another post.
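To make the nesting of the three expectations in the definition of the metric more concrete, here is a toy Python sketch on a finite ensemble. Everything in it is an assumption made for illustration: hypotheses are represented only by a label and a program length in bits, compatibility with **N** is a given boolean flag, the logical expectation is supplied as a black-box function `e_logical`, and both finite ensembles are normalized (the true Solomonoff prior is a semimeasure over infinitely many programs).

```python
from fractions import Fraction


def weight(length_bits):
    """Solomonoff-style weight 2^-length of a program of the given length."""
    return Fraction(1, 2 ** length_bits)


def toy_intelligence(q, t_hyps, d_hyps, e_logical):
    """Finite stand-in for E_T[E_D[E_L[U(Y(D)) | Q(X(T)) = q]] | N].

    t_hyps: list of (label, length_bits, compatible_with_N)
    d_hyps: list of (label, length_bits)
    e_logical(t, d, q): value in [0, 1], a stand-in for the logical expectation
    """
    # Conditioning on N: keep only the T hypotheses compatible with N.
    allowed = [(t, n) for t, n, ok in t_hyps if ok]
    z_t = sum(weight(n) for _, n in allowed)
    # D is unconditional; we only normalize the finite toy ensemble.
    z_d = sum(weight(n) for _, n in d_hyps)
    outer = Fraction(0)
    for t, n_t in allowed:
        inner = sum(weight(n_d) * e_logical(t, d, q) for d, n_d in d_hyps) / z_d
        outer += weight(n_t) * inner
    return outer / z_t


t_hyps = [("T1", 2, True), ("T2", 1, False), ("T3", 3, True)]
d_hyps = [("D1", 1), ("D2", 2)]
# Hypothetical logical expectation: only the pair (T1, D1) yields utility.
score = toy_intelligence(
    "q0", t_hyps, d_hyps,
    lambda t, d, q: Fraction(1) if (t, d) == ("T1", "D1") else Fraction(0),
)
```

Note that "T2" is the shortest program but contributes nothing, since it is excluded by the conditioning on **N**, whereas no **D** hypothesis is ever excluded.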

# Generic Intelligence

It seems conceptually desirable to have a notion of intelligence independent of the specifics of the utility function. Such an intelligence metric can be constructed in a way analogous to what I did in UIM 1.0; however, it is no longer a special case of the utility-specific metric.

Assume **N** to consist of a machine **M** connected to a special storage device **E**. Assume further that at **X**-time 0, **E** contains a valid **C**-program **u** realizing a utility function **U**, but that this is the only constraint on the initial content of **E** imposed by **N**. Define

**I**(**q**) := E_{T}[E_{D}[E_{L}[**u**(**Y**(**D**); **X**(**T**)) | **Q**(**X**(**T**)) = **q**]] | **N**]

Here, **u**(**Y**(**D**); **X**(**T**)) means that we decode **u** from **X**(**T**) and evaluate it on **Y**(**D**). Thus utility depends both on the physical universe **Y** and the ideal universe **X**. This means **G** is not precisely a UDT agent but rather a "proto-agent": only when a realization of **G** reads **u** from **E** does it know which other realizations of **G** in the multiverse (the Solomonoff ensemble from which **Y** is selected) should be considered as the "same" agent UDT-wise.

Incidentally, this can be used as a formalism for reasoning about agents that don't know their utility functions. I believe this has important applications in metaethics, which I will discuss in another post.

# Utility Functions in the Multiverse

UIM 2.0 is a formalism that cures the diseases of UIM 1.0 at the price of losing **N** in the capacity of the ontology for utility functions. We need the utility function to be defined on the entire multiverse, i.e. on any sequence of natural numbers. I will outline a way to extend "ontology-specific" utility functions to the multiverse through a simple example.

Suppose **G** is an agent that cares about universes realizing the Game of Life, its utility function **U** corresponding to e.g. some sort of glider maximization with exponential temporal discount. Fix a specific way **DC** to decode any **Y** into a history of a 2D cellular automaton with two cell states ("dead" and "alive"). Our multiversal utility function **U*** assigns **Y**s for which **DC**(**Y**) is a legal Game of Life the value **U**(**DC**(**Y**)). All other **Y**s are treated by dividing the cells into cells **O** obeying the rules of Life and cells **V** violating the rules of Life. We can then evaluate **U** on **O** only (assuming it has some sort of locality) and assign **V** utility by some other rule, e.g.:

- zero utility
- constant utility per **V** cell with temporal discount
- constant utility per unit of surface area of the boundary between **O** and **V** with temporal discount

**U***(**Y**) is then defined to be the sum of the values assigned to **O**(**Y**) and **V**(**Y**).
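The construction can be sketched in Python on a finite toy history. This is only an illustration under several assumptions of mine: the decoded history **DC**(**Y**) is a finite list of equally sized 0/1 grids, cells outside the grid count as dead, a cell at time t belongs to **V** when its state differs from what the Life rule dictates given the grid at time t-1, the local utility on **O** is a discounted count of live cells (a crude stand-in for glider maximization), and **V** is scored by the "constant utility per **V** cell with temporal discount" rule. The toy is not normalized to [0, 1], and the names `life_next_state` and `multiversal_utility` are hypothetical.

```python
def life_next_state(grid, r, c):
    """State of cell (r, c) at the next step under Conway's rules.

    Cells outside the finite grid are treated as dead (a toy assumption).
    """
    rows, cols = len(grid), len(grid[0])
    live_neighbours = sum(
        grid[r + dr][c + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0) and 0 <= r + dr < rows and 0 <= c + dc < cols
    )
    if live_neighbours == 3:
        return 1
    if grid[r][c] == 1 and live_neighbours == 2:
        return 1
    return 0


def multiversal_utility(history, gamma=0.5, v_cell_utility=0.0):
    """Toy U*: sum of the value of the O cells and the value of the V cells.

    The first grid has no predecessor, so classification starts at t = 1.
    """
    total = 0.0
    for t in range(1, len(history)):
        prev, cur = history[t - 1], history[t]
        for r in range(len(cur)):
            for c in range(len(cur[0])):
                if cur[r][c] == life_next_state(prev, r, c):
                    # O cell: obeys Life; toy local utility = 1 per live cell.
                    total += gamma ** t * cur[r][c]
                else:
                    # V cell: violates Life; constant discounted utility.
                    total += gamma ** t * v_cell_utility
    return total


horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]  # blinker, phase 1
vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]    # blinker, phase 2
legal = multiversal_utility([horizontal, vertical])
illegal = multiversal_utility([horizontal, horizontal], v_cell_utility=0.25)
```

In the second history the blinker fails to flip, so four cells land in **V** and only the centre cell is counted in **O**. Because the two contributions are added, the choice of `v_cell_utility` expresses a preference about violations rather than a belief about how likely they are.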

## Discussion

- The construction of **U*** depends on the choice of **DC**. However, **U*** only depends on **DC** weakly since, given a hypothesis **D** which produces a Game of Life wrt some other low complexity encoding, there is a corresponding hypothesis **D'** producing a Game of Life wrt **DC**. **D'** is obtained from **D** by appending a corresponding "transcoder" and thus it is only less Solomonoff-likely than **D** by an O(1) factor.
- Since the accumulation between **O** and **V** is additive rather than e.g. multiplicative, a **U***-agent doesn't behave as if it a priori *expects* the universe to follow the rules of Life but may have strong preferences about the universe actually doing it.
- This construction is reminiscent of Egan's dust theory in the sense that all possible encodings contribute. However, here they are weighted by the Solomonoff measure.
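The O(1) factor in the first point can be checked with a few lines of Python. The sketch assumes, for simplicity, that the length of **D'** is exactly the length of **D** plus a fixed transcoder length, and that a program of length n gets weight 2^{-n}; the function names are hypothetical.

```python
from fractions import Fraction


def solomonoff_weight(program_bits):
    """Weight 2^-length of a program of the given length in bits."""
    return Fraction(1, 2 ** program_bits)


def transcoding_penalty(len_d_bits, transcoder_bits):
    """Ratio weight(D') / weight(D), where D' is D plus a fixed-length transcoder."""
    return solomonoff_weight(len_d_bits + transcoder_bits) / solomonoff_weight(len_d_bits)


# The penalty depends only on the transcoder, not on the hypothesis D itself.
small = transcoding_penalty(10, 5)
large = transcoding_penalty(1000, 5)
```

Both ratios come out equal to 2^{-5}, which is the sense in which the dependence on **DC** is weak: changing the encoding rescales the contribution of each hypothesis by a constant that does not grow with the complexity of the hypothesis.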

# TLDR

The intelligence of a physicalist agent is defined to be the UDT-value of the "decision" to create the agent by the process creating the agent. The process is selected randomly from a Solomonoff measure conditional on obeying the laws of the hardware on which the agent is implemented. The "decision" is made in an "ideal" universe in which the agent is Cartesian, but the utility function is evaluated on the real universe (raw Solomonoff measure). The interaction between the two "universes" is purely via logical conditional probabilities (acausal).

If we want to discuss intelligence without specifying a utility function up front, we allow the "ideal" agent to read a program describing the utility function from a special storage immediately after "booting up".

Utility functions in the Tegmark level IV multiverse are defined in three steps: specifying a "reference universe", specifying an encoding of the reference universe, and extending a utility function defined on the reference universe to encodings which violate the reference laws. The extension sums the utility of the portion of the universe which obeys the reference laws with some function of the space-time shape of the violation.