Followup to: Intelligence Metrics and Decision Theory

Related to: Bridge Collapse: Reductionism as Engineering Problem

A central problem in AGI is giving a formal definition of intelligence. Marcus Hutter has proposed AIXI as a model of perfectly intelligent agent. Legg and Hutter have defined a quantitative measure of intelligence applicable to any suitable formalized agent such that AIXI is the agent with maximal intelligence according to this measure.

Legg-Hutter intelligence suffers from a number of problems I have previously discussed, the most important being:

- The formalism is inherently Cartesian. Solving this problem is known as naturalized induction and it is discussed in detail here.
- The utility function Legg & Hutter use is a formalization of reinforcement learning, while we would like to consider agents with arbitrary preferences. Moreover, a real AGI designed with reinforcement learning would tend to wrestle control of the reinforcement signal from the operators (
*there must be a classic reference on this but I can't find it. Help?*). It is straightword to tweak to formalism to allow for any utility function which depends on the agent's sensations and actions, however we would like to be able to use any ontology for defining it.

*too*general, in particular there is no Solomonoff induction or any analogue thereof, instead a completely general probability measure is used.

- Define a formalism for logical uncertainty.
*When I started writing this I thought this formalism might be novel but now I see it is essentially the same as that of Benja.* - Use this formalism to define a non-constructive formalization of UDT. By "non-constructive" I mean something that assigns values to actions rather than a specific algorithm like here.
- Apply the formalization of UDT to my quasi-Solomonoff framework to yield an intelligence metric.
- Slightly modify my original definition of the quasi-Solomonoff measure so that the confidence of the innate model becomes a continuous rather than discrete parameter. This leads to an interesting conjecture.
- Propose a "preference agnostic" variant as an alternative to Legg & Hutter's reinforcement learning.
- Discuss certain anthropic and decision-theoretic aspects.

# Logical Uncertainty

*The formalism introduced here was originally proposed by **Benja.*

Fix a formal system **F**. We want to be able to assign probabilities to statements **s** in **F**, taking into account limited computing resources. Fix **D** a natural number related to the amount of computing resources that I call "depth of analysis".

Define P_{0}(**s**) := 1/2 for all **s** to be our initial prior, i.e. each statement's truth value is decided by a fair coin toss. Now define

P_{D}(**s**) := P_{0}(**s** | there are no contradictions of length <= **D**).

Consider **X** to be a number in [0, 1] given by a definition in **F**. Then **d**_{k}(**X**) := "The k-th digit of the binary expansion of **X** is 1" is a statement in **F**. We define E_{D}(**X**) := Σ_{k} 2^{-k} P_{D}(**d**_{k}(**X**)).

## Remarks

- Clearly if
**s**is provable in**F**then for**D**>> 0, P_{D}(**s**) = 1. Similarly if "not**s**" is provable in**F**then for**D**>> 0,

P_{D}(**s**) = 0. - If each digit of
**X**is decidable in**F**then lim_{D -> inf}E_{D}(**X**) exists and equals the value of**X**according to**F**. - For
**s**of length >**D**, P_{D}(**s**) = 1/2 since no contradiction of length <=**D**can involve**s**. - It is an interesting question whether lim
_{D -> inf}P_{D}(**s**) exists for any**s**. It seems false that this limit always exists*and*equals 0 or 1, i.e. this formalism is*not*a loophole in Goedel incompleteness. To see this consider statements that require a high (arithmetical hierarchy) order halting oracle to decide. - In computational terms,
**D**corresponds to non-deterministic spatial complexity. It is spatial since we assign truth values simultaneously to all statements so in any given contradiction it is enough to retain the "thickest" step. It is non-deterministic since it's enough for a contradiction to exists, we don't have an actual computation which produces it. I suspect this can be made more formal using the Curry-Howard isomorphism, unfortunately I don't understand the latter yet.

# Non-Constructive UDT

Consider **A** a decision algorithm for optimizing utility **U**, producing an output ("decision") which is an element of **C**. Here **U** is just a constant defined in **F**. We define the **U**-value of **c** in **C** for **A** at depth of analysis **D** to be

V_{D}(**c**, **A**; **U**) := E_{D}(**U** | "**A** produces **c**" is true). It is only well defined as long as "**A** doesn't produce **c**" cannot be proved at depth of analysis **D** i.e. P_{D}("**A** produces **c**") > 0. We define the *absolute* **U**-value of **c** for **A** to be

V(**c**, **A**; **U**) := E_{D(c, A)}(**U** | "**A** produces **c**" is true) where **D**(**c**, **A**) := max {**D** | P_{D}("**A** produces **c**") > 0}. Of course **D**(**c**, **A**) can be infinite in which case E_{inf}(...) is understood to mean lim_{D -> inf} E_{D}(...).

For example V(**c**, **A**; **U**) yields the natural values for **A** an ambient control algorithm applied to e.g. a simple model of Newcomb's problem. To see this note that given **A**'s output the value of **U **can be determined at low depths of analysis whereas the output of **A** requires a very high depth of analysis to determine.

# Naturalized Induction

Our starting point is the "innate model" **N**: a certain a priori model of the universe including the agent **G**. This model encodes the universe as a sequence of natural numbers **Y** = (**y**_{k}) which obeys either specific deterministic or non-deterministic dynamics or at least some constraints on the possible histories. It may or may not include information on the initial conditions. For example, **N** can describe the universe as a universal Turing machine **M** (representing **G**) with special "sensory" registers **e**. **N** constraints the dynamics to be compatible with the rules of the Turing machine but leaves unspecified the behavior of **e**. Alternatively, **N** can contain in addition to **M** a non-trivial model of the environment. Or **N** can be a cellular automaton with the agent corresponding to a certain collection of cells.

However, **G**'s confidence in **N** is limited: otherwise it wouldn't need induction. We cannot start with 0 confidence: it's impossible to program a machine if you don't have even a guess of how it works. Instead we introduce a positive real number **t** which represents the timescale over which **N** is expected to hold. We then assign to each hypothesis **H** about **Y** (you can think about them as programs which compute **y**_{k} given **y**_{j} for j < k; more on that later) the weight QS(**H**) := 2^{-L(H) }(1 - e^{-t(H)/t}). Here L(**H**) is the length of **H**'s encoding in bits and t(**H**) is the time during which **H** remains compatible with **N**. This is defined for **N** of deterministic / constraint type but can be generalized to stochastic **N**.

The weights QS(**H**) define a probability measure on the space of hypotheses which induces a probability measure on the space of histories **Y**. Thus we get an alternative to Solomonoff induction which allows for **G** to be a mechanistic part of the universe, at the price of introducing **N** and **t**.

## Remarks

- Note that time is discrete in this formalism but
**t**is continuous. - Since we're later going to use logical uncertainties wrt the formal system
**F**, it is tempting to construct the hypothesis space out of predicates in**F**rather than programs.

# Intelligence Metric

To assign intelligence to agents we need to add two ingredients:

- The decoding
**Q**: {**Y**} -> {bit-string} of the agent**G**from the universe**Y**. For example**Q**can read off the program loaded into**M**at time k=0. - A utility function
**U**: {**Y**} -> [0, 1] representing**G**'s preferences.**U**has to be given by a definition in**F**. Note that**N**provides the ontology wrt which**U**is defined.

_{QS}(

**U**|

**Q**), the conditional expectation value of

**U**for a given value of

**Q**in the quasi-Solomonoff measure. However, this is wrong for roughly the same reasons EDT is wrong (see previous post for details).

Instead, we define I(**Q**_{0}) := E_{QS}(E_{max}(**U**(**Y**(**H**)) | "**Q**(**Y**(**H**)) = **Q**_{0}" is true)). Here the subscript max stands for maximal depth of analysis, as in the construction of absolute UDT value above.

## Remarks

- IMO the correct way to look at this is intelligence metric = value of decision for the decision problem "what should I program into my robot?". If
**N**is a highly detailed model including "me" (the programmer of the AI), this literally becomes the case. However for theoretical analysis it is likely to be more convenient to work with simple**N**(also conceptually it leaves room for a "purist" notion of agent's intelligence, decoupled from the fine details of its creator).- As opposed to usual UDT, the algorithm (
**H**) making the decision (**Q**) is not known with certainty. I think this represents a real uncertainty that has to be taken into account in decision problems in general: the decision-maker doesn't know her own algorithm. Since this "introspective uncertainty" is highly correlated with "indexical" uncertainty (uncertainty about the universe), it prevents us from absorbing the later into the utility function as proposed by Coscott.

- As opposed to usual UDT, the algorithm (
- For high values of
**t**,**G**can improve its understanding of the universe by bootstrapping the knowledge it already has. This is not possible for low values of**t**. In other words, if I cannot trust my mind at all, I cannot deduce anything. This leads me to an interesting conjecture: There is a a critical value**t*** of**t**from which this bootstrapping becomes possible (the positive feedback look of knowledge becomes critical). I(**Q**) is non-smooth at**t*** (phase transition). - If we wish to understand intelligence, it might be beneficial to decouple it from the choice of preferences. To achieve this we can introduce the preference formula as an unknown parameter in
**N**. For example, if**G**is realized by a machine**M**, we can connect**M**to a data storage**E**whose content is left undetermined by**N**. We can then define**U**to be defined by the formula encoded in**E**at time k=0. This leads to I(**Q**) being a sort of "general-purpose" intelligence while avoiding the problems associated with reinforcement learning. - As opposed to Legg-Hutter intelligence, there appears to be no simple explicit description for
**Q*** maximizing I(**Q**) (e.g. among all programs of given length). This is not surprising, since computational cost considerations come into play. In this framework it appears to be inherently impossible to decouple the computational cost considerations:**G**'s computations have to be realized mechanistically and therefore cannot be free of time cost and side-effects. - Ceteris paribus,
**Q*** deals efficiently with problems like counterfactual mugging. The "ceteris paribus" conditional is necessary here since because of cost and side-effects of computations it is difficult to make absolute claims. However, it doesn't deal efficiently with counterfactual mugging in which**G**doesn't exist in the "other universe". This is because the ontology used for defining**U**(which is given by**N**) assumes**G***does*exist. At least this is the case for simple ontologies like described above: possibly we can construct**N**in which**G**might or might not exist. Also, if**G**uses a*quantum*ontology (i.e.**N**describes the universe in terms of a wavefunction and**U**computes the quantum expectation value of an operator) then it*does*take into account other Everett universes in which**G**doesn't exist. - For many choices of
**N**(for example if the**G**is realized by a machine**M**), QS-induction assigns well-defined probabilities to subjective expectations, contrary to what is expected from UDT. However:- This is not the case for all
**N**. In particular, if**N**admits destruction of**M**then**M**'s sensations after the point of destruction are not well-defined. Indeed, we better allow for destruction of**M**if we want**G**'s preferences to behave properly in such an event. That is, if we*don't*allow it we get a "weak anvil problem" in the sense that**G**experiences an ontological crisis when discovering its own mortality and the outcome of this crisis is not obvious. Note though that it is not the same as the original ("strong") anvil problem, for example**G**might come to the conclusion the dynamics of "**M**'s ghost" will be some sort of random. - These probabilities probably depend significantly on
**N**and don't amount to an elegant universal law for solving the anthropic trilemma. - Indeed this framework is not completely "updateless", it is "partially updated" by the introduction of
**N**and**t**. This suggests we might want the updates to be minimal in some sense, in particular**t**should be**t***.

- This is not the case for all
- The framework suggests there is no conceptual problem with cosmologies in which Boltzmann brains are abundant.
**Q*** wouldn't think it is a Boltzmann brain since the long address of Boltzmann brains within the universe makes the respective hypotheses complex thus suppressing them, even disregarding the suppression associated with**N**. I doubt this argument is original but I feel the framework validates it to some extent.

Thanks for posting this, quite interesting. The main question in my mind is whether your idea of stopping logical analysis short just before the logic system would prove what the system's action would be, leads to desirable properties. If I'm understanding correctly, it is intended to solve the 5-and-10 problem (I didn't look at your relevant links yet). But, it may introduce other problems.

It also does not address naturalistic self-trust. (As you observe, it doesn't seem to make any unprovable things converge to probability 1, so can't change self-trust problems related to Goedel's theorem). Do you agree, or do you see it differently?

Well, it leads to the right answer in simple examples. For example if A is an ambient control algorithm for the utility defined by "if A()=0 then U=u0, if A()=1 then U()=u1", for ui numbers with short binary expansion the value of the choice "0" will be u0 and the value of the choice "1" will be u1. This is because there is a simple proof of the statement "if A() = i then U=ui" whereas the actual value of A() cannot be deduced easily (with a short proof). If the ui have long/infinite binary expansion, the values will be approximations of the ui with precision depending on the depth of analysis for which A() becomes decidable.

Yes. The problem of self-trust resurfaces in my formulation as the problem of choosing F. Roughly speaking, G might not be able to use the power of F to full extent in its reasoning since F cannot prove its own soundness.

I didn't say it won't make

anyunprovable things converge to probability 1. My guess is that if a statement of the form "P(n) for all n" is provable for any given n, then its probability should converge to 1 even if it's unprovable. But I don't have a proof.If a convergent probability distribution has this property, then it will also have the property of converging to probability 0 for some true sentences. To be more precise: using the language of the arithmetic hierarchy, if true Pi_1 statements converge to 1, there will be some true Pi_2 statements which converge to 0. (This was proven by Will Sawin at the MIRI workshop I attended.)

Overall, if Benja's distribution does converge, I would be surprised if it had this property-- it seems too similar to mine, which does not have this property. I could be wrong, though. My true intuition is that Benja's distribution does not converge.

Hmm. P(X) in Benja's distribution is (if I'm not mistaken) the probability that a statement will be true after we've randomly assigned true or false to all the sentences in our finite consideration set,

giventhat our random assignment is consistent by our finite check. (Let's assume X is always in the set of sentences we are considering.)Let's assume that we're expanding both our set of sentences and our consistency check, but let's also assume (for sake of informal argument) that we're expanding our consistency check fast enough that we don't have to worry about inconsistent stuff (so, we're expanding it incomputably faster than we're expanding our set of sentences).

P(X) at any given stage is the probability that X is true in a 50-50 random assignment (to the sentences in our current set), given that the assignment is consistent. In other words, it's the following ratio: the probability that an assignment including X is consistent, over the probability that any assignment is consistent.

The limit of both parts of this ratio is zero; the set of consistent assignments is a set of measure zero, since we are bound to eventually add some inconsistency. (We will definitely add in a contradiction at some point if we keep adding sentences with coin flips to determine their truth values.)

(Benja states that we're dealing with a measurable set here, at least.)

My intuition is that this won't converge, since we may infinitely often add a statement that significantly changes the ratio. I don't see that we can put a bound on how much the ratio will change at any point (whereas with my prior, there is a bound, though an incomputable one, because the measure of any remaining inconsistencies drops monotonically).

I don't see the need for 2 parameters. The way I formulated it, there is only 1 parameter: the depth of analysis D. I always consider

allsentences. This makes sense because all but a finite set of sentences cannot figure in a contradiction of length <= D, so all but a finite set of sentences get probability 1/2 for any given D.Regarding convergence of probabilities of undecidable statements as D -> infinity, well, I don't know how to prove it, but I also don't know how to disprove it. I can try to assign a probability to it... ;) Is the result by Sawin you mentioned published somewhere?

Is it important to the analysis whether the probabilities converge as D tends to infinity? Do you rely on this at any point?

If you need to make sure the probabilities converge, then you could consider something like the following.

First, split the sentences in your system F into "positive" sentences (the ones which have no leading "not" symbol, ~, or else which have an even number of leading "not" symbols) and "negative" sentences (the ones with an odd number of leading "not" symbols). Sort the positive sentences by length, and then sort them lexicographically within all sentences of the same length. This will give a list s1, s2 etc.

We will now build a growing subset S of sentences, and ensure that in the limit, S is consistent and complete.

At stage 0, S is empty. Call this S0.

At stage n: Toss a fair coin, and if it lands heads then add sn to S. If it lands tails then add ~sn to S. Next, systematically search through all proofs of length up to the length of sn to see if there are any inconsistencies in S. If a proof of inconsistency is found, then list the subset of positive and negative sentences which create the inconsistency e.g. {sa, ~sb, ~sc, ..., sz}; let k be the largest index in this list (the largest value of a, b, ..., z). If sk was in S, then remove it and add ~sk instead, or if ~sk was in S then replace it by sk. Restart the search for inconsistencies. When the search completes without finding any further inconsistencies we have the set Sn.

We now take the obvious limit Sw of the sets Sn as n tends to infinity: s will belong to Sw if and only if it belongs to all but a finite number of Sn. That limit Sw is guaranteed to exist, because for each m, the process will eventually find a consistent choice of sentences and negations from within {s1, ... ,sm} and then stick with it (any contradictions found later will cause the replacement of sentences sk or ~sk with k > m). Further, Sw is guaranteed to be consistent (because any contradictions would have been discovered and removed in some Sn). Finally Sw is guaranteed to be complete, since it contains either sm or ~sm for each m.

We can now define probabilities PD(s) = P(s is a member of SD) and take P(s) as the limit of PD(s) as D tends to infinity. The limit is always defined since it is just the probability that s is a member of Sw.

Convergence is not an issue as long as the utility function is computable.

Soon I'm going to write more about logical uncertainty in the context of the Loebian obstacle.

Ok, I see.

So, I could instead think of iteratively eliminating inconsistent sets of sentences from the measure M, and looking at the limit of M(X)/M(all). I will assume for convenience that we eliminate one inconsistency at a time (so, remove mass from one set of sentences at a time). Let's call this M{d}, so M{d+1} has one more inconsistent set removed from the measure (which is to say, set to measure 0).

Each set eliminated may take some mass away from the numerator, and will definitely take some mass away from the denominator. (Yet the numerator will always remain less than or equal to the denominator.)

If we remove a set of sentences which does not include X and has no overlap with any of the other sets of sentences removed so far, then the mass of both the numerator and the denominator get multiplied by 1 - .5^n, where n is the number of sentences in the set. To make this more general, for M{0}(Y) we can think of ~Y as having already been removed from its mass to begin; therefore, we can say that for any M{d}(Y), if d+1 is based on an inconsistent set S not involving anything removed from M{d}(Y) previously, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S|).

If S does contain a sentence which has been involved in inconsistency previously, we remove a strictly smaller fraction. If a subset of S has already been removed, then nothing gets removed at all. But if a set S being removed from M{d}(Y) contains only Y and some other sentences which have never appeared before, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S-1|).

Let's name the ratio M(X)/M(all) at depth d as R(d). Considering R(d)/R(d+1), we know its limit must be in the range [0,1], because M(X) cannot be more than M(all) (so R(d) cannot diverge upward). If the limit of R(d) is greater than zero, the limit of R(d)/R(d+1) must be 1.

Assume X and ~X are both consistent.

In some ordering of logical depth, we have that: infinitely often, we eliminate a set of the form {Z, Z->~X, X} where Z and Z->~X have both never been seen before in an elimination set. Thus, M{d+1}(X) = .75 M{d}(X).

M{d}(all) = M{d}(X) + M{d}(~X). Nothing will be removed from M{d}(~X). Since X and ~X are consistent, both have some measure. Therefore, M{d+1}(all) = .75 M{d}(X) + M{d}(~X) = M{d}(all)

( .75 R(d) + M{d}(~X)/M{d}(all)) = M{d}(all)( .75 R(d) + 1 - R(d)) = M{d}(all) * (1 - .25 R(d)).Thus, infinitely often, R(d+1) = {.75 M{d}(X)} / {M{d}(all) * (1 - .25 R(d))} = .75 R(d) / (1 - .25 R(d))

Let c be R(d+1) - R(d).

So, infinitely often, we have

R(d) + c = .75 R(d) / (1 - .25 R(d))

c = .75 R(d) / (1 - .25 R(d)) - R(d)

If c goes to zero, R(d) must go to:

0 = .75 R(d) / (1 - .25 R(d)) - R(d)

R(d) = .75 R(d) / (1 - .25 R(d))

1 = .75 / (1 - .25 R(d))

1 - .25 R(d) = .75

.25 = .25 R(d)

R(d) = 1

So we see that if the probability converges, it must go to either 0 or 1! Unless I've made some mistake.

I didn't follow but the conclusion doesn't sound right to me. Your argument looks like it should apply to any language and proof system. So if I introduce "X" as a logical symbol that isn't constrained by any axioms or inference rules, why would its probability go to either 0 or 1? It seems to converge to 1/2 since for any consistent truth assignment with X=T we have a symmetric consistent truth assignment with X=F.

Yea, I suspect that's right: so if we introduce X, it must go to either zero or 1, but either option violates the symmetry between X and ~X. Therefore, it must not converge. But this needs to be formalized more before I'm sure.

I think it will converge to 1/2 because the symmetry applies at each level of depth, not just at the limit (at least approximately)

To convey my argument without as much of the math:

Suppose that P(X) is 1/2 at some stage. Then there will be inconsistent sets yet to remove which will take it at least C away from 1/2, where C is a constant that does not go down as the process continues.

The intuition behind this is that removing an inconsistent sentence which has not appeared in any of the inconsistencies removed so far, reduces mass by 1/2. Thus, the mass is changing significantly, all the time. Now to make this into a proof we need to show that P(X) changes significantly no matter how far into the process we go; IE, we need to show that a significantly

differentamount of mass can be removed from the 'top' and the 'bottom' (in the fraction M(X) / M(all) at finite depth).The inconsistency {Y, Y->X, ~X} is supposed to achieve this: it only removes mass from the bottom, but there are infinitely many sets like this (we can make Y arbitrarily complex), and each of them reduces the bottom portion by the same fraction without touching the top. Specifically, the bottom becomes 7/8ths of its size (if I've done it right), so P(x) becomes roughly .57.

The fraction can re-adjust by decreasing the top in some other way, but this doesn't allow convergence. No matter how many times the fraction reaches .5 again, there's a new Y which can be used to force it to .57.

Can the 5-and-10 problem affect agents which reason on a consistent formal system? If I understand correctly, it can't.

Nice!

I would actually prefer more handwaving on the logical probability distribution. Currently it's a bit confusing to me, with what appears to be mixed notation for conditional probabilities and set comprehensions. And the specifics of how the probabilities are assigned are not necessary for later results. (For example, does it really matter that you're checking for a contradiction in proofs less than some length L, rather than checking for a contradiction in the collection of proofs generated by a general proof-generating process with resources R?) You can just say " assign some logical probability distribution that includes statements like '

Aproducesc'."Why not just use ED(X) = ΣX X PD(X)?

This distribution has the troublesome property that if there is some short hypothesis for which t(H)=1, and t is fixed, then as you get more and more data this hypothesis never goes away. In fact, if it is short enough, it can continue to dominate the true hypothesis indefinitely. Now, one can think of ad-hoc ways to fix this, but what do you think a non-ad-hoc way would look like? I feel like the goal is to

eliminatehypotheses that have been checked and found to not fitY, while still retaining hypotheses that you are logically uncertain about.I'll need some help unpacking this. I don't know how this is supposed to get its dependence on Q0, rather than just being a function of U and Y. Is the idea that QS(H) is different if use different Q(Y)? Are you eliding some updating process that was supposed to be obvious to me? As you can tell I'm pretty confused.

I'm somewhat confused as to nature of your confusion. Are you saying you don't understand the definition? Or suggesting to generalize it?

Because "PD(X0)" is problematic. What is "X0"? A dyadic fraction? For a long dyadic fraction X0, the statement "X=X0" is long and therefore assigned probability 1/2. This series doesn't converge. Moreover, it seems just wrong to consider the exact equality of X to fixed dyadic fractions, especially considering that X can be provably

nota dyadic fraction.What makes you think so? As G gathers data, it will be able to eliminate hypotheses, including short ones.

It gets its dependence on Q0 from the condition "Q(Y(H)) = Q0" in the inner expectation value.

I'll gladly help as I can.

The main specific problem is the statement "PD(s) := P0(s | there are no contradictions of length <= D)." This doesn't make sense as conditionalization. Do you mean PD(s) = P0(s) if there are no contradictions of length<D? But even then, this doesn't define a distribution - you need to further assume that PD(s)=1 if there's a proof and PD(s)=0 if there's a proof of not-s.

Good point. To fix this, you'd have to take advantage of information about mutual exclusivity when assigning logical probabilities. Restricting X to being less than 1 and looking at the digits is a good workaround to make it converge, though if you can't prove what any of the digits will be (e.g. if you assign P=0.5 to very long sequences with arbitrary digits) you'll just get that every digit is 50/50 1 and 0.

Hmm, looks like I misunderstood. So are you assuming a certain bridging law in calculating t? That limits your ontology, though... But if you assume that you have the right bridging laws to test H, why do you need N?

I can see that. And yet, I don't know where that matters if you go step by step calculating the answer.

It does. The probability space consists of truth assignments. P0 is a probability distribution on truth assignments. "There are no contradictions of length <= D" is a condition on truth assignments with positive P0-probability, so we can form the conditional probability distribution. You can think about it as follows: Toss a fair coin for every statement. If the resulting assignment contains contradictions of length <= D, toss all coins again. With probability 1, the process will terminate after a finite number of tosses (since there is a positive probability for it to terminate on every step).

Not really. If you can decide the value of a digit at the given depth of analysis D, it will be 0 or 1. If you can't it will have some probability to be 1, not necessarily 1/2. It only becomes 1/2 for digits at places > 2^D (roughly).

No, t is just a "god given" constant. I don't understand the reason for your concern. The factor 1 - e^(-t(H)/t) <= 1 < infinity so it cannot lead to infinite confidence in a hypothesis.

I'm computing the utility under the logical counterfactual assumption that H produces Q0. Thus if Q0 is "designed to generate utility" in a sense, the resulting expected utility will be high, otherwise low. It's just like in usual UDT.

Thanks! I understand your usage now, that was a good explanation.

If you check consistency of statements with length less than D but allow proofs of infinite length, you'll need infinite computational resources. That's bad.

I meant "calculating t(H)," sorry. But anyhow, I think there are several possible examples of bad behavior.

Suppose that N specifies that our agent is a turing machine. It describes a universe Y with in terms of some tapes that can be read or written to. N has some initial predictions about the contents of the tapes that may or may not be accurate. Each step of Y is encoded as a big number.

Now consider some other hypothesis H which is just Y=[1,2,3,4...]. We can offset the zero time so that H and N start with the same number, so that t(H)=1. And H is much, much simpler than N. How would your agent go about showing that H is wrong?

Yay, I'm less confused now.

I don't care about proofs. Only about "D-consistency". The probability of s is the ratio of the number D-consistent truth assignments in which s is true to the total number of D-consistent truth assignments. When a short proof of s exists, all D-consistent truth assignments define s as true, so its probability is 1. When a short proof of "not s" exists, all D-consistent truth assignments define s a false, so its probability is 0. In all other cases the ratio is some number between 0 and 1, not necessarily 1/2.

For the agent to be functional, t has to be sufficiently large. For sufficiently large t, all hypotheses with small t(H) are suppressed, even the simplest ones. In fact, I suspect there is a certain critical t at which the agent gains the ability to accumulate knowledge over time.

Fair enough. But would you agree with the claim that a real-world agent is going to have to use a formulation that fits inside limits on the length of usable proofs? This circles back to my suggestion that the specifics aren't that important here and a handwaved generic logical probability distribution would suffice :)

Hm. Upon further reflection, I think the problems are not with your distribution (which I had initially misinterpreted to be a posterior distribution, with t(H) a property of the data :P ), but with the neglect of bridging laws or different ways of representing the universe.

For example, if our starting ontology is classical mechanics and our universe at each time step is just a big number encoding the coordinates of the particles, quantum mechanics has a t(H)=0, because it's such a different format. - it's the coordinates of some point in Hilbert space, not phase space. Being able to rediscover quantum mechanics is important.

If you neglect bridging laws, then your hypotheses are either untestable, or only testable using some sort of default mapping (comparing the input channel of your agent to the input channel of Q(H), which needs to be found by interpreting Q(H) using a particular format). If our agent exists in the universe in a different format, then we need to specify some different way of finding its input channel.

Another problem from the neglect of bridging laws is that when the bridging laws themselves are highly complex, you want to penalize this. You can't just have the universe be [1,2,3...] and then map those to the correct observations (using some big look-up table) and claim it's a simple hypothesis.

I'm not defining an agent here, I'm defining a mathematical function which evaluates agents. It is uncomputable (as is the Legg-Hutter metric).

N defines the ontology in which the utility function and the "intrinsic mind model" are defined. Y should be regarded as the projection of the universe on this ontology rather than the "objective universe" (whatever the latter means). Thus H implicitly includes both the model of the universe and the bridging laws. In particular, its complexity reflects the total complexity of both. For example, if N is classical and the universe is quantum mechanical, G will arrive at a hypothesis H which combines quantum mechanics with decoherence theory to produce classical macroscopic histories. This hypothesis will have large t(H) since quantum mechanics correctly reproduces the classical dynamics of M at the macroscopic level. This shouldn't come as a surprise: we also perceive the world as classical. More precisely, there would be a dominant family of hypothesis differing in the results of "quantum coin tosses". That is, this ontological projection is precisely the place where the probability interpretation of the wavefunction arises.

You have not actually defined what

`P_0(a | b)`

means. The usual definition would be`P_0(a | b) = P_0(a & b) / P_0(b)`

. But then, by definition of P_0,`P_0(a & b) = 0.5`

and`P_0(b) = 0.5`

, so`P_0(a | b) = 1`

. Also, the statement "there are no contradictions of length <= D" is not even a statement in F."there are no contradictions of length <= D" is not a statement in F, it is a statement about truth assignments. I'm evaluating the probability that s is assigned "true" by the random truth assignment under the condition that this truth assignment is free of short contradictions.

Right, but

`P_0(s)`

is defined for statements`s`

in`F`

. Then suddenly you talk about`P_0(s | there is no contradiction of length <= D)`

, but the thing between parentheses is not a statement in`F`

. So, what is the real definition of`P_D`

? And how would I compute it?I'm probably explaining it poorly in the post. P0 is

notjust a function of statements in F. P0 is a probability measure on the space of truth assignments i.e. functions {statement in F} -> {truth, false}. This probability measure is defined by making the truth value of each statement anindependentrandom variable with 50/50 distribution.PD is produced from P0 by imposing the condition "there is no contradiction of length <= D" on the truth assignment, i.e. we set the probability of all truth assignments that violate the condition to 0 and renormalize the probabilities of all other assignments. In other words P_D(s) = # {D-consistent truth assignments in which s is assigned true} / # {D-consistent truth assignments}.

Technicality: There is an infinite number of statements so there is an infinite number of truth assignments. However there is only a finite number of statements that can figure in contradictions of length <= D. Therefore all the other statements can be ignored (i.e. assumed to have independent probabilities of 1/2 like in P_0). More formally, the sigma-algebra of measurable sets on the space of truth assignments is generated by sets of the form {truth assignment T | T(s) = true} and {truth assignment T | T(s) = false}. The set of D-consistent truth assignments is in this sigma algebra and has positive probability w.r.t. our measure (as long as F is D-consistent) so we can use this set to form a conditional probability measure.

It may not be clear what you meant by "length" of contradiction. Is it the number of deductive steps to reach a contradiction, or the total number of symbols in a proof of contradiction?

Consider for instance two sentences X and ~X where X contains a billion symbols ... Is that a contradiction of length 1, or a contradiction of length about 2 billion? I think you mean about 2 billion. In which case, you will always have PD(s) = 0.5 for sentences s of length greater than D. Right?

Thanks, that cleared things up.

At first I thought he meant P_D(s) is defined to be 0 if there is a proof in F of s implies false of length <= D, and 1/2 otherwise, but then he says P_D(s) == 1 later for some s, so that's not right.