Intelligence Metrics with Naturalized Induction using UDT


28 comments

Thanks for posting this, quite interesting. The main question in my mind is whether your idea of stopping logical analysis short just before the logic system would prove what the system's action would be, leads to desirable properties. If I'm understanding correctly, it is intended to solve the 5-and-10 problem (I didn't look at your relevant links yet). But, it may introduce other problems.

It also does not address naturalistic self-trust. (As you observe, it doesn't seem to make any unprovable things converge to probability 1, so can't change self-trust problems related to Goedel's theorem). Do you agree, or do you see it differently?

The main question in my mind is whether your idea of stopping logical analysis short just before the logic system would prove what the system's action would be, leads to desirable properties.

Well, it leads to the right answer in simple examples. For example, if A is an ambient control algorithm for the utility defined by "if A()=0 then U=u0, if A()=1 then U=u1", for ui numbers with short binary expansion the value of the choice "0" will be u0 and the value of the choice "1" will be u1. This is because there is a simple proof of the statement "if A()=i then U=ui", whereas the actual value of A() cannot be deduced easily (with a short proof). If the ui have long/infinite binary expansions, the values will be approximations of the ui with precision depending on the depth of analysis at which A() becomes decidable.

It also does not address naturalistic self-trust.

Yes. The problem of self-trust resurfaces in my formulation as the problem of choosing F. Roughly speaking, G might not be able to use the power of F to full extent in its reasoning since F cannot prove its own soundness.

As you observe, it doesn't seem to make any unprovable things converge to probability 1

I didn't say it won't make *any* unprovable things converge to probability 1. My guess is that if every instance of a statement of the form "P(n) for all n" is provable (i.e. P(n) is provable for each given n), then the probability of the universal statement should converge to 1 even if it is itself unprovable. But I don't have a proof.

I didn't say it won't make any unprovable things converge to probability 1. My guess is that if every instance of a statement of the form "P(n) for all n" is provable (i.e. P(n) is provable for each given n), then the probability of the universal statement should converge to 1 even if it is itself unprovable. But I don't have a proof.

If a convergent probability distribution has this property, then it will also have the property of converging to probability 0 for some true sentences. To be more precise: using the language of the arithmetic hierarchy, if true Pi_1 statements converge to 1, there will be some true Pi_2 statements which converge to 0. (This was proven by Will Sawin at the MIRI workshop I attended.)

Overall, if Benja's distribution does converge, I would be surprised if it had this property-- it seems too similar to mine, which does not have this property. I could be wrong, though. My true intuition is that Benja's distribution does not converge.

Hmm. P(X) in Benja's distribution is (if I'm not mistaken) the probability that a statement will be true after we've randomly assigned true or false to all the sentences in our finite consideration set, *given* that our random assignment is consistent by our finite check. (Let's assume X is always in the set of sentences we are considering.)

Let's assume that we're expanding both our set of sentences and our consistency check, but let's also assume (for sake of informal argument) that we're expanding our consistency check fast enough that we don't have to worry about inconsistent stuff (so, we're expanding it incomputably faster than we're expanding our set of sentences).

P(X) at any given stage is the probability that X is true in a 50-50 random assignment (to the sentences in our current set), given that the assignment is consistent. In other words, it's the following ratio: the probability that an assignment including X is consistent, over the probability that any assignment is consistent.

The limit of both parts of this ratio is zero; the set of consistent assignments is a set of measure zero, since we are bound to eventually add some inconsistency. (We will definitely add in a contradiction at some point if we keep adding sentences with coin flips to determine their truth values.)

(Benja states that we're dealing with a measurable set here, at least.)

My intuition is that this won't converge, since we may infinitely often add a statement that significantly changes the ratio. I don't see that we can put a bound on how much the ratio will change at any point (whereas with my prior, there is a bound, though an incomputable one, because the measure of any remaining inconsistencies drops monotonically).

I don't see the need for 2 parameters. The way I formulated it, there is only 1 parameter: the depth of analysis D. I always consider *all* sentences. This makes sense because all but a finite set of sentences cannot figure in a contradiction of length <= D, so all but a finite set of sentences get probability 1/2 for any given D.

Regarding convergence of probabilities of undecidable statements as D -> infinity, well, I don't know how to prove it, but I also don't know how to disprove it. I can try to assign a probability to it... ;) Is the result by Sawin you mentioned published somewhere?

Is it important to the analysis whether the probabilities converge as D tends to infinity? Do you rely on this at any point?

If you need to make sure the probabilities converge, then you could consider something like the following.

First, split the sentences in your system F into "positive" sentences (the ones which have no leading "not" symbol, ~, or else which have an even number of leading "not" symbols) and "negative" sentences (the ones with an odd number of leading "not" symbols). Sort the positive sentences by length, and then sort them lexicographically within all sentences of the same length. This will give a list s1, s2 etc.

We will now build a growing subset S of sentences, and ensure that in the limit, S is consistent and complete.

At stage 0, S is empty. Call this S0.

At stage n: Toss a fair coin, and if it lands heads then add sn to S. If it lands tails then add ~sn to S. Next, systematically search through all proofs of length up to the length of sn to see if there are any inconsistencies in S. If a proof of inconsistency is found, then list the subset of positive and negative sentences which create the inconsistency e.g. {sa, ~sb, ~sc, ..., sz}; let k be the largest index in this list (the largest value of a, b, ..., z). If sk was in S, then remove it and add ~sk instead, or if ~sk was in S then replace it by sk. Restart the search for inconsistencies. When the search completes without finding any further inconsistencies we have the set Sn.

We now take the obvious limit Sw of the sets Sn as n tends to infinity: s will belong to Sw if and only if it belongs to all but a finite number of Sn. That limit Sw is guaranteed to exist, because for each m, the process will eventually find a consistent choice of sentences and negations from within {s1, ... ,sm} and then stick with it (any contradictions found later will cause the replacement of sentences sk or ~sk with k > m). Further, Sw is guaranteed to be consistent (because any contradictions would have been discovered and removed in some Sn). Finally Sw is guaranteed to be complete, since it contains either sm or ~sm for each m.

We can now define probabilities PD(s) = P(s is a member of SD) and take P(s) as the limit of PD(s) as D tends to infinity. The limit is always defined since it is just the probability that s is a member of Sw.
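The staged construction can be sketched concretely. Below is a minimal toy version, where the formal system is replaced by propositional formulas over three atoms and bounded proof search is replaced by a brute-force satisfiability check; the atoms, formulas, and names are illustrative stand-ins. (One simplification: since the toy check is exact, the already-accepted prefix is always consistent, so only the newest sentence ever needs flipping; in the real procedure, longer proofs found later can force revisions of earlier sentences.)

```python
import itertools
import random

# Toy stand-in for F: "sentences" are propositional formulas over three atoms.
# A set of signed sentences is consistent iff some assignment to the atoms
# satisfies all of them (this replaces the bounded search for contradictions).
ATOMS = ("a", "b", "c")
SENTENCES = [  # s1, s2, ... (the "positive" sentences, in order)
    lambda v: v["a"],
    lambda v: v["b"],
    lambda v: v["a"] or v["b"],
    lambda v: not v["a"] or v["c"],   # a -> c
    lambda v: v["a"] and not v["c"],  # clashes with the previous one when a holds
]

def consistent(signs):
    """signs[i] = True means s_{i+1} is in S; False means ~s_{i+1} is in S."""
    for bits in itertools.product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, bits))
        if all(SENTENCES[i](v) == sign for i, sign in enumerate(signs)):
            return True
    return False

def sample_S(rng):
    """One run of the staged process: flip a coin for each sentence in order,
    repairing any inconsistency by flipping the largest-index member."""
    signs = []
    for _ in SENTENCES:
        signs.append(rng.random() < 0.5)  # heads: add s_n; tails: add ~s_n
        while not consistent(signs):
            signs[-1] = not signs[-1]     # flip the largest-index sentence
    return signs

rng = random.Random(0)
runs = [sample_S(rng) for _ in range(2000)]
p_s1 = sum(r[0] for r in runs) / len(runs)  # estimate of P(s1 in S_w)
print(round(p_s1, 2))  # close to 1/2: nothing ever forces s1 to flip here
```

Every sampled set is consistent by construction, and the empirical frequency of each sentence estimates the P(s) defined above.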

Ok, I see.

So, I could instead think of iteratively eliminating inconsistent sets of sentences from the measure M, and looking at the limit of M(X)/M(all). I will assume for convenience that we eliminate one inconsistency at a time (so, remove mass from one set of sentences at a time). Let's call this M{d}, so M{d+1} has one more inconsistent set removed from the measure (which is to say, set to measure 0).

Each set eliminated may take some mass away from the numerator, and will definitely take some mass away from the denominator. (Yet the numerator will always remain less than or equal to the denominator.)

If we remove a set of sentences which does not include X and has no overlap with any of the other sets of sentences removed so far, then the mass of both the numerator and the denominator gets multiplied by 1 - .5^n, where n is the number of sentences in the set. To make this more general, for M{0}(Y) we can think of ~Y as having already been removed from its mass to begin with; therefore, we can say that for any M{d}(Y), if d+1 is based on an inconsistent set S not involving anything removed from M{d}(Y) previously, then M{d+1}(Y) = M{d}(Y) * (1 - .5^|S|).

If S does contain a sentence which has been involved in an inconsistency previously, we remove a strictly smaller fraction. If a subset of S has already been removed, then nothing gets removed at all. But if a set S being removed from M{d}(Y) contains only Y and some other sentences which have never appeared before, then M{d+1}(Y) = M{d}(Y) * (1 - .5^(|S|-1)).

Let's name the ratio M(X)/M(all) at depth d as R(d). Considering R(d)/R(d+1), we know its limit must be in the range [0,1], because M(X) cannot be more than M(all) (so R(d) cannot diverge upward). If the limit of R(d) is greater than zero, the limit of R(d)/R(d+1) must be 1.

Assume X and ~X are both consistent.

In some ordering of logical depth, we have that: infinitely often, we eliminate a set of the form {Z, Z->~X, X} where Z and Z->~X have both never been seen before in an elimination set. Thus, M{d+1}(X) = .75 M{d}(X).

M{d}(all) = M{d}(X) + M{d}(~X). Nothing will be removed from M{d}(~X). Since X and ~X are consistent, both have some measure. Therefore, M{d+1}(all) = .75 M{d}(X) + M{d}(~X) = M{d}(all) * ( .75 R(d) + M{d}(~X)/M{d}(all)) = M{d}(all) * ( .75 R(d) + 1 - R(d)) = M{d}(all) * (1 - .25 R(d)).

Thus, infinitely often, R(d+1) = {.75 M{d}(X)} / {M{d}(all) * (1 - .25 R(d))} = .75 R(d) / (1 - .25 R(d))

Let c be R(d+1) - R(d).

So, infinitely often, we have

R(d) + c = .75 R(d) / (1 - .25 R(d))

c = .75 R(d) / (1 - .25 R(d)) - R(d)

If c goes to zero, R(d) must go to:

0 = .75 R(d) / (1 - .25 R(d)) - R(d)

R(d) = .75 R(d) / (1 - .25 R(d))

1 = .75 / (1 - .25 R(d))

1 - .25 R(d) = .75

.25 = .25 R(d)

R(d) = 1

So we see that if the probability converges, it must go to either 0 or 1! Unless I've made some mistake.
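The algebra is easy to check numerically. This sketch iterates the single update R ↦ .75R/(1 - .25R) in isolation (the real process interleaves it with other eliminations), confirming that its only fixed points in [0, 1] are 0 and 1 and that it moves any interior ratio by a non-vanishing amount:

```python
def update(r):
    # One {Z, Z->~X, X} elimination step, per the derivation above:
    # numerator scaled by 3/4, denominator scaled by (1 - r/4).
    return 0.75 * r / (1 - 0.25 * r)

# Fixed points in [0, 1]: only 0 and 1.
for r in (0.0, 1.0):
    assert abs(update(r) - r) < 1e-12

# For any r strictly between 0 and 1, the step strictly decreases the ratio,
# so if this step recurs infinitely often, R(d) cannot settle at e.g. 1/2.
for r in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert update(r) < r

print(round(update(0.5), 3))  # -> 0.429 (i.e. 3/7)
```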

I didn't follow but the conclusion doesn't sound right to me. Your argument looks like it should apply to any language and proof system. So if I introduce "X" as a logical symbol that isn't constrained by any axioms or inference rules, why would its probability go to either 0 or 1? It seems to converge to 1/2 since for any consistent truth assignment with X=T we have a symmetric consistent truth assignment with X=F.

To convey my argument without as much of the math:

Suppose that P(X) is 1/2 at some stage. Then there will be inconsistent sets yet to remove which will take it at least C away from 1/2, where C is a constant that does not go down as the process continues.

The intuition behind this is that removing an inconsistent sentence which has not appeared in any of the inconsistencies removed so far, reduces mass by 1/2. Thus, the mass is changing significantly, all the time. Now to make this into a proof we need to show that P(X) changes significantly no matter how far into the process we go; IE, we need to show that a significantly *different* amount of mass can be removed from the 'top' and the 'bottom' (in the fraction M(X) / M(all) at finite depth).

The inconsistency {Y, Y->X, ~X} is supposed to achieve this: it only removes mass from the bottom, but there are infinitely many sets like this (we can make Y arbitrarily complex), and each of them reduces the bottom portion by the same fraction without touching the top. Specifically, the bottom becomes 7/8ths of its size (if I've done it right), so P(X) becomes roughly .57.

The fraction can re-adjust by decreasing the top in some other way, but this doesn't allow convergence. No matter how many times the fraction reaches .5 again, there's a new Y which can be used to force it to .57.

Nice!

I would actually prefer more handwaving on the logical probability distribution. Currently it's a bit confusing to me, with what appears to be mixed notation for conditional probabilities and set comprehensions. And the specifics of how the probabilities are assigned are not necessary for later results. (For example, does it really matter that you're checking for a contradiction in proofs less than some length L, rather than checking for a contradiction in the collection of proofs generated by a general proof-generating process with resources R?) You can just say "assign some logical probability distribution that includes statements like '**A** produces **c**'."

E_D(X) := Σ_k 2^-k P_D(d_k(X)).

Why not just use E_D(X) = Σ_{X0} X0 P_D(X0)?

QS(H) := 2^-L(H) (1 - e^-t(H)/t)

This distribution has the troublesome property that if there is some short hypothesis for which t(H)=1, and t is fixed, then as you get more and more data this hypothesis never goes away. In fact, if it is short enough, it can continue to dominate the true hypothesis indefinitely. Now, one can think of ad-hoc ways to fix this, but what do you think a non-ad-hoc way would look like? I feel like the goal is to *eliminate* hypotheses that have been checked and found to not fit **Y**, while still retaining hypotheses that you are logically uncertain about.
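The competition between hypotheses under this weighting is easy to tabulate. A toy comparison with made-up numbers (the bit-lengths and compatibility times below are hypothetical, chosen only to illustrate the concern):

```python
import math

def qs(L_H, t_H, t):
    # Quasi-Solomonoff weight QS(H) = 2^-L(H) * (1 - exp(-t(H)/t)).
    return 2.0 ** -L_H * (1.0 - math.exp(-t_H / t))

t = 10.0  # fixed "trust timescale" parameter

# A 10-bit hypothesis that stays compatible with N for only 1 time step,
# versus a 200-bit hypothesis compatible forever (t(H) -> infinity).
short_wrong = qs(10, 1.0, t)
long_right = qs(200, float("inf"), t)
print(short_wrong > long_right)  # True: the short hypothesis keeps more weight

# Raising t does suppress small-t(H) hypotheses, but only in proportion to
# t(H)/t, so a large enough length gap can still outweigh the suppression.
print(qs(10, 1.0, 10.0) / qs(10, 1.0, 1000.0))
```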

I(Q0) := E_QS(E_max(U(Y(H)) | "Q(Y(H)) = Q0" is true))

I'll need some help unpacking this. I don't know how this is supposed to get its dependence on Q0, rather than just being a function of U and Y. Is the idea that QS(H) is different if we use a different Q(Y)? Are you eliding some updating process that was supposed to be obvious to me? As you can tell I'm pretty confused.

I would actually prefer more handwaving on the logical probability distribution. Currently it's a bit confusing to me, with what appears to be mixed notation for conditional probabilities and set comprehensions.

I'm somewhat confused as to nature of your confusion. Are you saying you don't understand the definition? Or suggesting to generalize it?

Why not just use E_D(X) = Σ_{X0} X0 P_D(X0)?

Because "PD(X0)" is problematic. What is "X0"? A dyadic fraction? For a long dyadic fraction X0, the statement "X=X0" is long and therefore assigned probability 1/2. This series doesn't converge. Moreover, it seems just wrong to consider the exact equality of X to fixed dyadic fractions, especially considering that X can be provably *not* a dyadic fraction.

This distribution has the troublesome property that if there is some short hypothesis for which t(H)=1, and t is fixed, then as you get more and more data this hypothesis never goes away.

What makes you think so? As G gathers data, it will be able to eliminate hypotheses, including short ones.

I don't know how this is supposed to get its dependence on Q0, rather than just being a function of U and Y.

It gets its dependence on Q0 from the condition "Q(Y(H)) = Q0" in the inner expectation value.

As you can tell I'm pretty confused.

I'll gladly help as I can.

I'm somewhat confused as to nature of your confusion.

The main specific problem is the statement "PD(s) := P0(s | there are no contradictions of length <= D)." This doesn't make sense as conditionalization. Do you mean PD(s) = P0(s) if there are no contradictions of length<D? But even then, this doesn't define a distribution - you need to further assume that PD(s)=1 if there's a proof and PD(s)=0 if there's a proof of not-s.

Why not just use E_D(X) = Σ_{X0} X0 P_D(X0)?

This series doesn't converge.

Good point. To fix this, you'd have to take advantage of information about mutual exclusivity when assigning logical probabilities. Restricting X to being less than 1 and looking at the digits is a good workaround to make it converge, though if you can't prove what any of the digits will be (e.g. if you assign P=0.5 to very long sequences with arbitrary digits) you'll just get that every digit is 50/50 1 and 0.

What makes you think so? As G gathers data, it will be able to eliminate hypotheses, including short ones.

Hmm, looks like I misunderstood. So are you assuming a certain bridging law in calculating t? That limits your ontology, though... But if you assume that you have the right bridging laws to test H, why do you need N?

It gets its dependence on Q0 from the condition "Q(Y(H)) = Q0" in the inner expectation value.

I can see that. And yet, I don't know where that matters if you go step by step calculating the answer.

The main specific problem is the statement "PD(s) := P0(s | there are no contradictions of length <= D)." This doesn't make sense as conditionalization.

It does. The probability space consists of truth assignments. P0 is a probability distribution on truth assignments. "There are no contradictions of length <= D" is a condition on truth assignments with positive P0-probability, so we can form the conditional probability distribution. You can think about it as follows: Toss a fair coin for every statement. If the resulting assignment contains contradictions of length <= D, toss all coins again. With probability 1, the process will terminate after a finite number of tosses (since there is a positive probability for it to terminate on every step).
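This resampling description is directly executable for a toy case. Here the "statements" are just three labels X, Y, and Y->X, and the only contradiction of length <= D the check can see is a modus ponens violation (all names illustrative):

```python
import random

def consistent(assign):
    # The one detectable contradiction: Y and Y->X true while X is false.
    return not (assign["Y"] and assign["Y->X"] and not assign["X"])

def sample(rng):
    # Toss a fair coin for every statement; if the resulting assignment
    # contains a contradiction of length <= D, toss all coins again.
    while True:
        assign = {s: rng.random() < 0.5 for s in ("X", "Y", "Y->X")}
        if consistent(assign):
            return assign

rng = random.Random(0)
samples = [sample(rng) for _ in range(20000)]
p_X = sum(a["X"] for a in samples) / len(samples)
print(round(p_X, 2))  # close to 4/7: 4 of the 7 consistent assignments have X true
```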

if you can't prove what any of the digits will be (e.g. if you assign P=0.5 to very long sequences with arbitrary digits) you'll just get that every digit is 50/50 1 and 0.

Not really. If you can decide the value of a digit at the given depth of analysis D, it will be 0 or 1. If you can't it will have some probability to be 1, not necessarily 1/2. It only becomes 1/2 for digits at places > 2^D (roughly).
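The digit-wise expectation behaves exactly this way; a toy computation with hypothetical digit probabilities (the first three digits decided as 1, 0, 1 at depth D, the undecided tail at 1/2):

```python
# E_D(X) = sum over k of 2^-k * P_D(d_k(X)), truncated at K:
# the neglected tail contributes less than 2^-K.
K = 50
p = {1: 1.0, 2: 0.0, 3: 1.0}  # digits decided at this depth (hypothetical)
E_D = sum(2.0 ** -k * p.get(k, 0.5) for k in range(1, K + 1))
print(round(E_D, 4))  # -> 0.6875: binary 0.101 plus half of the undecided tail
```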

Hmm, looks like I misunderstood. So are you assuming a certain bridging law in calculating t?

No, t is just a "god given" constant. I don't understand the reason for your concern. The factor 1 - e^(-t(H)/t) <= 1 < infinity so it cannot lead to infinite confidence in a hypothesis.

I can see that. And yet, I don't know where that matters if you go step by step calculating the answer.

I'm computing the utility under the logical counterfactual assumption that H produces Q0. Thus if Q0 is "designed to generate utility" in a sense, the resulting expected utility will be high, otherwise low. It's just like in usual UDT.

You can think about it as follows: Toss a fair coin for every statement. If the resulting assignment contains contradictions of length <= D, toss all coins again.

Thanks! I understand your usage now, that was a good explanation.

It only becomes 1/2 for digits at places > 2^D (roughly).

If you check consistency of statements with length less than D but allow proofs of infinite length, you'll need infinite computational resources. That's bad.

No, t is just a "god given" constant.

I meant "calculating t(H)," sorry. But anyhow, I think there are several possible examples of bad behavior.

I don't understand the reason for your concern. The factor 1 - e^(-t(H)/t) <= 1 < infinity so it cannot lead to infinite confidence in a hypothesis.

Suppose that N specifies that our agent is a Turing machine. It describes a universe Y in terms of some tapes that can be read from or written to. N has some initial predictions about the contents of the tapes that may or may not be accurate. Each step of Y is encoded as a big number.

Now consider some other hypothesis H which is just Y=[1,2,3,4...]. We can offset the zero time so that H and N start with the same number, so that t(H)=1. And H is much, much simpler than N. How would your agent go about showing that H is wrong?

I'm computing the utility under the logical counterfactual assumption that H produces Q0. Thus if Q0 is "designed to generate utility" in a sense, the resulting expected utility will be high,

Yay, I'm less confused now.

If you check consistency of statements with length less than D but allow proofs of infinite length, you'll need infinite computational resources. That's bad.

I don't care about proofs. Only about "D-consistency". The probability of s is the ratio of the number of D-consistent truth assignments in which s is true to the total number of D-consistent truth assignments. When a short proof of s exists, all D-consistent truth assignments define s as true, so its probability is 1. When a short proof of "not s" exists, all D-consistent truth assignments define s as false, so its probability is 0. In all other cases the ratio is some number between 0 and 1, not necessarily 1/2.

Now consider some other hypothesis H which is just Y=[1,2,3,4...]. We can offset the zero time so that H and N start with the same number, so that t(H)=1. And H is much, much simpler than N. How would your agent go about showing that H is wrong?

For the agent to be functional, t has to be sufficiently large. For sufficiently large t, all hypotheses with small t(H) are suppressed, even the simplest ones. In fact, I suspect there is a certain critical t at which the agent gains the ability to accumulate knowledge over time.

I don't care about proofs. Only about "D-consistency".

Fair enough. But would you agree with the claim that a real-world agent is going to have to use a formulation that fits inside limits on the length of usable proofs? This circles back to my suggestion that the specifics aren't that important here and a handwaved generic logical probability distribution would suffice :)

For the agent to be functional, t has to be sufficiently large. For sufficiently large t, all hypotheses with small t(H) are suppressed, even the simplest ones. In fact, I suspect there is a certain critical t at which the agent gains the ability to accumulate knowledge over time.

Hm. Upon further reflection, I think the problems are not with your distribution (which I had initially misinterpreted to be a posterior distribution, with t(H) a property of the data :P ), but with the neglect of bridging laws or different ways of representing the universe.

For example, if our starting ontology is classical mechanics and our universe at each time step is just a big number encoding the coordinates of the particles, quantum mechanics has t(H)=0, because it's such a different format: it's the coordinates of some point in Hilbert space, not phase space. Being able to rediscover quantum mechanics is important.

If you neglect bridging laws, then your hypotheses are either untestable, or only testable using some sort of default mapping (comparing the input channel of your agent to the input channel of Q(H), which needs to be found by interpreting Q(H) using a particular format). If our agent exists in the universe in a different format, then we need to specify some different way of finding its input channel.

Another problem from the neglect of bridging laws is that when the bridging laws themselves are highly complex, you want to penalize this. You can't just have the universe be [1,2,3...] and then map those to the correct observations (using some big look-up table) and claim it's a simple hypothesis.

But would you agree with the claim that a real-world agent is going to have to use a formulation that fits inside limits on the length of usable proofs?

I'm not defining an agent here, I'm defining a mathematical function which evaluates agents. It is uncomputable (as is the Legg-Hutter metric).

Upon further reflection, I think the problems are not with your distribution... but with the neglect of bridging laws or different ways of representing the universe.

N defines the ontology in which the utility function and the "intrinsic mind model" are defined. Y should be regarded as the projection of the universe onto this ontology rather than the "objective universe" (whatever the latter means). Thus H implicitly includes both the model of the universe and the bridging laws. In particular, its complexity reflects the total complexity of both. For example, if N is classical and the universe is quantum mechanical, G will arrive at a hypothesis H which combines quantum mechanics with decoherence theory to produce classical macroscopic histories. This hypothesis will have large t(H) since quantum mechanics correctly reproduces the classical dynamics of M at the macroscopic level. This shouldn't come as a surprise: we also perceive the world as classical. More precisely, there would be a dominant family of hypotheses differing in the results of "quantum coin tosses". That is, this ontological projection is precisely the place where the probability interpretation of the wavefunction arises.

P_D(s) := P_0(s | there are no contradictions of length <= D).

You have not actually defined what `P_0(a | b)` means. The usual definition would be `P_0(a | b) = P_0(a & b) / P_0(b)`. But then, by definition of P_0, `P_0(a & b) = 0.5` and `P_0(b) = 0.5`, so `P_0(a | b) = 1`. Also, the statement "there are no contradictions of length <= D" is not even a statement in F.

I'm probably explaining it poorly in the post. P0 is *not* just a function of statements in F. P0 is a probability measure on the space of truth assignments i.e. functions {statement in F} -> {truth, false}. This probability measure is defined by making the truth value of each statement an *independent* random variable with 50/50 distribution.

PD is produced from P0 by imposing the condition "there is no contradiction of length <= D" on the truth assignment, i.e. we set the probability of all truth assignments that violate the condition to 0 and renormalize the probabilities of all other assignments. In other words P_D(s) = # {D-consistent truth assignments in which s is assigned true} / # {D-consistent truth assignments}.

Technicality: There is an infinite number of statements so there is an infinite number of truth assignments. However there is only a finite number of statements that can figure in contradictions of length <= D. Therefore all the other statements can be ignored (i.e. assumed to have independent probabilities of 1/2 like in P_0). More formally, the sigma-algebra of measurable sets on the space of truth assignments is generated by sets of the form {truth assignment T | T(s) = true} and {truth assignment T | T(s) = false}. The set of D-consistent truth assignments is in this sigma algebra and has positive probability w.r.t. our measure (as long as F is D-consistent) so we can use this set to form a conditional probability measure.
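The counting formula is easy to verify by exact enumeration for a toy case with three statement labels and a single detectable contradiction (all names illustrative):

```python
from itertools import product

# The finitely many statements that can figure in a contradiction of
# length <= D; all other statements keep probability 1/2 independently.
STATEMENTS = ("X", "Y", "Y->X")

def d_consistent(assign):
    # Stand-in for "no contradiction of length <= D": forbid the visible
    # modus ponens violation Y, Y->X, not-X.
    return not (assign["Y"] and assign["Y->X"] and not assign["X"])

assignments = [dict(zip(STATEMENTS, bits))
               for bits in product([False, True], repeat=len(STATEMENTS))]
good = [a for a in assignments if d_consistent(a)]

# P_D(s) = #{D-consistent assignments with s true} / #{D-consistent assignments}
P_D = {s: sum(a[s] for a in good) / len(good) for s in STATEMENTS}
print(P_D)  # X: 4/7; Y and Y->X: 3/7 each -- between 0 and 1, not necessarily 1/2
```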

It may not be clear what you meant by "length" of contradiction. Is it the number of deductive steps to reach a contradiction, or the total number of symbols in a proof of contradiction?

Consider for instance two sentences X and ~X where X contains a billion symbols ... Is that a contradiction of length 1, or a contradiction of length about 2 billion? I think you mean about 2 billion. In which case, you will always have PD(s) = 0.5 for sentences s of length greater than D. Right?

[This comment is no longer endorsed by its author]

Followup to: Intelligence Metrics and Decision Theory

Related to: Bridge Collapse: Reductionism as Engineering Problem

A central problem in AGI is giving a formal definition of intelligence. Marcus Hutter has proposed AIXI as a model of a perfectly intelligent agent. Legg and Hutter have defined a quantitative measure of intelligence, applicable to any suitably formalized agent, such that AIXI is the agent with maximal intelligence according to this measure.

Legg-Hutter intelligence suffers from a number of problems I have previously discussed, the most important being:

- ...(there must be a classic reference on this but I can't find it. Help?). It is straightforward to tweak the formalism to allow for any utility function which depends on the agent's sensations and actions; however, we would like to be able to use any ontology for defining it.
- ...too general; in particular, there is no Solomonoff induction or any analogue thereof; instead, a completely general probability measure is used.

When I started writing this I thought this formalism might be novel, but now I see it is essentially the same as that of Benja.

## Logical Uncertainty

The formalism introduced here was originally proposed by Benja.

Fix a formal system F. We want to be able to assign probabilities to statements s in F, taking into account limited computing resources. Fix D, a natural number related to the amount of computing resources, which I call the "depth of analysis".

Define P_0(s) := 1/2 for all s to be our initial prior, i.e. each statement's truth value is decided by a fair coin toss. Now define

P_D(s) := P_0(s | there are no contradictions of length <= D).

Consider X, a number in [0, 1] given by a definition in F. Then d_k(X) := "the k-th digit of the binary expansion of X is 1" is a statement in F. We define

E_D(X) := Σ_k 2^-k P_D(d_k(X)).

## Remarks

- If s is provable in F, then for D >> 0, P_D(s) = 1. Similarly, if "not s" is provable in F, then for D >> 0, P_D(s) = 0.
- If X is decidable in F, then lim_{D -> inf} E_D(X) exists and equals the value of X according to F.
- For s of length > D, P_D(s) = 1/2, since no contradiction of length <= D can involve s.
- It is not clear whether lim_{D -> inf} P_D(s) exists for any s. It seems false that this limit always exists *and* equals 0 or 1, i.e. this formalism is *not* a loophole in Goedel incompleteness. To see this, consider statements that require a high-order (in the arithmetical hierarchy) halting oracle to decide.
- D corresponds to non-deterministic spatial complexity. It is spatial since we assign truth values simultaneously to all statements, so in any given contradiction it is enough to retain the "thickest" step. It is non-deterministic since it's enough for a contradiction to exist; we don't have an actual computation which produces it. I suspect this can be made more formal using the Curry-Howard isomorphism; unfortunately, I don't understand the latter yet.

## Non-Constructive UDT

Consider A, a decision algorithm for optimizing a utility U, producing an output ("decision") which is an element of C. Here U is just a constant defined in F. We define the U-value of c in C for A at depth of analysis D to be

V_D(c, A; U) := E_D(U | "A produces c" is true).

It is only well defined as long as "A doesn't produce c" cannot be proved at depth of analysis D, i.e. P_D("A produces c") > 0. We define the *absolute* U-value of c for A to be

V(c, A; U) := E_{D(c,A)}(U | "A produces c" is true), where D(c, A) := max {D | P_D("A produces c") > 0}.

Of course, D(c, A) can be infinite, in which case E_inf(...) is understood to mean lim_{D -> inf} E_D(...).

For example, V(c, A; U) yields the natural values for A an ambient control algorithm applied to e.g. a simple model of Newcomb's problem. To see this, note that given A's output, the value of U can be determined at low depths of analysis, whereas the output of A requires a very high depth of analysis to determine.

## Naturalized Induction

Our starting point is the "innate model"

N: a certain a priori model of the universe including the agentG. This model encodes the universe as a sequence of natural numbersY= (y_{k}) which obeys either specific deterministic or non-deterministic dynamics or at least some constraints on the possible histories. It may or may not include information on the initial conditions. For example,Ncan describe the universe as a universal Turing machineM(representingG) with special "sensory" registerse.Nconstraints the dynamics to be compatible with the rules of the Turing machine but leaves unspecified the behavior ofe. Alternatively,Ncan contain in addition toMa non-trivial model of the environment. OrNcan be a cellular automaton with the agent corresponding to a certain collection of cells.However,

However, G's confidence in N is limited: otherwise it wouldn't need induction. We cannot start with zero confidence either: it is impossible to program a machine without even a guess of how it works. Instead we introduce a positive real number t which represents the timescale over which N is expected to hold. We then assign to each hypothesis H about Y (you can think of them as programs which compute y_k given y_j for j < k; more on that later) the weight

QS(H) := 2^{-L(H)} (1 - e^{-t(H)/t}),

where L(H) is the length of H's encoding in bits and t(H) is the time during which H remains compatible with N. This is defined for N of deterministic / constraint type but can be generalized to stochastic N.
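The weight formula is easy to compute directly. Below is a small sketch; the hypothesis lengths and compatibility times are made-up numbers, and `math.inf` stands in for a hypothesis that never contradicts N.

```python
import math

def qs_weight(length_bits, t_H, t):
    """QS(H) = 2^{-L(H)} * (1 - exp(-t(H)/t)).

    length_bits: L(H), encoding length of H in bits.
    t_H: time during which H remains compatible with the innate model N
         (math.inf for hypotheses that never contradict N).
    t: the timescale over which N is expected to hold."""
    compat = 1.0 if math.isinf(t_H) else 1.0 - math.exp(-t_H / t)
    return 2.0 ** (-length_bits) * compat

# Hypothetical hypotheses as (L(H), t(H)) pairs -- illustrative numbers only.
hyps = {"agrees with N forever": (10, math.inf),
        "diverges from N early": (8, 2.0),
        "diverges from N late":  (12, 1000.0)}

t = 100.0
raw = {name: qs_weight(L, tH, t) for name, (L, tH) in hyps.items()}
Z = sum(raw.values())
posterior = {name: w / Z for name, w in raw.items()}
for name, p in posterior.items():
    print(f"{name}: {p:.4f}")
```

Note how the short hypothesis that quickly contradicts N is suppressed despite its low L(H): the (1 - e^{-t(H)/t}) factor is what lets the innate model dominate on the timescale t.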

The weights QS(H) define a probability measure on the space of hypotheses, which induces a probability measure on the space of histories Y. Thus we get an alternative to Solomonoff induction which allows G to be a mechanistic part of the universe, at the price of introducing N and t.

## Remarks

- The dependence of the measure on t is continuous.
- Since we already have the formal system F at our disposal, it is tempting to construct the hypothesis space out of predicates in F rather than programs.

## Intelligence Metric

To assign intelligence to agents we need to add two ingredients:

- A way to read off the agent from the universe: a mapping Q : {Y} -> {bit-strings} extracting the program of the agent G from the history Y. For example, Q can read off the program loaded into M at time k = 0.
- A utility function U : {Y} -> [0, 1] representing G's preferences. U has to be given by a definition in F. Note that N provides the ontology with respect to which U is defined.

A naive approach would be to define the intelligence of Q_0 as E_QS(U | Q = Q_0), the conditional expectation value of U for a given value of Q in the quasi-Solomonoff measure. However, this is wrong for roughly the same reasons EDT is wrong (see the previous post for details).
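For concreteness, here is a toy computation of the naive conditional E_QS(U | Q = Q_0), under the simplifying assumption that each hypothesis H deterministically fixes the history Y(H), so that Q(Y(H)) and U(Y(H)) are plain values rather than logical expectations. All names and numbers are hypothetical.

```python
import math

# Hypothetical hypotheses as (L(H), t(H), Q(Y(H)), U(Y(H))) tuples:
# encoding length in bits, compatibility time with N, the program read
# off the history, and the utility of that history.
HYPOTHESES = [
    (8,  math.inf, "Q0", 0.9),
    (10, math.inf, "Q0", 0.7),
    (9,  5.0,      "Q1", 1.0),
]

def qs(L, tH, t=100.0):
    """Quasi-Solomonoff weight QS(H) = 2^{-L(H)} * (1 - exp(-t(H)/t))."""
    return 2.0 ** (-L) * (1.0 - math.exp(-tH / t))

def naive_I(q0, t=100.0):
    """E_QS(U | Q = q0): the naive, EDT-like conditional.

    The post's actual metric replaces this inner conditioning with a
    depth-limited logical expectation, which this toy does not model."""
    num = sum(qs(L, tH, t) * U for (L, tH, Q, U) in HYPOTHESES if Q == q0)
    den = sum(qs(L, tH, t) for (L, tH, Q, U) in HYPOTHESES if Q == q0)
    return num / den

print(round(naive_I("Q0"), 4))  # -> 0.86
```

Conditioning here simply restricts the QS-measure to hypotheses whose history contains the program Q_0, which is exactly the evidential-style update the post argues against.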

Instead, we define I(Q_0) := E_QS(E_max(U(Y(H)) | "Q(Y(H)) = Q_0" is true)). Here the subscript max stands for the maximal depth of analysis, as in the construction of the absolute UDT value above.

## Remarks

- Intuitively, I(Q_0) is meant to track the expected utility of creating an agent whose program reads Q_0. If N is a highly detailed model including "me" (the programmer of the AI), this literally becomes the case. However, for theoretical analysis it is likely more convenient to work with a simple N (conceptually, this also leaves room for a "purist" notion of the agent's intelligence, decoupled from the fine details of its creator).
- The algorithm (H) making the decision (Q) is not known with certainty. I think this represents a real uncertainty that has to be taken into account in decision problems in general: the decision-maker doesn't know her own algorithm. Since this "introspective uncertainty" is highly correlated with "indexical" uncertainty (uncertainty about the universe), it prevents us from absorbing the latter into the utility function as proposed by Coscott.
- For high values of t, G can improve its understanding of the universe by bootstrapping the knowledge it already has. This is not possible for low values of t. In other words, if I cannot trust my mind at all, I cannot deduce anything. This leads me to an interesting conjecture: there is a critical value t* of t at which this bootstrapping becomes possible (the positive feedback loop of knowledge becomes critical), and I(Q) is non-smooth at t* (a phase transition).
- The utility function doesn't have to be hard-wired into N. For example, if G is realized by a machine M, we can connect M to a data storage E whose content is left undetermined by N. We can then take U to be given by the formula encoded in E at time k = 0. This leads to I(Q) being a sort of "general-purpose" intelligence, while avoiding the problems associated with reinforcement learning.
- There is no simple way to find a Q* maximizing I(Q) (e.g. among all programs of a given length). This is not surprising, since computational cost considerations come into play. In this framework it appears to be inherently impossible to decouple the computational cost considerations: G's computations have to be realized mechanistically and therefore cannot be free of time cost and side-effects.
- I expect that, ceteris paribus, Q* deals efficiently with problems like counterfactual mugging. The "ceteris paribus" conditional is necessary here because, due to the cost and side-effects of computations, it is difficult to make absolute claims. However, it doesn't deal efficiently with counterfactual mugging in which G doesn't exist in the "other universe". This is because the ontology used for defining U (which is given by N) assumes G *does* exist. At least this is the case for simple ontologies like those described above: possibly we can construct an N in which G might or might not exist. Also, if G uses a *quantum* ontology (i.e. N describes the universe in terms of a wavefunction and U computes the quantum expectation value of an operator), then it *does* take into account other Everett universes in which G doesn't exist.
- Since subjective expectations are meaningful within N (for example if G is realized by a machine M), QS-induction assigns well-defined probabilities to subjective expectations, contrary to what is expected from UDT. However:
  - These probabilities are only defined to the extent that they are grounded in N. In particular, if N admits destruction of M, then M's sensations after the point of destruction are not well-defined. Indeed, we had better allow for destruction of M if we want G's preferences to behave properly in such an event. That is, if we *don't* allow it, we get a "weak anvil problem", in the sense that G experiences an ontological crisis when discovering its own mortality, and the outcome of this crisis is not obvious. Note, though, that this is not the same as the original ("strong") anvil problem: for example, G might come to the conclusion that the dynamics of "M's ghost" will be essentially random.
  - These probabilities depend on the choice of N and don't amount to an elegant universal law for solving the anthropic trilemma.
- When G self-modifies or builds successors, the new agent will presumably use updated N and t. This suggests we might want the updates to be minimal in some sense; in particular, t should be t*.
- I suspect Q* wouldn't think it is a Boltzmann brain, since the long address of Boltzmann brains within the universe makes the respective hypotheses complex, thus suppressing them, even disregarding the suppression associated with N. I doubt this argument is original, but I feel the framework validates it to some extent.