# 6

The formalism presented in this post turned out to be erroneous (as opposed to the formalism in the previous post). The problem is that the step in the proof of the main proposition in which the soundness schema is applied cannot be generalized to the ordinal setting since we don't know whether ακ is a successor ordinal so we can't replace it by ακ'=ακ-1. I'm not deleting this post primarily to preserve the useful discussion in the comments.

In the previous post, I formulated a variant of Benja's parametric polymorphism suitable for constructing updateless intelligence metrics. More generally, this variants admits agents which are utility maximizers (in the informal sense of trying their best to maximize a utility function, not in the formal sense of finding the absolutely optimal solution; for example they might be "meliorizers" to use the terminology of Yudkowsky and Herreshoff) rather than satisficers. The agents using this formalism labor under a certain "delusion", namely, since they believe that κ ("the number of ineffable mystery") is an actual finite number (whereas it is secretly infinite, in a sense), they think that the chain of tiling agents is effectively finite as well (since at some point agent #n in the chain will discover that κ > n is false and will fail to construct a subsequent agent of the same "epistemic strength"). The same problem exists in Weaver's intuitionistic assertability predicate formalism.

To overcome this limitation, I suggest to extend κ's semantics from natural numbers to elements of a certain recursive ordinal (at least morally; technically it is done a bit differently, see below). In Benja's original formulation this doesn't appear a valid option, since κ is interpreted as a time interval. However, in my variant κ is just an abstract parameter queries about which can be directed to a special "oracle", so there is no such limitation.

# Formalism

Fix α a recursive ordinal and αi a system of notations for ordinals smaller than α such that α0=0, the function f defined by αf(i)i+1 is recursive and the bit-valued function g(i,j) which equals 1 iff αi < αj is also recursive. We also assume that

(0) T |- For all i: g(i,f(i))=1

where T is our starting theory. We define Tκ(α) to be T extended by the symbol κ (which stands for the index of the ordinal, not the ordinal itself) and the following self-referential soundness schema:

(1) For all formulae φ: For all n, i: ((@_Tκ(α) "φ(n)" and g(i,κ)=1) -> φ(n)[κ\i])

Tκ(α) is proven to be sound analogously to Yudkowsky and Herreshoff, with transfinite induction over ordinals below α replacing the ordinary induction.

The licensing condition for agent A0 to perform action b0 is

(2) Exists n: Tκ(α) |- (g(n,κ)=1 -> (A0()=b-> G))

The property which is sufficient for agent B1 to be creatable by agent A0 is

(3) Tκ(α) |- For all b1: (B1()=b1 -> (b1=0 or Exists n: (g(n,κ)=1 and @_Tκ(α) "(g(n,f(κ))=1 and B1()[κ\f(κ)]=b1) -> G[κ\f(κ)]")))

The proof that (3) indeed enables tiling is analogous to the previous post (but requires using (0)).

# Discussion

In order for A0 to know that the tiling chain is infinite, it is enough for her to verify ακ >= ω. In order for her to know that A1 is going to know the tiling chain is infinite, she needs to verify ακ >= ω+1. In order for her to know that all agents are going to know the tiling chain is infinite, she needs to verify ακ >= 2ω. In order for her to know that all agents are going to know that, she needs to verify ακ >= 3ω et cetera.

It remains to decide which ordinal should we actually use. My intuition is that the correct ordinal is the least α with the property that α is the proof-theoretic ordinal of Tκ(α) extended by the axiom schema {g(i,κ)=1}. This seems right since the agent shouldn't get much from ακ > β for β above the proof theoretic ordinal. However, a more formal justification is probably in order.

# 6

New Comment

I suspect that it is not just my feeling that this sequence is not very accessible. The only post that generated more than a couple of replies is http://lesswrong.com/lw/jq9/intelligence_metrics_with_naturalized_induction/ . I suggest that you only post accessible summaries here and link to your blog or preprints or wherever you post all the gory details.

I commend your enthusiasm! Positive reinforcement for working on these posts. A few notes:

More generally, this variants admits agents which are utility maximizers (or "meliorizers" to use the terminology of Yudkowsky and Herreshoff) rather than satisficers.

A utility meliorizer is not a utility maximizer. A maximizer finds the optimal strategy (this is untenable). A meliorizer simply improves upon some fallback policy (and can run into a number of problems). Eventually, we're going to want something between the two: meliorizers are just the second thing on the chain of "agents that can provably tile to agents like them" (after satisficers).

they think that the chain of tiling agents is effectively finite as well (since at some point agent #n in the chain will discover that κ > n is false and will fail to construct a subsequent agent of the same "epistemic strength").

Not quite: An agent using parametric polymorphism never 'discovers' anything about kappa. A PP agent can always tile (and PP can prove this). The catch is that the agent believes that if kappa = 0 a PP agent can tile to a potentially-inconsistent agent. An agent using PP doesn't care, as in order for PP to work the agent's goals must be formulated in terms of "the world will be safe for at least kappa steps". PP agents think that they might be inconsistent after the kappath step, but they don't mind, because they don't care about the world after the kappath step.

(2) Exists n: Tκ(α) |- (g(n,κ)=1 -> (A0()=b0 -> G))

I'm not convinced that your proof carries. (Did you want the 'Exists n' to go inside the turnstile here?) I don't think that this says what you want it to say---it seems to say that the agent can take action b if there is some n such that a_n < a_k implies executing b is safe. I can trivially satisfy this by choosing an n large enough that the antecedent false. (There's always some n such that a_n >= a_k, in which case g(n, k) = 1 is false and the action criterion is trivially satisfied.)

I'm not entirely confident in this analysis, because your statements are somewhat hard to parse in this format. Do you have a latex version, perchance?

In order for A0 to know that the tiling chain is infinite, it is enough for her to verify ακ >= ω.

Alas, the tiling chain won't be infinite, by the well-foundedness of the ordinals. The chain may be arbitrarily long, but no such chain may be infinite. (More precisely, in the system that I think you're suggesting, A0 would think that ακ is some ordinal and the world will be safe for so long as its child agents can walk some arbitrarily long but finite chain of ordinals descending from ακ.)

Thus, the simple way of extending 'vanilla' PP to use ordinals doesn't grant us too much extra power. However, there are ways to extend vanilla PP with ordinals to make the system significantly more powerful. With vanilla PP, you have to slice up the agent's safety predicate such that it talks about individual timesteps. In the more powerful version you can express more complex goals (such as "X must be done eventually, but you have a while before you must choose a deadline"). There are still some goals which are difficult to express; this is closely related to the procrastination paradox. (Similar limitations appear in the consistency waterfall, which only trusts its own soundness for Pi1 goals.)

Benja has formulated such a PP. An unpolished version of the technique used is shown in Fallenstein's Monster, further details are forthcoming.

On a tangential note, it's quite exciting to see other people who are interested in getting nice properties out of tiling agents. Have you spoken to Benja at all, and/or contacted others at MIRI?

Hi So8res, thx a lot for reading the post and commenting on it!

A utility meliorizer is not a utility maximizer. A maximizer finds the optimal strategy (this is untenable). A meliorizer simply improves upon some fallback policy (and can run into a number of problems). Eventually, we're going to want something between the two: meliorizers are just the second thing on the chain of "agents that can provably tile to agents like them" (after satisficers).

I understand the difference. I just think the terminology is a bit confusing, since many people use "utility maximizer" in the informal sense of "a gadget that tries to get as much utility as it can" rather than "something that finds the absolutely optimal strategy". I think that ultimately the solution is going to be "compute expected utility for a small number of strategies and choose the best". One of the strategies will be "think more" (since the expectation value contains logical uncertainty which by definition depends on the amount of computing resources invested into evaluating it). Now, since the actual space of possible actions is large, we need to break it down in tree-like fashion so that the choice from a large set is replaced by many choices from small sets. But all of that is very tangential to the current post.

Not quite: An agent using parametric polymorphism never 'discovers' anything about kappa.

You're describing Fallenstein's original PP which I understand. My variant works differently. It allows the agent to prove kappa > n for any n. In this respect, it is similar to Weaver's formalism.

I'm not convinced that your proof carries. (Did you want the 'Exists n' to go inside the turnstile here?) I don't think that this says what you want it to say---it seems to say that the agent can take action b if there is some n such that a_n < a_k implies executing b is safe. I can trivially satisfy this by choosing an n large enough that the antecedent false. (There's always some n such that a_n >= a_k, in which case g(n, k) = 1 is false and the action criterion is trivially satisfied.)

The quantifier is in the right place. You cannot choose n large enough since T_kappa(alpha) cannot prove a_n >= a_k for any particular n. It can only prove "Exists n: a_n >= a_k" which is why the quantifier has to be outside the turnstile. The physical meaning of this licensing condition is as follows. The agent is a theorem proving machine that reasons using T_kappa(alpha). In addition, it has access to a special "oracle" which is a black-box from the agent's POV and can answer questions of the form "a_n < a_k?". The oracle secretly always answers affirmatively (although the agents is sure of the contrary). Thus the agent can use the oracle's answers and the n outside the turnstile corresponds to the largest a_n about which the oracle was asked.

Do you have a latex version, perchance?

Unfortunately I got too lazy for Latex. In part it was an irrational response to getting little feedback to my previous post. Probably I have to fix it.

Alas, the tiling chain won't be infinite, by the well-foundedness of the ordinals.

No, since there is no decreasing chain of ordinals here. Instead, we got a "twist" inserted in the oracle. After m steps it answers the question "a_n + m < a_k?" instead of answering "a_n < a_k" like the (new) agent thinks.

I'll be very glad to answer any further questions that you have!

Benja has formulated such a PP. An unpolished version of the technique used is shown in Fallenstein's Monster, further details are forthcoming.

I have to admit to my shame that I haven't read that paper. I will surely do it. Note though that my motivation was adapting the framework to work for utility expectation values instead of satificing and in particular for my updateless intelligence formalism.

On a tangential note, it's quite exciting to see other people who are interested in getting nice properties out of tiling agents. Have you spoken to Benja at all, and/or contacted others at MIRI?

Well, I mostly hoped the posts on LW will draw some attention. However, I will be very grateful for any advice how to get people in MIRI interested. Thx again!

I just think the terminology is a bit confusing, since many people use "utility maximizer" in the informal sense of "a gadget that tries to get as much utility as it can" rather than "something that finds the absolutely optimal strategy".

This is a common sentiment, but note that a meliorizer is far from a maximizer even in the non-optimal sense. A meliorizer doesn't do any intelligent navigation of the search space, it just takes the first thing it can find which is better than its fallback plan.

The quantifier is in the right place.

I'm confused about the difference between `Tκ(α) |- φ` in (2) and `@_Tκ(α) "φ"`: don't they both mean "Tκ(α) proves φ"?

Side note: to get underscores in comments, escape them `like\_this`.

I'm also confused about how this oracle cashes out. The parent agent needs to prove k > n in your formulation; what does this proof look like? If the parent agent has access to an axiom "forall n. k > n" then the system is inconsistent. If, instead, the oracle cashes out as a formula psi such that the parent can always prove psi(n) but cannot prove "forall n. psi(n)", then you've re-invented Marcello's waterfall :-)

I suspect that you're trying to say something else, but I'm not sure what it is. What specifically does it mean for a theory of first-order logic to have a "black box" that "introduces assumptions of the form k > n"? I suspect that if it doesn't cash out to something equivalent to the waterfall then there's going to be some trouble with the parent agent trusting that its child agent has access to an accurate black box.

I will be very grateful for any advice how to get people in MIRI interested.

PDFs and concise statements of the benefits ("look, I think I found a way to get a tiling agent that does X") would make it easier for us to parse. Benja and I are both quite busy right now (there's a workshop on), but I'm sure we can get in touch more directly sometime in the coming weeks.

Again, I'm excited by your progress!

This is a common sentiment, but note that a meliorizer is far from a maximizer even in the non-optimal sense. A meliorizer doesn't do any intelligent navigation of the search space, it just takes the first thing it can find which is better than its fallback plan.

Fair enough: I amended to the text to be (hopefully) less confusing.

I'm confused about the difference between Tκ(α) |- φ in (2) and @_Tκ(α) "φ": don't they both mean "Tκ(α) proves φ"?

The difference is mainly notational: the turnstile is used instead of the box "@" since this is supposed to be a statement in the meta-theory we use to discuss the formalism rather than in the theory used by the agent itself. This is the way I understood Yudkowsky and Herreschoff using that notation (see equation 2.1 for example) and I tried to stick to it. But I might be confused about something.

I'm also confused about how this oracle cashes out.

The agent is a program that interacts with the oracle. Suppose n is a number for which the oracle has affirmatively answered "alpha_kappa > alpha_n?" by the given point in the computation. Suppose further that the program has (independently) produced a formal Tκ(α)-proof of a theorem of the form "g(n,κ)=1 -> (A0()=b0 -> G)". Then the agent is licensed to perform action b0, by the construction of the agent. It doesn't literally "prove" the premise "g(n,κ)=1" at any point.

I suspect that if it doesn't cash out to something equivalent to the waterfall then there's going to be some trouble with the parent agent trusting that its child agent has access to an accurate black box.

There is no such problem, at least in the simple "A^alpha"-like formalism. Note that the goal sentence G is typically defined in terms of the sequence of actions a_m which is defined recursively by a_{m+1} = A_m(), a_m = (a_m, A_m). Here "B()" is a formal expression that depends on kappa since by definition it is the result of the computation by program B interacting with a valid "kappa-oracle".

In the more general formalism of updateless intelligence, there is no problem as long as self-modification amounts to reprogramming of the same hardware with the same "box". Validation of a completely new "box" is more difficult, but possible at least to some finite level of confidence: see discussion in the end of the previous post. The latter problem is something I don't claim to have solved to my complete satisfaction.

PDFs and concise statements of the benefits ("look, I think I found a way to get a tiling agent that does X") would make it easier for us to parse.

I apologize for asking stupid questions but I neither worked in the academia nor even defended a thesis which makes me unacquainted with the standard tools. Is it important to make a PDF or using Latex is enough? If it is the former, what is the recommended freeware way of producing PDFs under Windows?

... I'm sure we can get in touch more directly sometime in the coming weeks.

That would be great! Again, thx for taking the time to read my work and discuss it!

I recommend writelatex for writing latex; it will make it quite easy to render a pdf.

I already rendered your argument (for the non-ordinal version) in latex, both to make it easier for me to understand and so you can see the commands that are commonly used to render stuff like this. You can read it here, I've PM'd you an edit link.

this is supposed to be a statement in the meta-theory

Ah, I see. I misunderstood. You're saying that it has an infinite action condition

``````k > 0 -> (do(a) -> G)
or k > 1 -> (do(a) -> G)
or k > 2 -> (do(a) -> G)
``````

which in the meta-language we can express as (2).

I'm pretty sure that this still reduces to vanilla PP. Consider: your proof goes through if the action condition is simply `k > 0 -> (do(a) -> G)`. The child can trivially prove `k + 1 > 0`, so the parent is demanding only that the child prove `G[k\k+1]` which is fine, except the parent still doesn't trust that the k'th child is sound. An extended version of this argument can be found in the paper linked above.

(If I've misinterpreted, please correct me again. If I'm correct, please don't be discouraged---it's great to see people working on new variations.)

Thx a dozen for writing that latex and recommending writelatex!

Indeed, in the formalism from the previous post (which you very kindly latexed) the agent thinks the tiling is going to effectively end after kappa steps (i.e. that the following agents won't operate by looking for arbitrary proofs of theorems in a theory of the same power). This is exactly the problem the current post sets out to solve. By working with ordinals instead of natural numbers, the agent can prove the tiling is infinite by verifying that alpha_kappa >= omega.

Moreover, even the formalism in the previous post does not reduce to vanilla PP. This is because, as opposed to what you wrote in the document, the goal G doesn't say something like "the world is safe for kappa steps". Indeed, the proof of tiling doesn't assume anything about G! For example, assuming that the motor action a_m is a natural number, G can be "at step m, the motor action must equal m^2". There is no kappa-time-limit in sight: kappa only enters G through the oracle's role in the agents' computations. This is important when passing from goals to expected utilities since I want the agent to be able to prove non-trivial bounds on the expected utility which would be impossible if the utility only depended on time interval [0, kappa] without anything known about kappa.

Indeed, the proof of tiling doesn't assume anything about G!

Ah, sorry, I assumed we were using the original PP assumptions. In that case, I'm guessing that such an agent is vulnerable to the procrastination paradox. Unfortunately, I don't have any more time to toy with this today.

I'd love to see a walkthrough (first in the previous system, then in the ordinal system) of how the proof goes through with a goal of the form "there exists a time at which the button becomes pressed" where the parent agent is attempting to construct a perfect copy of itself and where the fallback action 0 is "press the button". Could you provide one? I'm still a bit unclear about how the system works, and I think the walkthrough would help.

OK, firstly I found an error in what I wrote, namely condition (3) was missing a piece which I erroneously thought was redundant. Fixed it in both posts.

In that case, I'm guessing that such an agent is vulnerable to the procrastination paradox.

It surely is. However, the problem goes away once we restrict to goals G such that given a time sequence {a_m} that doesn't satisfy G, there is an initial segment {a_m}_{m<N} s.t. any sequence beginning with this initial segment also doesn't satisfy G. For example, assume the "motor action space" consists of natural numbers and the goal is as follows: either a_m = m^2+1 for all m or a_m = 0 for the first m violating the previous equation. 0 serves us as the "fallback action". The agent is guaranteed to achieve any goal of this form.

I'd love to see a walkthrough (first in the previous system, then in the ordinal system) of how the proof goes through with a goal of the form "there exists a time at which the button becomes pressed" where the parent agent is attempting to construct a perfect copy of itself and where the fallback action 0 is "press the button". Could you provide one? I'm still a bit unclear about how the system works, and I think the walkthrough would help.

I'm guessing that the unclear part is how are conditions (3) and (4) satisfied. I'll use the goal I described in the previous paragraph to avoid the procrastination paradox (otherwise the tiling still happens but the goal isn't guaranteed to be really achieved). Let's start with the "finite" system.

The sequence Am is recursively defined within T_κ by A_m() = a_m = (a_m, A_{m+1}). Here a_m are the motor actions which are used to define G.

Condition (4) is T_κ | (A0() = b_0 and B1() = 0) -> G. Because of the way fallback works with this goal, we need to introduce the further assumption that b_0 = 0^2+1 = 1 where b_0 = (b_0, B0). The T_κ-proof of (4) now goes as follows. A0() = b_0 implies A1 = B1. B1() = 0 implies A1() = 0 implies a_1 = 0. A0() = b_0 also implies a_0 = b_0 = 1. a_0 = 1 and a_1 = 0 imply G by definition of G.

Condition (3) is T_κ | For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]"))). This condition is T_κ-proved by inspecting B1's code. B1 looks for T_κ-proofs of theorems of the form "κ > n-1 and B1()[κ\κ+1] = b_1 -> G[κ\κ+1]". It is licensed to act on such a proof only when the oracle affirms the stronger premise κ > n. Note that G[κ\κ+1] looks the same as G except that the oracle used in evaluating Am() is "twisted" (in a way that doesn't affect the secret real implementation of the oracle). B1 performs fallback action 0 when it fails to find any proof as above. In practice, if a_0=(0^2+1=1,B1) then B1 is able to prove it (for example by running A0; more precisely since its goal is twisted it has to run a twisted A0 but it doesn't change the result). It can then go on to prove that a suitable motor action and a suitable successor agent will lead to the goal (since B1 and G are twisted in the same way within the theorem). If B1 is an "L-zombie" i.e. it wasn't really constructed by A0 then it a priori might fail to find a suitable proof in which case it will fallback.

I'll stop at this point. If this clarification was insufficient, please point me in the direction of the problem. If this clarification was sufficient for the "finite" system but the analogical argument for the ordinal system is unclear, tell me and I will produce it.

Hi, haven't heard from you for a while.

The ordinal formalism turned out to be erroneous (see remark I added in beginning of post). Regarding the oracle version of "usual" parametric polymorphism, it is in a way closer to Marcello's waterfall than to the original parametric polymorphism: it doesn't "lose proof strength" but it is vulnerable to the procrastination paradox. On the other hand I don't think the procrastination paradox is a severe problem, it only means you need to require your utility function to be semicontinuous wrt the product topology of the set of possible universes, where the product is over time slices. Also, the oracle formalism has an advantage over Marcello's waterfall in that it doesn't rely on an arbitrary choice of unprovable formula.

Sidenote: Is a workshop planned sometime soon? It would be very interesting for me to attend and exchange ideas with other people.

Benja and I have been pretty busy writing papers and attending some non-MIRI workshops. Eventually, we'll get back around to feeling out this system, but I have not had the time to go back through your posts in depth so as to understand the motivation. It will take me some time and effort to give your system the attention it deserves, and I still haven't been able to find that time :-)

There aren't any workshops coming up in the immediate future, but did you hear about MIRIx workshops? If you set up a your own local workshop, MIRI can reimburse certain expenses and may even send a researcher to the first workshop to give tutorials and/or chat, depending on the situation.