Followup to: Agents with Cartesian childhood and Physicalist adulthood
In previous posts I have defined a formalism for quantifying the general intelligence of an abstract agent (program). This formalism relies on counting proofs in a given formal system F (as in regular UDT), which makes it susceptible to the Loebian obstacle. That is, if we imagine the agent itself making decisions by looking for proofs in the same formal system F, then it would be impossible to present a general proof of its trustworthiness, since no formal system can assert its own soundness. Thus the agent might fail to qualify for a high intelligence ranking according to the formalism. We can instead assume the agent uses a weaker formal system whose soundness is provable in F, but then we still run into difficulties if we want the agent to be self-modifying (as we expect it to be): such an agent would have to trust its descendants, which means that subsequent agents use weaker and weaker formal systems, until self-modification becomes impossible.
One known solution to this is Benja's parametric polymorphism. In this post I adapt parametric polymorphism to the updateless intelligence metric framework. The resulting union looks formally harmonious, but it raises questions which I currently don't fully understand.
In the original parametric polymorphism, a constant κ is introduced (informally known as "the number of ineffable mystery") s.t. the agent has to prove its actions are "safe" for time period κ (i.e. have no "bad" consequences during this period). Since it is impossible for the agent to prove any upper bound on κ, effectively its actions have to be safe indefinitely. I found that for our purposes it's better to cast the formalism differently. Instead of κ playing the role of a "timeout", the agent is provided with an oracle that answers questions of the form "κ > n?" for any n. Contrary to what the agent thinks, the oracle gives an affirmative answer for all n.
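To make the oracle reformulation concrete, here is a minimal toy sketch (Python, purely illustrative and not part of the formalism): seen "from the outside", the oracle simply affirms "κ > n?" for every n, while an agent reasoning inside the formal system can never prove an upper bound on κ.

```python
# Toy illustration only: the kappa-oracle as seen "from the outside".
# It affirms "kappa > n?" for every n, even though an agent reasoning in
# T_kappa can never prove any upper bound on kappa.
def kappa_oracle(n: int) -> bool:
    """Answer the query 'kappa > n?'."""
    return True  # affirmative for all n

# Querying the oracle only ever yields lower bounds on kappa:
assert all(kappa_oracle(n) for n in range(1000))
```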
Let's do some math. The formalism I introduce at this point is very close to Yudkowsky and Herreshoff, with some adjustments.
Consider an infinite chain of agents A0, A1, A2... where A_{i+1} is created by A_i. Formally, each A_i is a program for a universal oracle machine which computes a_i = (a'_i, A_{i+1}). Here a'_i is a "motor action" and A_{i+1} is the successor program. I'm ignoring sensory input for the sake of simplicity (of course sensory input will reappear when we return to intelligence metrics).
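As a rough illustration of this structure (a sketch only; the names AgentOutput and run_agent are mine, not part of the formalism), each agent's computation, given access to the κ-oracle, returns a motor action paired with the source of its successor:

```python
# Illustrative sketch of the chain A0, A1, A2, ...: each A_i, run on a
# universal oracle machine, outputs a motor action a'_i together with its
# successor A_{i+1}. Names here (AgentOutput, run_agent) are illustrative only.
from typing import Callable, NamedTuple

Oracle = Callable[[int], bool]   # answers the query "kappa > n?"

class AgentOutput(NamedTuple):
    motor_action: object         # a'_i, the externally visible action
    successor_source: str        # source code of A_{i+1}

def run_agent(agent_source: str, oracle: Oracle) -> AgentOutput:
    """Run A_i on a universal oracle machine (left as a stub here)."""
    raise NotImplementedError
```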
Denote by T_κ the usual theory of parametric polymorphism. That is, T_κ results from starting with a theory T (e.g. Peano Arithmetic), extending the language by the symbol κ (which behaves like a natural number) and adding the (self-referential) axiom schema
(1) For all formulae φ: κ > 0 -> For all n: (@_{T_κ} "φ(n)" -> φ(n)[κ\κ-1])
Here '@_S' is a formula with the semantics of "provable in theory S" and φ(n)[κ\κ-1] is φ(n) with all occurrences of κ replaced by κ-1. Thus (1) is a self-referential soundness schema saved from paradoxicality by the "counter" κ. See Yudkowsky and Herreshoff for the proof that T_κ is sound (it has a model for each possible value of κ).
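For readability, here is the same schema typeset in LaTeX (writing \square_{T_\kappa} for the provability formula denoted "@_{T_κ}" above); this is merely a restatement of (1), not an addition to it:

```latex
% Axiom schema (1) of T_kappa, typeset. \square_{T_\kappa} stands for the
% provability formula written "@_{T_kappa}" in the text.
\[
  \text{for all formulae } \varphi:\qquad
  \kappa > 0 \;\rightarrow\;
  \forall n\,\bigl(\square_{T_\kappa}\ulcorner \varphi(n)\urcorner
    \;\rightarrow\; \varphi(n)[\kappa \backslash \kappa-1]\bigr)
\]
```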
Assume the licensing condition for A0 to perform action b_0 is
(2) Exists n: T_κ | κ > n -> (A0() = b_0 -> G)
Here S | φ means that φ is provable in S; G is a formula denoting the "goal" of the agent (as in Yudkowsky and Herreshoff, the agent is for now assumed to be a satisficer); A0() is a formula meaning "the result of the computation A0", where the oracle is interpreted as answering the question "κ > x?" (in particular the formula A0() involves κ). Such a licensing condition can be implemented in practice by making A0 a proof searcher which is allowed to use the oracle to introduce assumptions of the form κ > n. In particular, our agent employs ambient control.
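A very rough sketch of what such a proof searcher might look like (schematic only: the proof enumerator and the check that a proof licenses a given action are stand-ins, and the bounds are arbitrary; this is not an implementation of T_κ):

```python
# Schematic proof searcher implementing licensing condition (2).
# "enumerate_proofs" and "licenses" are stand-ins for a real T_kappa proof
# enumerator and proof checker; only the control flow is illustrated.
DEFAULT_ACTION = 0  # the "harmless" default action

def choose_action(candidate_actions, enumerate_proofs, licenses, max_steps):
    """Look for a T_kappa-proof of  kappa > n -> (A0() = b_0 -> G)  for some
    n and candidate action b_0; the premise kappa > n is introduced by
    querying the oracle. Fall back to the default action if nothing is found."""
    for proof in enumerate_proofs(max_steps):
        for b_0 in candidate_actions:
            for n in range(max_steps):
                if licenses(proof, n, b_0):
                    # proof establishes: kappa > n -> (A0() = b_0 -> G)
                    return b_0
    return DEFAULT_ACTION
```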
We then have the following proposition, showing that A0 is able to tile indefinitely (create an infinite chain of meaningful descendants). The proposition is formulated for A1 but the rest of the chain follows by induction since the form of the licensing condition is preserved (up to a certain twist) by the tiling.
Proposition
Assume b_0 = (b'_0, B1) and B1 is s.t.
(3) T_κ | For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]")))
Here 0 is some default "harmless" action, performed if no appropriate theorem is proved.
Further assume that if B1 is created and does 0, the goal is achieved (in this framework "achieving the goal" is to be thought of as "not doing anything unsafe").
(4) T_κ | (A0() = b_0 and B1() = 0) -> G
Then A0 is allowed to create B1, i.e. licensing condition (2) holds.
B1's licensing condition contains a "twist": it believes that when the oracle receives input n, it answers the question "κ+1 > n?" rather than "κ > n?", and its goal is shifted in the same fashion. Note that the goal typically depends on κ, since the agent itself, and thus its κ-oracle, is part of the universe in which this goal is to be attained. For example, the goal can be a function of the numbers (a0, a1, a2...) defined as the outputs of the corresponding computations with κ-oracles. Thus the shift in G can also be interpreted as a shifted interpretation of the oracle by B1.
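If I read the induction mentioned above correctly (this generalization is mine and should be checked, it is not stated explicitly in the proposition), the k-th descendant keeps a licensing condition of the same shape, with κ shifted by k: it licenses a non-default action b_k when it finds a T_κ-proof of a statement of the form

```latex
% Hedged generalization of (3) to the k-th descendant (my reading of the
% induction; to be checked). A_k introduces the premise "kappa + k > n"
% through its shifted reading of the oracle's answer on input n.
\[
  \bigl(\kappa + k > n \;\wedge\; A_k()[\kappa \backslash \kappa + k] = b_k\bigr)
  \;\rightarrow\; G[\kappa \backslash \kappa + k]
\]
```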
Proof
(3) trivially implies that
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and @_{T_κ} "(κ+1 > n and B1()[κ\κ+1] = b_1) -> G[κ\κ+1]"))))
Using the premise κ > 0, we can apply the soundness schema (1). We obtain:
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (κ > n and ((κ > n and B1() = b_1) -> G)))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or Exists n: (B1() = b_1 -> G))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or (B1() = b_1 -> G))))
T_κ | κ > 0 -> (For all b_1: (B1() = b_1 -> (b_1 = 0 or G)))
This trivially implies that
T_κ | κ > 0 -> (For all b_1: ((A0() = b_0 and B1() = b_1) -> (b_1 = 0 or G)))
Applying (4) we finally get (2) for n=0: assuming κ > 0 and A0() = b_0, B1's output is either 0, in which case (4) gives G, or G already holds by the above; hence T_κ | κ > 0 -> (A0() = b_0 -> G).
To apply parametric polymorphism to the updateless intelligence formalism, we need to do the following: