This post focuses on the origin of goals and utility, exploring how they can be derived from precedents. With utility functions, ACI can be compared with other models of intelligence, such as the rational agent, active inference, and value learning approaches.
ACI states that intelligent agents try to behave in the same way as past behaviors that were doing the right thing. In the previous chapter, we delved into what constitutes doing the right thing. This chapter offers a more formal analysis of what it means to behave in the same way as precedents.
(The previous version had many errors, so I have rewritten this chapter.)
Keep Doing the Right Thing
Let us consider an agent G that has been doing the right thing. It is reasonable to anticipate that if G keeps behaving in the same way, it will likely continue doing the right things in the environments it has experienced.
In the ACI model, the sequence of actions and observations performed by agent G while doing the right thing is denoted as the precedent. G wants to (or is made to) continue doing the right thing, which means extending the precedent sequence with new data points.
In other words, G's goal is its best future, the one with the highest probability of doing the right things, while the probability of doing the right things is the utility of a future. As the most probable continuation of the precedent string, a goal should resemble the precedent as closely as possible.
However, special mathematical tools are required to compare two sequences that have different lengths.
From Precedent to Goals
An agent G interacts with an unknown environment in time cycles $k = 1, 2, 3, \ldots, t$. In cycle $k$, $x_k$ is the perception (input) from the environment, and $y_k$ is the action (output) of the agent.
Define the agent's interaction History as $h_{<t} \equiv x_1 y_1 x_2 y_2 \ldots x_{t-1} y_{t-1}$, while a possible World with history $h_{<t}$ is $w_{<n} \equiv x_1 y_1 x_2 y_2 \ldots x_{t-1} y_{t-1} \ldots x_{n-1} y_{n-1}$. Worlds are stratified by histories.
Let $H$ be the set of histories, and $W$ be the set of worlds. For any $h \in H$, there is a subset $W_h \subset W$ consisting of all worlds with history $h$ (Armstrong 2018).
Define the Judgment Function as a function from a world or a history to 1 or 0:
$$J : W \cup H \to \{0, 1\}$$
A history of doing the right things should have $J(h) = 1$.
We can define a Precedent as a history of doing the right things, every prefix of which was also doing the right things.
Definition 1 (Precedent). A precedent is a history $h^*$ such that
$$\forall h^*_{<k} \subseteq h^*, \quad J(h^*_{<k}) \equiv 1$$
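As a minimal sketch of Definition 1 (the pair encoding and the judgment rule below are made-up stand-ins, not part of ACI), a history is a precedent exactly when every prefix of it is judged right:

```python
# Toy illustration of Definition 1: a history is a precedent iff every
# prefix of it is judged to be "doing the right thing" (J = 1).
# The judgment function here is an arbitrary stand-in for the real J.

def is_precedent(history, judge):
    """history: list of (x, y) perception/action pairs.
    judge: function mapping a history prefix to 0 or 1."""
    # Check J on every prefix h_{<k}, up to and including the full history.
    return all(judge(history[:k]) == 1 for k in range(1, len(history) + 1))

# Stand-in judgment: "right" as long as each action copies its perception.
judge = lambda h: 1 if all(x == y for x, y in h) else 0

h_good = [("a", "a"), ("b", "b")]   # every prefix judged 1 -> precedent
h_bad  = [("a", "a"), ("b", "c")]   # second prefix judged 0 -> not one
```

Note that the prefix condition does real work here: a history whose final step is judged right but whose earlier steps were not is still disqualified.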
For an ACI agent, the precedent contains all the information we have about what is right; thus it sets the standard for right things. A right future world that will become a precedent should meet this standard.
At time $m$, the precedent might be the whole history $h_{<m}$ of the agent, but it might also be only a part of it, $h^*_{<t} \subset h_{<m}$.
A goal can be defined as a future world that has the highest likelihood of becoming a precedent. In other words, a goal represents a future world where the right things are most likely to be done.
Definition 2 (Goal). At time $m$, given a precedent $h^*_{<t}$, the goal for time $n$ ($n > m \ge t$) is a world $w^*_{<n} \in W_{h_{<m}} \subseteq W_{h^*_{<t}}$ such that
$$\forall w_{<n} \in W_{h_{<m}}, \quad P(J(w^*_{<n}) = 1 \mid w^*_{<n}) \ge P(J(w_{<n}) = 1 \mid w_{<n})$$
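Definition 2 can be sketched as a simple selection: among candidate future worlds, pick the one with the highest probability of being judged right. The candidate worlds and their probabilities below are made-up numbers for illustration:

```python
# Toy sketch of Definition 2: the goal is the candidate future world
# with the highest probability of being judged right, P(J(w) = 1 | w).
# The worlds and probabilities are made-up numbers, not ACI output.

def goal(candidates):
    """candidates: dict mapping a world (any hashable) to P(J(w) = 1 | w)."""
    return max(candidates, key=candidates.get)

worlds = {
    "keep following the precedent": 0.80,
    "deviate slightly":             0.55,
    "deviate wildly":               0.10,
}
```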
A goal should have the following properties:
- An agent may have multiple, possibly infinitely many goals, because at every future time $n$ there will be a $w^*_{<n}$. Compromising between multiple goals can be difficult.
- A goal may not represent the right future even if it can be achieved. It has the highest likelihood of being the right future, but it may still be proven wrong.
- When a goal $w^*$ is achieved, the agent may or may not receive notification of whether it has become an $h^*$, in other words, whether things actually were right or wrong. The information about right or wrong might be presented in any form, including real-time feedback and delayed notification. For example, a video game player may only be notified at the end of a round whether they have won or lost. As a universal model of intelligence, ACI determines what is right without relying on any particular mechanism, be it natural selection or artificial control.
That's why an agent can't act directly on goals. It's more conventional to use expected utility to describe the behavior of an agent.
However, there is a simple and intuitive theorem about goals:
Theorem 1: An agent's goal given a precedent $h^*$ equals the most probable continuation of the precedent sequence.
The proof is given in the appendix at the end of the post. With this theorem, the goal calculation problem turns into a sequence prediction task. Following Hutter's AIXI, ACI uses Solomonoff Induction as an all-purpose sequence prediction tool. Solomonoff Induction considers all possible hypotheses about a sequence, and continuously updates the estimated probability of each hypothesis.
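Solomonoff Induction itself is uncomputable, but the underlying idea of a complexity-weighted Bayesian mixture can be illustrated with a tiny, computable hypothesis class. Everything below (the three hypotheses, their complexity values) is a toy stand-in, not an implementation of AIXI or ACI:

```python
from fractions import Fraction

# Toy complexity-weighted Bayesian mixture over a finite hypothesis class.
# Each hypothesis assigns a probability to the next bit given the sequence
# so far; its prior weight 2^-k decays with a made-up "complexity" k.
# Real Solomonoff Induction sums over all programs and is uncomputable.

HYPOTHESES = {
    # name: (complexity k, predictor giving P(next bit = 1 | sequence))
    "always-1":  (1, lambda s: Fraction(1)),
    "always-0":  (1, lambda s: Fraction(0)),
    "fair-coin": (2, lambda s: Fraction(1, 2)),
}

def posterior(seq):
    """Posterior weight of each hypothesis after observing seq (list of bits)."""
    weights = {}
    for name, (k, pred) in HYPOTHESES.items():
        w = Fraction(1, 2 ** k)           # complexity prior 2^-k
        for i, bit in enumerate(seq):     # multiply in each bit's likelihood
            p1 = pred(seq[:i])
            w *= p1 if bit == 1 else 1 - p1
        weights[name] = w
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}

def predict_next(seq):
    """Mixture probability that the next bit is 1."""
    post = posterior(seq)
    return sum(post[n] * HYPOTHESES[n][1](seq) for n in HYPOTHESES)
```

After observing a run of ones, the mixture concentrates on "always-1" and predicts another one with high probability, which is the "continuously updates" behavior described above in miniature.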
From Goals to Utility
People prefer thinking in goals, but working with utilities. A utility function is defined as a function from worlds to real numbers:
$$U : W \to \mathbb{R}$$
Because a goal should be assigned the highest expected utility among all possible worlds, and a goal is defined as the future world with the highest likelihood of becoming a precedent (doing right things), it is reasonable to define expected utility as the likelihood of becoming a precedent.
Definition 3 (Expected Utility). The expected utility of any possible world $w_{<n} \in W_{h^*_{<t}}$ is its probability of doing the right thing:
$$U_{h^*_{<t}}(w_{<n}) \equiv P(J(w_{<n}) = 1 \mid w_{<n})$$
In other words, the utility of a world is its probability of doing the right thing, given that a known precedent was doing the right thing.
It is easy to prove that ACI's definition of utility satisfies the four axioms of VNM-rationality: completeness, transitivity, continuity, and independence.
We can also define the total expected utility as value:
Definition 4 (Value). The total expected utility, or value, for a policy $\pi$, history $h_{<n}$, and precedent $h^*_{<t} \subseteq h_{<n}$ is:
$$V(h^*_{<t}, \pi, h_{<n}) = E^{\pi}_{h^*_{<t}}(h_{<n}) = \int_{w \in W_{h^*_{<t}}} U_{h^*_{<t}}(w) P(w \mid h_{<n})$$
where a policy $\pi$ for an agent is a map from histories to a probability distribution over actions, $\pi : H \to \Delta A$.
We can then define a reward function as the difference between two total expected utilities (Armstrong 2018):
Definition 5 (Reward). The reward between two histories $h_{<m} \subset h_{<n}$ for a policy $\pi$ and precedent $h^*_{<t} \subseteq h_{<n}$ is:
$$R(h^*_{<t}, \pi, h_{<n}, h_{<m}) = V(h^*_{<t}, \pi, h_{<n}) - V(h^*_{<t}, \pi, h_{<m})$$
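Under the strong simplifying assumption of a small finite set of worlds, the integral in Definition 4 becomes a weighted sum, and Definition 5 is a difference of two such sums. A minimal sketch with made-up utilities and conditional probabilities:

```python
# Toy sketch of Definitions 4 and 5 over a finite world set.
# value(h) = sum over worlds of U(w) * P(w | h); reward is the
# difference of values between a longer and a shorter history.
# All utilities and conditional probabilities are made-up numbers.

def value(utilities, cond_probs):
    """utilities: {world: U(w)}; cond_probs: {world: P(w | h)}."""
    return sum(utilities[w] * cond_probs[w] for w in utilities)

def reward(utilities, p_longer, p_shorter):
    """Definition 5: V at the longer history minus V at the shorter one."""
    return value(utilities, p_longer) - value(utilities, p_shorter)

utilities = {"w1": 0.9, "w2": 0.4, "w3": 0.1}

# P(w | h_m): at the shorter history, probability is spread out.
p_given_hm = {"w1": 0.3, "w2": 0.4, "w3": 0.3}
# P(w | h_n): the longer history concentrates probability on w1.
p_given_hn = {"w1": 0.7, "w2": 0.2, "w3": 0.1}
```

Here the reward is positive because the longer history shifted probability toward the high-utility world, matching the intuition that reward measures progress in value.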
Conclusion
Theorem 1 can be roughly expressed as: an agent always tries to behave as closely to the precedent as possible. However, it does not have to do this consciously, just as organisms do not have to consciously think "I am going to survive and reproduce." They can employ a combination of policies to achieve maximum value.
FAQs
Q: OK, following the precedent might be right, but what if the agent lives in a carefree scenario, where doing everything is right?
A: If everything is right, the agent is more likely to follow simple policies than complex ones, so the precedent is highly likely to be a simple sequence, such as repeating a single action or mere reflexes to the environment. Conversely, if we find fairly complicated structures in the precedent, it is highly unlikely that the agent is in a carefree situation.
Q: With well-defined utility functions, should ACI maximize the expected utility like a rational agent or AIXI?
A: Not really. In relatively stable environments, rational agents may serve as acceptable approximations of ACI agents. However, they are likely to encounter alignment problems when faced with unforeseen scenarios:
- Once the precedent receives new data points, the utility function changes, rendering it unsuitable for straightforward optimization.
- Up until this point, we have been discussing ideal ACI agents endowed with unlimited computational power and storage, able to achieve any possible future world they want. However, real-world agents cannot execute Solomonoff Induction due to its inherent uncomputability. Only a constrained version of ACI, known as ACItl, can be implemented on practical computers. Whenever an ACItl agent improves its level of performance, its approximation of the utility functions changes as well.
In the next post, we will demonstrate how the ACItl model provides solutions to the alignment problem.
Appendix: Building Utility Functions Using Solomonoff Induction
According to Solomonoff Induction, the probability that $w$ is the future of the precedent sequence $h^*$, taking all hypotheses into account, is:
$$M(w_{<n} = h^*_{<n} \mid h^*_{<t}) = M(h^*_{<n}) / M(h^*_{<t})$$
where $M(h^*)$ is the precedent's prior probability when we take all hypotheses into account:
$$M(x) \equiv \sum_{\mu \in M_R} Q^{-H(\mu)} \mu(x)$$
Here $\mu$ is a semi-measure that assigns probabilities to sequences $x$, $M_R$ is the set of all recursive semi-measures, $Q$ is the number of symbols in the sequences' alphabet, and $H(\mu)$ is the length of the shortest program that computes $\mu$ (Legg 1996).
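A toy numeric instance of the prior $M$ and of the conditional probability from the equation above, with a two-element stand-in for the set of all recursive semi-measures and made-up program lengths $H(\mu)$:

```python
from fractions import Fraction

# Toy instance of M(x) = sum over mu of Q^-H(mu) * mu(x), with a binary
# alphabet (Q = 2) and only two stand-in "semi-measures". Real Solomonoff
# Induction sums over all recursive semi-measures and is uncomputable.

Q = 2

def mu_uniform(x):
    """Assigns probability 2^-len(x) to every binary string."""
    return Fraction(1, 2 ** len(x))

def mu_ones(x):
    """Assigns probability 1 to all-ones strings, 0 otherwise."""
    return Fraction(1) if all(b == 1 for b in x) else Fraction(0)

# Made-up shortest-program lengths H(mu) for the two hypotheses.
HYPS = [(3, mu_uniform), (2, mu_ones)]

def M(x):
    """Complexity-weighted mixture prior over sequences."""
    return sum(Fraction(1, Q ** h) * mu(x) for h, mu in HYPS)

def M_cond(x_long, x_short):
    """Probability that x_long continues x_short: M(x_long) / M(x_short)."""
    return M(x_long) / M(x_short)
```

Even in this two-hypothesis toy, extending an all-ones string with another one keeps most of the prior mass, because the simpler "all ones" hypothesis dominates the mixture.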
We cannot use this equation to predict the future precedent directly, because there might be more than one possible right choice, in contrast to a sequence, which has only one continuation.
Let us consider a sequence $J^+$ which appends a judgment variable $j_k$ after every step $k$ of a history or world sequence. For example:
$$J^+(h^*_{<t}) \equiv x_1 y_1 1 \, x_2 y_2 1 \ldots x_{t-1} y_{t-1} 1$$
$$J^+(h_{<t}) \equiv x_1 y_1 j_1 \, x_2 y_2 j_2 \ldots x_{t-1} y_{t-1} j_{t-1}$$
and for $w_{<n} \in W_{h^*_{<t}}$,
$$J^+(w_{<n}) \equiv x_1 y_1 1 \ldots x_{t-1} y_{t-1} 1 \, x_t y_t j_t \ldots x_{n-1} y_{n-1} j_{n-1}$$
If $j_{n-1} = J(w_{<n}) = 1$ (and then all $j$s from $j_t$ to $j_{n-1}$ equal 1), $w_{<n}$ is a world of doing the right thing. Thus the problem of utility becomes a problem of sequence prediction; the utility of $w_{<n}$ is the probability that $j_{n-1} = 1$:
$$U_{h^*_{<t}}(w_{<n}) = P(J^+(w_{<n}) \cap J(w_{<n}) = 1) / P(J^+(w_{<n})) = P(J(w_{<n}) = 1 \mid J^+(w_{<n}))$$
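The $J^+$ construction can be sketched concretely: flatten the sequence and append a judgment bit after every $(x, y)$ step. The symbols below are toy stand-ins:

```python
# Toy sketch of the J+ construction: after each (x, y) step of a
# sequence we append the judgment bit j_k, yielding the flattened
# string x1 y1 j1 x2 y2 j2 ... The symbols are illustrative stand-ins.

def j_plus(steps, judgments):
    """steps: list of (x, y) pairs; judgments: list of j_k bits."""
    out = []
    for (x, y), j in zip(steps, judgments):
        out.extend([x, y, j])
    return out

# A precedent's augmented sequence has every judgment bit equal to 1.
precedent_plus = j_plus([("a", "b"), ("c", "d")], [1, 1])
```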
Now we can prove Theorem 1, that a goal given a precedent $h^*$ equals the most probable continuation of the precedent sequence.
Let $w'_{<n}$ be one of the $w_{<n} \in W_{h^*_{<t}}$ with the highest probability of being the continuation of the precedent sequence, which means:
$$\forall w_{<n} \in W_{h^*_{<t}}, \quad P(w'_{<n} \mid h^*_{<t}) \ge P(w_{<n} \mid h^*_{<t})$$
and because $w_{<n} \in W_{h^*_{<t}}$,
$$P(w'_{<n}) \ge P(w_{<n})$$
We also know that all the $j$s in $J^+(w'_{<n})$ and $J^+(w_{<n})$ equal 1; this could be the output of a program of fixed length, which has a fixed effect on the prior probability of a sequence, so:
$$P(J^+(w_{<n}) \cap J(w_{<n}) = 1) = P(w_{<n}) - C_1$$
$$P(J^+(w_{<n})) = P(w_{<n}) - C_2$$
with $C_1 > C_2$. Then we have:
$$\forall w_{<n} \in W_{h^*_{<t}}, \quad P(J^+(w'_{<n}) \cap J(w'_{<n}) = 1) / P(J^+(w'_{<n})) \ge P(J^+(w_{<n}) \cap J(w_{<n}) = 1) / P(J^+(w_{<n}))$$
which equals
$$U_{h^*_{<t}}(w'_{<n}) \ge U_{h^*_{<t}}(w_{<n})$$