Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

[Epistemic Status: Type Error]

In this post, I try to build up an ontology around the following definition of knowledge:

To know something is to have the set of policies available to you closed under conditionals dependent on that thing.

You are an agent G, and you are interacting with an environment e in the set E of all possible environments. For each environment e, you select an action a from the set A of available actions. You thus implement a policy p∈A^E. Let P⊆A^E denote the set of policies that you could implement. (Note that A^E is the space of functions from E to A.)

If you are confused about the word "could," that is okay; so am I.

A fact (F,ϕ) about the environment can be viewed as a function ϕ:E→F that partitions the set of environments according to that fact. For example, for the fact "the sky is blue," we can think of F as the set {⊤,⊥} and ϕ as the function that sends worlds with a blue sky to the element ⊤ and sends worlds without a blue sky to the element ⊥. One example of a fact is (E,id), which is the full specification of the environment.

A conditional policy can be formed out of other policies. To form a conditional on a fact (F,ϕ), we start with a policy for each element of F. We will let c(f) denote the policy associated with f∈F, so c:F→A^E. Given this fact and this collection of policies, we define the conditional policy p_c:E→A given by e↦c(ϕ(e))(e).

Conditional policies are like if statements in programming. Using the fact "the sky is blue" from above, we can let k_r be the policy that pushes a red button regardless of its environment and let k_g be a policy that pushes a green button regardless of its environment. If c(⊤)=k_r and c(⊥)=k_g, then p_c is the policy that pushes the red button if the sky is blue, and pushes the green button otherwise.
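The if-statement analogy can be made literal. Below is a minimal Python sketch of the construction e↦c(ϕ(e))(e), assuming a toy encoding that is not part of the original post: environments are dicts with a hypothetical "sky_blue" key, actions are strings, and c is a dict from fact values to policies.

```python
from typing import Callable, Dict, Hashable

# Assumed toy encoding (not from the post): an environment is a dict,
# an action is a string, and a policy is any function from Env to Action.
Env = Dict[str, bool]
Action = str
Policy = Callable[[Env], Action]


def conditional(phi: Callable[[Env], Hashable],
                c: Dict[Hashable, Policy]) -> Policy:
    """Build the conditional policy e -> c(phi(e))(e)."""
    def p_c(e: Env) -> Action:
        # Look up which policy the fact selects, then run it on e.
        return c[phi(e)](e)
    return p_c


def sky_is_blue(e: Env) -> bool:
    """The fact "the sky is blue", with F = {True, False}."""
    return e["sky_blue"]


def k_r(e: Env) -> Action:
    """Constant policy: push the red button in every environment."""
    return "red"


def k_g(e: Env) -> Action:
    """Constant policy: push the green button in every environment."""
    return "green"


p_c = conditional(sky_is_blue, {True: k_r, False: k_g})

print(p_c({"sky_blue": True}))
print(p_c({"sky_blue": False}))
```

Here the dict lookup plays the role of c, and the final call plays the role of applying the selected policy to the same environment e.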

Now, we are ready to define knowledge. If P is the set of policies you could implement, then you know a fact (F,ϕ) if P is closed under conditional policies dependent on that fact. (That is, whenever c:F→P, we have p_c∈P.) Basically, we are just saying that your policy is allowed to break into different cases for different ways that the fact could go.
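For finite E and A, this closure condition can be checked by brute force. The sketch below is only an illustration under assumptions of my own: environments are pairs (sky_is_blue, is_raining), a policy is stored as a tuple holding one action per environment, and `knows` is a hypothetical helper name. It only ranges over the fact values that actually occur in some environment, since other elements of F never affect the conditional policy.

```python
from itertools import product

ACTIONS = ("red", "green")
# Assumed toy environments: pairs (sky_is_blue, is_raining).
ENVS = [(sky, rain) for sky in (True, False) for rain in (True, False)]


def sky(e):
    """The fact "the sky is blue"."""
    return e[0]


def rain(e):
    """The fact "it is raining"."""
    return e[1]


def conditional(phi, c):
    """Tabulate the conditional policy e -> c(phi(e))(e) as a tuple over ENVS."""
    return tuple(c[phi(e)][i] for i, e in enumerate(ENVS))


def knows(P, phi):
    """Return True if P is closed under conditionals on the fact phi."""
    P = set(P)
    # Fact values that occur in at least one environment, in first-seen order.
    values = list(dict.fromkeys(phi(e) for e in ENVS))
    for choice in product(P, repeat=len(values)):
        c = dict(zip(values, choice))
        if conditional(phi, c) not in P:
            return False
    return True


# An agent whose available policies may depend on the sky and on nothing else.
P_SKY = {tuple(x if sky(e) else y for e in ENVS)
         for x in ACTIONS for y in ACTIONS}

print(knows(P_SKY, sky))
print(knows(P_SKY, rain))
```

In this toy model the agent knows the sky fact but not the rain fact: branching on rain between the two constant policies produces a policy outside P_SKY.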

## Self Reference

Now, let's consider what happens when an agent tries to know things about itself. For this, we will consider a naturalized agent, one that is part of the environment. There is a fact (A,action) of the environment that says what action the agent takes, where A is again the set of actions available to the agent, and action is a function from E to A that picks out what action the agent takes in that environment. Note that action is exactly the agent's policy, but we are thinking about it slightly differently.

So that things are not degenerate, let's assume that there are at least two possible actions a and b in A, and that P contains the constant policies k_a and k_b that ignore their environment and always output the same thing.

However, we can write down an explicit policy that the agent cannot implement: the policy where the agent takes action b in environments in which it takes action a, and takes action a in environments in which it does not take action a. The agent cannot implement this policy, since there are no consistent environments in which the agent is implementing this policy. (Again, I am confused by the coulds here, but I am assuming that the agent cannot take an inherently contradictory policy.)
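The contradiction can be seen concretely in a toy model. The sketch below rests on an assumption that goes beyond the post: each environment is a dict that records the agent's action under a hypothetical "agent_action" key, and a policy counts as consistent with an environment when its output there matches the recorded action. The helper name `consistent_envs` is likewise made up for illustration.

```python
ACTIONS = ("a", "b", "c")
# Assumed toy environments: each one records the action the agent takes in it.
ENVS = [{"weather": w, "agent_action": x}
        for w in ("sun", "rain") for x in ACTIONS]


def action(e):
    """The fact (A, action): the action the agent takes in environment e."""
    return e["agent_action"]


def k_a(e):
    """Constant policy that always outputs a."""
    return "a"


def diagonal(e):
    """Take b where the agent takes a, and take a everywhere else."""
    return "b" if action(e) == "a" else "a"


def consistent_envs(policy):
    """Environments in which the agent's recorded action agrees with policy."""
    return [e for e in ENVS if policy(e) == action(e)]


print(len(consistent_envs(k_a)))
print(consistent_envs(diagonal))
```

The constant policy is consistent with exactly the environments in which the agent takes a, while the diagonal policy disagrees with the recorded action in every environment, so no environment is left in which the agent is implementing it.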

This policy can be viewed as a conditional policy on the fact (A,action). You can construct it as p_c, where
