Knowledge is Freedom

by Scott Garrabrant, 9th Feb 2018

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

[Epistemic Status: Type Error]

In this post, I try to build up an ontology around the following definition of knowledge:

To know something is to have the set of policies available to you closed under conditionals dependent on that thing.

You are an agent $G$, and you are interacting with an environment $e$ in the set $E$ of all possible environments. For each environment $e$, you select an action $a$ from the set $A$ of available actions. You thus implement a policy $p \in A^E$. Let $P \subseteq A^E$ denote the set of policies that you could implement. (Note that $A^E$ is the space of functions from $E$ to $A$.)

If you are confused about the word "could," that is okay; so am I.

A fact $F$ about the environment can be viewed as a function $F: E \to S$ that partitions the set of environments according to that fact. For example, for the fact "the sky is blue," we can think of $S$ as the set $\{\top, \bot\}$ and $F$ as the function that sends worlds with a blue sky to the element $\top$ and sends worlds without a blue sky to the element $\bot$. One example of a fact is $F(e) = e$, which is the full specification of the environment.
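To make the "fact as partition" picture concrete, here is a minimal Python sketch. The four toy environments and the helper names (`ENVIRONMENTS`, `sky_is_blue`, `full_specification`, `partition`) are assumptions made up for illustration; nothing here is specific to the post beyond the idea that a fact is a function on environments.

```python
from collections import defaultdict

# Hypothetical toy environments: (sky colour, weather) pairs.
ENVIRONMENTS = [("blue", "dry"), ("blue", "wet"), ("grey", "dry"), ("grey", "wet")]


def sky_is_blue(env):
    """The fact 'the sky is blue': sends an environment to True or False."""
    return env[0] == "blue"


def full_specification(env):
    """The identity fact: its value is the entire environment."""
    return env


def partition(fact, environments):
    """Group environments by the value that the fact assigns to them."""
    cells = defaultdict(list)
    for env in environments:
        cells[fact(env)].append(env)
    return dict(cells)


blue_cells = partition(sky_is_blue, ENVIRONMENTS)
full_cells = partition(full_specification, ENVIRONMENTS)
```

In this toy setup the sky fact splits the environments into two cells, while the full specification puts every environment in its own cell, which is the finest partition available.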

A conditional policy can be formed out of other policies. To form a conditional on a fact $F: E \to S$, we start with a policy for each element of $S$. We will let $c(s)$ denote the policy associated with $s \in S$, so $c: S \to A^E$. Given this fact and this collection of policies, we define the conditional policy $c_F: E \to A$ given by $e \mapsto c(F(e))(e)$.

Conditional policies are like if statements in programming. Using the fact "the sky is blue" from above, we can let $r$ be the policy that pushes a red button regardless of its environment and let $g$ be a policy that pushes a green button regardless of its environment. If $c(\top) = r$ and $c(\bot) = g$, then $c_F$ is the policy that pushes the red button if the sky is blue, and pushes the green button otherwise.
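The button example can be written out as a short Python sketch. The dictionary-based environments and the names `conditional`, `push_red`, and `push_green` are hypothetical choices for illustration; the only thing taken from the text is the rule that the conditional policy looks up the fact's value, picks the policy assigned to that value, and runs it on the same environment.

```python
def conditional(fact, c):
    """Build the conditional policy: env -> c[fact(env)](env)."""
    def policy(env):
        chosen = c[fact(env)]  # the policy assigned to this value of the fact
        return chosen(env)     # run that policy on the same environment
    return policy


def sky_is_blue(env):
    """The fact 'the sky is blue' for dictionary-shaped toy environments."""
    return env["sky"] == "blue"


def push_red(env):
    """Constant policy: push the red button in every environment."""
    return "red"


def push_green(env):
    """Constant policy: push the green button in every environment."""
    return "green"


# One policy per possible value of the fact.
c = {True: push_red, False: push_green}
red_if_blue = conditional(sky_is_blue, c)
```

Calling `red_if_blue({"sky": "blue"})` returns `"red"`, and calling it on an environment with any other sky returns `"green"`.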

Now, we are ready to define knowledge. If $P$ is the set of policies you could implement, then you know a fact $F$ if $P$ is closed under conditional policies dependent on $F$. (That is, whenever $c: S \to P$, we have $c_F \in P$.) Basically, we are just saying that your policy is allowed to break into different cases for different ways that the fact could go.
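For a finite toy world, the closure condition can be checked by brute force. The sketch below is an illustration under assumptions, not part of the original argument: environments are indexed 0 to 3, a policy is a tuple of actions (one per environment), a fact is a tuple of values (one per environment), and `knows` is a hypothetical helper name.

```python
from itertools import product

ENVS = range(4)              # four hypothetical environments, indexed 0..3
ACTIONS = ("red", "green")   # hypothetical action set


def knows(policies, fact):
    """Return True if `policies` is closed under conditionals on `fact`."""
    values = sorted(set(fact[e] for e in ENVS))
    # Try every way of assigning an available policy to each value of the fact.
    for choice in product(policies, repeat=len(values)):
        c = dict(zip(values, choice))
        # The conditional policy acts in e as the policy chosen for fact[e] would.
        cond = tuple(c[fact[e]][e] for e in ENVS)
        if cond not in policies:
            return False
    return True


sky_is_blue = (True, True, False, False)   # sky is blue in environments 0 and 1
full_spec = (0, 1, 2, 3)                   # the fact that identifies the environment

constant_only = {("red",) * 4, ("green",) * 4}
sky_dependent = {(x, x, y, y) for x in ACTIONS for y in ACTIONS}
all_policies = set(product(ACTIONS, repeat=4))
```

Here `constant_only` does not know the sky fact, `sky_dependent` knows the sky fact but not the full specification, and `all_policies` knows both. In this model, knowing more corresponds to having a larger, more flexible set of available policies.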

Self Reference

Now, let's consider what happens when an agent tries to know things about itself. For this, we will consider a naturalized agent that is part of the environment. There is a fact $\mathrm{action}: E \to A$ of the environment that says what action the agent takes, where $A$ is again the set of actions available to the agent, and $\mathrm{action}$ is a function from $E$ to $A$ that picks out what action the agent takes in that environment. Note that $\mathrm{action}$ is exactly the agent's policy, but we are thinking about it slightly differently.

So that things are not degenerate, let's assume that there are at least two possible actions $a$ and $b$ in $A$, and that $P$ contains the constant policies $\bar{a}$ and $\bar{b}$ that ignore their environment and always output the same thing.

However, we can write down an explicit policy that the agent cannot implement: the policy where the agent takes action $b$ in environments in which it takes action $a$, and takes action $a$ in environments in which it does not take action $a$. The agent cannot implement this policy, since there are no consistent environments in which the agent is implementing this policy. (Again, I am confused by the coulds here, but I am assuming that the agent cannot take an inherently contradictory policy.)
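One way to see the contradiction in a toy model is to let each environment record the action the agent takes in it, and call an environment consistent with a policy when the policy's output there matches the recorded action. This reading of "consistent" is an assumption made for the sketch, as are the names `ENVIRONMENTS`, `action`, `consistent_environments`, and `diagonal`.

```python
from itertools import product

ACTIONS = ("a", "b")
# Hypothetical environments: (weather, action the agent takes) pairs.
ENVIRONMENTS = list(product(("sunny", "rainy"), ACTIONS))


def action(env):
    """The fact that reads off the agent's own action from the environment."""
    return env[1]


def consistent_environments(policy):
    """Environments where the policy's output agrees with the recorded action."""
    return [env for env in ENVIRONMENTS if policy(env) == action(env)]


def const_a(env):
    """Constant policy: always take action a."""
    return "a"


def diagonal(env):
    """Take b where the agent takes a, and take a everywhere else."""
    return "b" if action(env) == "a" else "a"


no_worlds = consistent_environments(diagonal)
a_worlds = consistent_environments(const_a)
```

In this sketch the constant policy is consistent with exactly the environments where the recorded action is `a`, while the diagonal policy is consistent with none, since it always outputs something other than the recorded action.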

This policy can be viewed as a conditional policy on the fact $\mathrm{action}$. You can construct it as $c_{\mathrm{action}}$, where $c$ is the function that maps