
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(I used this paper as a reference for functional decision theory (FDT), which is essentially an improved version of timeless decision theory and updateless decision theory.)

This post is a reflection on decision processes that refer to themselves (call them introspective agents if you like) with a view toward investigating the problem of counterfactuals. It ruminates on the fact that logical uncertainty about the output of a decision process is often a good thing for agents.

Let's begin with a tentative definition. A deterministic agent B is autonomous with respect to A if A cannot predict B; that is, A has logical uncertainty about B's actions. This occurs, for instance, in Newcomb's problem, when the output of the predictor depends upon what the agent thinks. We can extend this definition to stochastic worlds and say that B is autonomous with respect to A if A is logically uncertain about the probability distribution of B's actions (check this: it might not be the right definition). Somewhat obviously, agents can be autonomous with respect to each other (humans), mutually transparent (two trivial agents), or there can be asymmetry (the Newcomb problem). An agent can also be autonomous with respect to itself, as we are. I want to argue here that this is a good and necessary thing.
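
A toy sketch of one way autonomy can arise, under assumptions that are not in the post: agents and predictors are modeled as zero-argument Python functions returning an action (0 or 1), and a hypothetical "contrarian" agent B consults A's prediction and then does the opposite. Any predictor whose output B can read is then wrong about B, and a predictor that tries to predict B by simply running it never bottoms out.

```python
from typing import Callable, Dict

# Hypothetical model (not from the post): an agent or predictor is a
# zero-argument function returning an action, 0 or 1.
Action = int
Program = Callable[[], Action]


def make_contrarian(predictor: Program) -> Program:
    """Hypothetical agent B: ask the predictor what B will do, then do the opposite."""
    def agent_b() -> Action:
        return 1 - predictor()
    return agent_b


def prediction_succeeds(predictor: Program) -> bool:
    """Does A's prediction match what the contrarian B actually does?"""
    agent_b = make_contrarian(predictor)
    return predictor() == agent_b()


def make_simulating_predictor() -> Program:
    """A predictor that tries to predict B by running B.

    B consults this same predictor, so the regress never ends; in Python
    it surfaces as a RecursionError once the recursion limit is hit.
    """
    slot: Dict[str, Program] = {}

    def predictor() -> Action:
        return slot["b"]()  # run B to see what it does

    slot["b"] = make_contrarian(predictor)
    return predictor


# Toy predictors that do not simulate B (hypothetical examples).
predictors: Dict[str, Program] = {
    "always_0": lambda: 0,
    "always_1": lambda: 1,
}

for name, predict in predictors.items():
    print(name, "correct?", prediction_succeeds(predict))

try:
    make_simulating_predictor()()
except RecursionError:
    print("simulating predictor never finishes")
```

The contrarian is deliberately extreme so that the self-reference is obvious; in Newcomb-like settings the dependence between predictor and agent is usually subtler.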

The absence of the last possibility, self-autonomy or self-transcendence, seems to be the source of problems relating to counterfactuals (the "five-and-ten" problem). Self-transparent agents, given models of the world as Turing machines, are "fixed" about their own behavior in a strange and rigid way. They embody "fixed points". Sometimes this is good: it means that they are consistent. But if an agent equipped with a logical inductor believes that, given a choice between a $5 and a $10 bill, it will always take the $5, then it will in fact do so, self-consistently. This is insane and rigid. Yet it would be equally consistent for the logical inductor to believe that the agent would always take the $10. Updatelessness is tricky; reflective stability gives fixed points, but not automatically the right ones.
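
A minimal sketch of the two five-and-ten fixed points, under a deliberately crude assumption that is not from the post: the agent estimates the payoff of the bill it believes it will take correctly, but has no evidence about the other bill and (hypothetically) estimates it at 0, standing in for the spurious counterfactual. Each self-belief then confirms itself.

```python
from typing import Dict, List

PAYOFF: Dict[int, float] = {5: 5.0, 10: 10.0}  # taking the $X bill is worth X


def estimated_utility(action: int, believed_action: int) -> float:
    """Hypothetical counterfactual estimate of a self-transparent agent.

    Assumption for illustration: the agent only 'sees' the payoff of the
    action it believes it will take; any other action is estimated at 0.
    """
    return PAYOFF[action] if action == believed_action else 0.0


def choose(believed_action: int) -> int:
    """Pick the action with the highest estimated utility, given the self-belief."""
    return max(PAYOFF, key=lambda a: estimated_utility(a, believed_action))


def fixed_points() -> List[int]:
    """Self-beliefs that the agent's own choice confirms."""
    return [b for b in PAYOFF if choose(b) == b]


for belief in PAYOFF:
    print(f"believes ${belief} -> takes ${choose(belief)}")
print("fixed points:", fixed_points())  # fixed points: [5, 10]
```

Both beliefs are consistent; nothing in the reflective loop itself prefers the $10.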

FDT, by contrast, works because it considers each of the "fixed points" (consistent possible successor-worlds) separately and chooses the one with the greatest expected utility. Can we make a logical inductor do this? We could construct all possible logical inductors (different traders, etc.) or sample that space in the hope of finding all the fixed points, but a thorough fix (haha) is not within my sight. Fixed points, as always and unto the ages of ages, are a pain in the butt.
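
A sketch of the selection step, in a crude hypothetical model where the agent's counterfactual estimate for any bill it does not expect to take is 0 (an assumption, not the post's). Instead of trusting whichever self-belief it starts with, the agent enumerates every self-consistent belief and keeps the one whose world pays most. This only illustrates "pick the best fixed point"; it is not an implementation of FDT, and enumerating the fixed points is exactly the part that is hard for a logical inductor.

```python
from typing import Dict, List

PAYOFF: Dict[int, float] = {5: 5.0, 10: 10.0}  # taking the $X bill is worth X


def choose_given_belief(believed_action: int) -> int:
    """Crude self-transparent agent: unseen actions are estimated at 0 (assumption)."""
    def estimate(action: int) -> float:
        return PAYOFF[action] if action == believed_action else 0.0
    return max(PAYOFF, key=estimate)


def fixed_points() -> List[int]:
    """All self-beliefs that the agent's own choice confirms."""
    return [b for b in PAYOFF if choose_given_belief(b) == b]


def select_best_fixed_point() -> int:
    """Evaluate each self-consistent world separately and keep the best one."""
    return max(fixed_points(), key=lambda b: PAYOFF[b])


print("fixed points:", fixed_points())         # fixed points: [5, 10]
print("selected:", select_best_fixed_point())  # selected: 10
```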

An FDT agent is autonomous with respect to herself. She does not know the outcome of her decision process until she has completed it. There is logical uncertainty about her decision which cannot be removed without rendering her vulnerable to psychopathic pest control workers and similarly foul characters. This very uncertainty about herself has a protective effect. How absurd it is to know what one will choose before having chosen!

It has been proposed to remedy this issue by introducing the "ε-chicken rule": if the agent believes that it will make some choice (take the $5) with credence greater than 1 − ε, then it must not make that choice. In this post, Scott Garrabrant showed that this approach solves the five-and-ten problem but introduces other problems. Maybe it is the wrong kind of uncertainty? It is dumb, blind randomness, absolutely different from an FDT agent's thoughtful and reflective logical uncertainty about herself.
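
A deterministic toy version of the chicken rule, assuming two bills, a hypothetical threshold `EPSILON = 0.05`, and (loudly an assumption) that an agent which is still uncertain about itself knows the true payoffs. No certain self-belief survives: being sure of either bill makes the agent take the other one.

```python
from typing import Dict

PAYOFF: Dict[int, float] = {5: 5.0, 10: 10.0}  # taking the $X bill is worth X
EPSILON = 0.05  # hypothetical threshold for the chicken rule


def other(action: int) -> int:
    """The bill the agent did not pick."""
    return 10 if action == 5 else 5


def choose(credence: Dict[int, float]) -> int:
    """Choose a bill given the agent's credence about its own choice.

    credence[a] is the agent's credence that it will take bill a.
    """
    # Epsilon-chicken rule: refuse any choice the agent is more than
    # (1 - EPSILON) sure it will make.
    for action, p in credence.items():
        if p > 1 - EPSILON:
            return other(action)
    # Otherwise maximize estimated utility. Assumption: an agent that is
    # still uncertain about itself has learned the true payoffs.
    return max(PAYOFF, key=lambda a: PAYOFF[a])


for bill in PAYOFF:
    certain = {bill: 1.0, other(bill): 0.0}
    print(f"certain it takes ${bill} -> takes ${choose(certain)}")
print("uncertain (50/50) -> takes", choose({5: 0.5, 10: 0.5}))
```

The second line of output hints at a cost: the rule also knocks the agent off the $10 whenever it becomes sure of that choice. This is only a caricature and does not reproduce the specific problems in Garrabrant's post.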

The situation with reflective oracles seems to be similar. They are reflectively self-consistent, but the fixed points they represent are not guaranteed to be good. It would be best to generate the reflective oracle that gives the Pareto optimum, but this takes extra work. Making a choice (picking an oracle, deciding between possible worlds) reduces logical uncertainty and is therefore a kind of "update" of self-knowledge.
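
A sketch of that "extra work", under stand-in assumptions: the self-consistent outcomes are modeled as the pure-strategy Nash equilibria of a toy two-player stag hunt (illustrative payoff numbers), and nothing here constructs a reflective oracle. Both equilibria are stable, but only one is Pareto-optimal, so a selection step has to be layered on top of finding fixed points.

```python
from itertools import product
from typing import Dict, List, Tuple

Profile = Tuple[str, str]
ACTIONS = ["stag", "hare"]
# Toy stag-hunt payoffs (row player, column player); numbers are illustrative.
PAYOFFS: Dict[Profile, Tuple[float, float]] = {
    ("stag", "stag"): (4.0, 4.0),
    ("stag", "hare"): (0.0, 3.0),
    ("hare", "stag"): (3.0, 0.0),
    ("hare", "hare"): (3.0, 3.0),
}


def is_nash(profile: Profile) -> bool:
    """No player gains by unilaterally switching actions."""
    a, b = profile
    u1, u2 = PAYOFFS[profile]
    if any(PAYOFFS[(alt, b)][0] > u1 for alt in ACTIONS):
        return False
    if any(PAYOFFS[(a, alt)][1] > u2 for alt in ACTIONS):
        return False
    return True


def pareto_dominates(x: Profile, y: Profile) -> bool:
    """x is at least as good for both players and strictly better for one."""
    ux, uy = PAYOFFS[x], PAYOFFS[y]
    return all(p >= q for p, q in zip(ux, uy)) and ux != uy


def equilibria() -> List[Profile]:
    """Stand-in for the self-consistent fixed points."""
    return [p for p in product(ACTIONS, repeat=2) if is_nash(p)]


def pareto_optimal_equilibria() -> List[Profile]:
    """Equilibria not Pareto-dominated by another equilibrium."""
    eqs = equilibria()
    return [e for e in eqs if not any(pareto_dominates(o, e) for o in eqs)]


print("self-consistent outcomes:", equilibria())
print("Pareto-optimal among them:", pareto_optimal_equilibria())
```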

All the above is mere intuition with some funny language, but it points to the positive and protective role of logical uncertainty in decision-making. So take the above as a nudge toward a line of research which I think could yield a rich harvest.

Comments

1) What's the reference to psychopathic pest control workers?

2) I suspect that there's at least something imprecise in the claim that self-knowledge is harmful. Why can't we figure out a way to throw away this useless information like we can in other situations? I know I'm expressing skepticism without a solid argument, but I haven't quite figured out how to express what feels wrong about making that claim yet.

As I understand it:

UDT > FDT > CDT > EDT

Breakdown:

UDT > FDT

FDT doesn't pay in 'The Counterfactual Mugging,' because it updates, while UDT doesn't (hence 'updateless').

FDT > CDT

In Newcomb's Problem FDT one-boxes, while CDT two-boxes.

CDT > EDT

In Newcomb's Soda, or 'Smoking Lesion', CDT does not accept that (in the absence of a time machine) your actions affect the past.

Transitivity:

UDT and FDT 'agree with' CDT on Newcomb's Soda.

UDT 'agrees with' FDT on Newcomb's problem.

I suggest

1. Define acronyms, e.g. FDT.

https://www.urbandictionary.com/define.php?term=FDT

2. Structure this post better. E.g., the first paragraph should be an outline of what you have to say.

Thanks. I made some quick changes, but it's probably not structured enough for your taste. Ah well.