A lot of things you state here with apparent certainty, e.g. "We only care about this universe," are things that I think are potential problems but am unsure about. For example, in "UDT shows that decision theory is more puzzling than ever" I wrote:
Indexical values are not reflectively consistent. UDT "solves" this problem by implicitly assuming (via the type signature of its utility function) that the agent doesn't have indexical values. But humans seemingly do have indexical values, so what to do about that?
which I think is talking about the same or a related issue. I think a lot of these questions (e.g. whether or not we really care, or should care, only about this universe) are hard philosophical problems that can't be solved easily, so directly trying to solve them, or confidently assuming some solution like "We only care about this universe," as part of AI safety/alignment seems like a bad idea to me.
This is exactly what I wanted to discuss with you - it seems we have different intuitions about the significance of ensembles. I realize that what I am saying here is not a priori obvious - it is a longer discussion. This is why I suggest it could be a dialogue, or maybe we can just chat about it informally.
Sorry if this is off-topic or you’ve already seen it, but I found Paul Christiano’s "Decision theory and dynamic inconsistency" to be a clarifying read.
UDT is drawing attention to issues with how algorithms influence each other, how we should reason with uncertain knowledge about such influence, and how decisions under that uncertainty should be made by those algorithms. After an agent updates, these problems don't go away, so being "updateless" is less central to the point of UDT than all the rest, even if a lot of discussion of UDT and proposed solutions to decision problems involve an unusual amount of not-updating.
For example, consider an outcome W = C(A(O())), where W is an algorithm/term that's a composition of continuation C, agent A, and observation O (let's say O is also given as an algorithm, but A directly observes only the value it computes). When A wants to reason about how to influence W, it needs to know something about C, even though it doesn't even observe C's value. It's not obvious what about C should interest A; its value C(-) as a function isn't necessarily relevant for A's decisions if C has other instances of A as its parts (for example as subterms within C itself, rather than only of the whole composition C(A(O()))). Now C and O seem to play a similar role in connecting A to W; the only difference is that A gets to observe the value of O (in some not obviously relevant sense, once A "becomes" the composition A(O())). So similarly, A might need to know something about O that is not just its value, even "prior" to observing its value (when A is just A itself rather than A(O())), especially if O has other instances of A as its parts. The Absent-Minded Driver problem illustrates this: one instance of the agent has the other instance in its continuation, while that other instance has the first instance of the agent in its observation-as-algorithm.
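As a toy illustration (my own rough sketch, not something from the comment above, using the standard Absent-Minded Driver payoffs of 0 for exiting at the first intersection, 4 for exiting at the second, and 1 for never exiting), here is the outcome term written in the shape W = C(A(O())), where the continuation C contains a second call to the same agent code:

```python
# Absent-Minded Driver as W = C(A(O())) -- a sketch, not a general framework.
# The agent's "action" is a probability p of continuing at an intersection;
# both instances of A run the same code on the same (uninformative)
# observation, so they necessarily return the same p.

def O():
    # Observation-as-algorithm: the driver cannot tell the intersections
    # apart, so the observed value carries no information.
    return None

def make_A(p):
    # The agent, parameterized by its policy p (probability of continuing).
    def A(obs):
        return p
    return A

def C(p_first, A):
    # Continuation of the first decision. Crucially it contains *another*
    # instance of A (the second intersection), which is why knowing C only
    # as a function of the first action is not enough for A's reasoning.
    p_second = A(O())
    exit_first = (1 - p_first) * 0
    exit_second = p_first * (1 - p_second) * 4
    never_exit = p_first * p_second * 1
    return exit_first + exit_second + never_exit

def W(p):
    A = make_A(p)
    return C(A(O()), A)

# Planning-optimal policy: p = 2/3, expected payoff 4/3.
best_p = max((i / 1000 for i in range(1001)), key=W)
print(best_p, W(best_p))
```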
It makes sense that A has already updated on some knowledge about C and O, even if that knowledge doesn't include an already-computed value of O. For example, A might already know some of the code in C and O, or facts about their code, which is often assumed in decision problems. So agents are already not perfectly updateless, in the sense that they already know the decision problem, which can involve knowing something about observation-as-algorithm.
Updating on observations seems to ask how A(O()) should behave, as opposed to how A(-) should behave, in order to influence the value of W. But A(O()) still has the same problems with C as A(-) did (for example, C could have other instances of A(O()) as its parts); it only got rid of A's problems with O, and the problems with C seem largely analogous to the problems with O (considered as an algorithm), so it's not even a crucial change.
I'm reasoning about updatelessness because I've recently been investigating an updateful theory of embedded agency, not because I think it's the only embeddedness problem.
I've previously argued that UDT may take the Bayesian coherence arguments too far.
In that post, I mostly focused on computational uncertainty. I don't think that we have a satisfactory theory of computational uncertainty, and that is a problem for the canonical conception of UDT. However, I think my objection still stands in the absence of computational uncertainty (say, in the framework of my unfinished theory of AEDT w.r.t. rOSI). I want to sharpen this objection and state it more concisely, now that I feel a bit less confused about it.
Briefly: I think that we want to build agents that update at least until they're smarter and know more than us.
As a one-line summary: updatelessness is basically acausal bargaining. A UDT agent is willing to pay a tax in this universe for some hypothetical benefit in a universe that does not in fact exist (or at least, is not the one we live in).
This may seem unintuitive. However, there are many strong justifications for updatelessness, which can usually be described as "Newcomb-like problems." For example, imagine that a perfect predictor (customarily called Omega) flips a coin, promising to pay out 10 dollars on tails, but 1000 dollars on heads if and only if you would not have taken the 10 dollars on tails. Agents that win at this problem do not take the 10 dollars on tails - it's much higher expected value to collect the 1000 dollars on heads. That means that an agent facing this problem would be willing to self-modify to become updateless, if possible.
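To make the arithmetic explicit (a quick check of my own, assuming a fair coin and evaluating from before the flip, which is the perspective the updateless policy optimizes):

$$\mathbb{E}[\text{refuse the \$10 on tails}] = \tfrac{1}{2}\cdot 1000 + \tfrac{1}{2}\cdot 0 = 500, \qquad \mathbb{E}[\text{take the \$10 on tails}] = \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 10 = 5.$$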
Without going through more examples, I will take as given that sufficiently powerful agents, if given the option, self-modify to act something like UDT - but only for future decisions. That is, ideal agents want to stop updating. But this is important: I don't see any strong reason that ideal agents would ignore the information they already know, or unroll the updates they've already made.
If I first learn that the coin has come up tails, and then learn about Omega's bargain, my best option at that point seems to be to take the 10 dollars. After all, I'm not really capable of absolutely locking myself into any policy. But perhaps I should be - perhaps I should decide to implement UDT? I think this is a rather subtle question, which I will return to. My intuition tends to favor taking the money in some circumstances and not in others. But what if Omega demands 10 dollars from me on tails? What if Omega keeps coming back and demanding another 10, on the same coin flip?
The central principle of UDT is to honor all of the pre-commitments that it would have wanted to make. This means that a UDT agent does not need to make pre-commitments, or to self-modify. It tiles. That seems like a desirable property.
The pro-UDT tiling argument usually goes that, if we build an agent using some other decision theory, and it wants to modify itself to act like UDT (going forward), then it surely seems like that decision theory is bad and we should have just built it to use UDT.
Or, as a question: "If agents want to stop updating as soon as possible, why build them to update at all?"
Okay, that's the end of my hyper-compressed summary of the discourse so far (which does not necessarily imply that the rest of this post is actually original).
We want a theory of agency to tell us how to build (or become!) agents that perform well, in the sense of capabilities and alignment, in the real world. This "agent designer" stance has been taken by Laurent Orseau (as "space-time embedded intelligence") and others. It's important to emphasize the part about the real world. The one we are actually living in. This "detail" is often brushed over. I will call this stance the realist agent designer framework - it is what I have previously described as an agent theory.
Now, I'd like to argue that the pro-UDT tiling argument does not make sense from a realist agent designer's perspective.
The reason is that by engaging in acausal trade starting from (implicitly before) the moment of its implementation, a UDT agent is paying tax to universes that we as the agent designers know are not our universe. This is not desirable - it means that UDT is malign in about the same sense as the Solomonoff prior.
In the standard picture, a UDT agent actually uses something like the Solomonoff prior (=the universal distribution M) or otherwise believes in some kind of mathematical multiverse. That means that a UDT agent potentially pays tax to all of these universes - in practice, there may or may not be an opportunity for such trades, but when they exist, they come at the expense of our universe.
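For concreteness, one standard way to write the universal distribution (roughly following Solomonoff; the choice of universal monotone machine U and other details are suppressed here) is

$$M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)},$$

where the sum runs over the minimal programs p whose output begins with x. A prior like this puts weight on every computable universe at once, which is exactly the ensemble the tax can end up being paid to.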
I think that agent foundations researchers (and generally, many rationalists) have made a big mistake here: they view this as a good thing. They want to find a sort of platonic ideal of agency which is as general as possible, which wins on average across all mathematical universes.
This is not the right goal, for either capabilities or alignment.
We want to study agents that win in this universe. That means that they should do some learning before they form irreversible commitments - before they stop updating. Pragmatically, I think that agent designs without this property probably fail to take off at all. As a sort of trivialization of this principle, an agent with write access to its own code, which is not somehow legibly labeled as a thing it should not touch until it knows very well what it is doing, will usually just give itself brain damage. But I think the principle goes further: agents which are trying to succeed across all universes are not the ones that come to power fastest in our universe.
I think that unfortunately my own field, algorithmic information theory and specifically the study of universal agents like AIXI, has contributed to this mistake. It encourages thinking about ensembles of environments, like the lower semicomputable chronological semimeasures.[1] But the inventor of AIXI, Marcus Hutter, has not actually made the mistake! Much of his work is concerned with convergence guarantees - convergence to optimal performance in the true environment. That is the antidote. One must focus on the classes of agents which come to perform well in the true environment, specifically, ours. Such agents sometimes fail; one cannot succeed in every environment. We don't care about the ones that suffer (controlled) failure. What's important is that (perhaps after several false starts, in situations that are set up appropriately) they eventually come to dominate.
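To gesture at the flavor of such convergence guarantees (stated here in the simpler sequence-prediction setting rather than the full chronological setting, with details omitted): if the true environment μ has nonzero prior weight in the Bayesian mixture ξ, then on the history that actually unfolds,

$$\xi(x_t \mid x_{<t}) \;\to\; \mu(x_t \mid x_{<t}) \quad \text{with } \mu\text{-probability } 1 \text{ as } t \to \infty.$$

The statement is about performance in the environment we actually inhabit, not average performance over the whole class.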
And I think that agents which are too updateless too early do not come to dominate.
But: what if they did? What if a UDT agent were implemented with a good (or lucky) enough prior, and grew strong enough that it could afford to pay the acausal tax and still outpace any rival agents?
This is an alignment failure.
We do not want to pay that acausal tax - not unless the agent's prior is sufficiently close to our own beliefs. We only care about this universe. Insofar as such an agent differs from an updateful agent (running, say, EDT), it differs to our detriment - its prior never washes out, its beliefs never truly merge with ours, and we pay the price. In a sense, such an agent is not corrigible.
But what if we accepted UDT? Would we then be aligned with a UDT agent we built?
I think probably not. This would only work if our priors were nearly identical, and I don't think there is a fully specified objective prior on all possible universes.
Also, I don't think this is the right question to ask. We who have not formed binding pre-commitments under a veil of ignorance should be glad of it, and should not pay taxes to imaginary worlds.
Now, if we accept that we want our agents to continue updating (at least until they know what we know) - how do we achieve this?
I suppose there are two routes.
The first is that we do not give them the option to self-modify. I actually think this can be reasonable. We only need to win this battle until the agents reach roughly our level of intelligence, and we probably don't want even an aligned agent messing with its source code until then. This solution probably seems ugly to some, because it involves building an agent that does not tile. However, (perhaps benefiting from the perspective of AEDT w.r.t. rOSI) I don't see this as a terrible problem. I think that not being able to fully trust that you control the actions of your future selves is actually a core embeddedness problem - which appears also in e.g. action corruption. Why assume it away by only studying agents that tile? Also, as I've argued above, the agents that rise to power probably aren't the ones that lock in their policies too early. So, I think it is reasonable to study the pre-tiling phase of agent development.
The second route is to somehow design the agent so that it does not initially want to self-modify. This branches into various approaches. For instance, we could design a UDT agent with a very carefully chosen prior that is cautious of self-modification. And/or perhaps we can build a corrigible agent, which only trusts its designers to modify its code. This may be easier in practice than in theory - because finding self-modifications that seem good may be computationally hard - and in this respect, it's somewhat connected to the first route, in that an agent is less likely to desire self-modification if promising self-modifications seem more difficult to find.
Now the question I've been putting off is: should a human try to implement UDT? I'm still not completely sure about this, mostly because I think there are a multitude of considerations at play in practice.
As I've made clear, I don't think it's wise to implement an aggressive form of UDT that pays rent to some kind of hypothetical mathematical multiverse. We dodged that bullet by being incapable of self-modification before ~adulthood, and we should be glad of it - in the real world, there are essentially no Newcomb-like problems, and I don't think we humans have paid any real cost for failing to implement UDT up to this point, except perhaps those of us who are bad at lying about what we would do under counterfactual circumstances.
Really, it makes little sense for any organism developing within this physical universe to, at any early phase of its lifecycle, conceptualize a mathematical multiverse. By the time we can even consider such ensembles, we already know a lot of basic information about how our world works - the "ensembles" we seriously consider are usually much less exotic (in fact, this is probably why the mathematical multiverse seems exotic). We learn about UDT late in the game.
So, if you're thinking of implementing UDT, I recommend implementing it with respect to some reasonably recent set of beliefs about the world - if you haven't decided already, perhaps everything you know at this moment.
However, I think there are a lot of thorny issues here for us mere mortals. Most salient is that we aren't really capable of forming a definitive commitment to UDT; we have to take seriously the possibility that we might be tempted to defect in the future! Also, we can't ignore the complicating issue of computational uncertainty, which makes implementing UDT both more philosophically challenging and more expensive for us. I don't believe that our world is particularly Newcomb-like, so EDT seems like an excellent approximation in practice, even if we were willing and able to implement UDT.
But ideally, we should seek to implement something resembling such a conservative form of UDT.