Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

For the last few years, a large part of my research motivation has been directed at trying to save the concept of time—save it, for example, from all the weird causal loops created by decision theory problems. This post will hopefully explain why I care so much about time, and what I think needs to be fixed.

 

Why Time?

My best attempt at a short description of time is that time is causality. For example, in a Pearlian Bayes net, you draw edges from earlier nodes to later nodes. To the extent that we want to think about causality, then, we will need to understand time.
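One way to make the "edges go from earlier to later" picture concrete is to note that any acyclic causal graph admits an ordering in which every cause precedes its effects. The sketch below is only an illustration: the node names and the `causal_order` helper are hypothetical, and the ordering comes from Python's standard `graphlib` module rather than from anything specific to Pearl's formalism.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy network: each key maps to the set of its direct causes.
parents = {
    "rain": set(),
    "sprinkler": {"rain"},
    "wet_grass": {"rain", "sprinkler"},
    "slippery": {"wet_grass"},
}

def causal_order(parents):
    """Return one ordering of the nodes in which every cause precedes its effects."""
    return list(TopologicalSorter(parents).static_order())

order = causal_order(parents)
position = {node: i for i, node in enumerate(order)}

# Every edge points "forward in time" with respect to this ordering.
assert all(position[p] < position[c] for c, ps in parents.items() for p in ps)
print(order)
```

If the graph contains a loop, no such ordering exists and `static_order()` raises `graphlib.CycleError`, which is one small way to see why loops are a problem for reading time off a causal graph.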

Importantly, time is the substrate in which learning and commitments take place. When agents learn, they learn over time. The passage of time is like a ritual in which opportunities are destroyed and knowledge is created. And I think that many models of learning are subtly confused, because they are based on confused notions of time.

Time is also crucial for thinking about agency. My best short-phrase definition of agency is that agency is time travel. An agent is a mechanism through which the future is able to affect the past. An agent models the future consequences of its actions, and chooses actions on the basis of those consequences. In that sense, the consequence causes the action, in spite of the fact that the action comes earlier in the standard physical sense.

 

Problem: Time is Loopy

The main thing going wrong with time is that it is “loopy.”

The primary confusing thing about Newcomb's problem is that we want to think of our decision as coming “before” the filling of the boxes, in spite of the fact that it physically comes after. This is hinting that maybe we want to understand some other "logical" time in addition to the time of physics. 

However, when we attempt to do this, we run into two problems: Firstly, we don't understand where this logical time might come from, or how to learn it, and secondly, we run into some apparent temporal loops.

I am going to set aside the first problem and focus on the second.

The easiest way to see why we run into temporal loops is to notice that it seems like physical time is at least a little bit entangled with logical time.

Imagine the point of view of someone running a physics simulation of Newcomb’s problem, and tracking all of the details of all of the atoms. From that point of view, it seems like there is a useful sense in which the filling of the boxes comes before an agent's decision to one-box or two-box. At the same time, however, those atoms compose an agent that shouldn’t make decisions as though it were helpless to change anything.

Maybe the solution here is to think of there being many different types of “before” and “after,” “cause” and “effect,” etc. For example, we could say that X is before Y from an agent-first perspective, but Y is before X from a physics-first perspective.

I think this is right, and we want to think of there as being many different systems of time (hopefully predictably interconnected). But I don't think this resolves the whole problem.

Consider a pair of FairBot agents that successfully execute a Löbian handshake to cooperate in an open-source prisoner’s dilemma. I want to say that each agent's cooperation causes the other agent's cooperation in some sense. I could say that relative to each agent the causal/temporal ordering goes a different way, but I think the loop is an important part of the structure in this case. (I also am not even sure which direction of time I would want to associate with which agent.)
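As a rough sketch of how such a handshake can be evaluated without running either agent "first": treat each agent as a rule of the form "cooperate if and only if it is provable that the opponent cooperates", and evaluate both rules together along a finite chain of Kripke worlds, where "provably X" holds at a world exactly when X held at every earlier world. This chain evaluation is an assumption borrowed from the standard semantics of provability logic, not something the post specifies, and the names `fairbot`, `defectbot`, and `play` are hypothetical.

```python
def fairbot(opponent_provably_cooperates):
    """Cooperate iff it is provable that the opponent cooperates."""
    return opponent_provably_cooperates

def defectbot(opponent_provably_cooperates):
    """Defect unconditionally."""
    return False

def play(agent_a, agent_b, depth=10):
    """Evaluate both agents along the Kripke chain 0, 1, ..., depth - 1.

    At each world, "provably X" is true iff X was true at all earlier worlds,
    so at world 0 every provability claim is vacuously true. For simple agents
    like these the values stabilize after a few worlds; the stable values are
    taken as the agents' actions (True means cooperate).
    """
    a_history, b_history = [], []
    for _ in range(depth):
        a_provable = all(a_history)  # "A cooperates" is provable at this world
        b_provable = all(b_history)  # "B cooperates" is provable at this world
        a_history.append(agent_a(b_provable))
        b_history.append(agent_b(a_provable))
    return a_history[-1], b_history[-1]

print(play(fairbot, fairbot))    # both cooperate
print(play(fairbot, defectbot))  # both defect
```

Note that neither agent's cooperation is computed before the other's: both values are read off a shared fixed point, which is one way of picturing the loop described above.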

We also are tempted to put loops in our time/causality for other reasons. For example, when modeling a feedback loop in a system that persists over time, we might draw structures that look a lot like a Bayes net, but are not acyclic (e.g., a POMDP). We could think of this as a projection of another system that has an extra dimension of time, but it is a useful projection nonetheless.
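The "projection" idea can be illustrated with a small sketch: a cyclic influence diagram becomes acyclic once each node is copied once per time step and the influences that cross a step are routed to the next copy. The graph, the choice of which edge is delayed, and the helper names `unroll` and `is_acyclic` are all hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter, CycleError  # standard library, Python 3.9+

# Hypothetical feedback loop: each key maps to the set of nodes that influence it.
loop = {"state": {"action"}, "observation": {"state"}, "action": {"observation"}}

# Assumption: the action at step t influences the state at step t + 1;
# every other influence acts within a single step.
delayed = {("action", "state")}  # (cause, effect) pairs that cross a time step

def unroll(graph, delayed, steps):
    """Copy every node once per time step, sending delayed edges to the next step."""
    unrolled = {}
    for t in range(steps):
        for node, causes in graph.items():
            preds = set()
            for cause in causes:
                if (cause, node) in delayed:
                    if t > 0:
                        preds.add((cause, t - 1))
                else:
                    preds.add((cause, t))
            unrolled[(node, t)] = preds
    return unrolled

def is_acyclic(graph):
    """True iff the graph (node -> set of predecessors) has no directed cycle."""
    try:
        TopologicalSorter(graph).prepare()
        return True
    except CycleError:
        return False

print(is_acyclic(loop))                      # the projected diagram has a loop
print(is_acyclic(unroll(loop, delayed, 3)))  # the time-indexed version does not
```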

 

Solution: Abstraction

My main hope for recovering a coherent notion of time and unraveling these temporal loops is via abstraction. 

In the example where the agent chooses actions based on their consequences, I think that there is an abstract model of the consequences that comes causally before the choice of action, which comes before the actual physical consequences.

In Newcomb's problem, I want to say that there is an abstract model of the action that comes causally before the filling of the boxes. 
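A toy sketch of that ordering, under strong assumptions: the predictor is taken to be perfectly accurate and is modeled as simply evaluating the agent's policy before the boxes are filled. The payoffs ($1,000,000 and $1,000) are the conventional ones rather than figures from this post, and the function names are hypothetical.

```python
def one_boxer():
    return "one-box"

def two_boxer():
    return "two-box"

def newcomb(policy):
    """Play one round against a predictor assumed to be perfectly accurate."""
    # Step 1 (abstract): the predictor evaluates its model of the agent's policy.
    predicted = policy()
    # Step 2 (physical): the boxes are filled on the basis of that prediction.
    opaque_box = 1_000_000 if predicted == "one-box" else 0
    transparent_box = 1_000
    # Step 3 (physical): the agent acts, after the boxes are already filled.
    action = policy()
    return opaque_box if action == "one-box" else opaque_box + transparent_box

print(newcomb(one_boxer))
print(newcomb(two_boxer))
```

In this sketch the call in step 1 plays the role of the abstract model of the action: it sits before the filling of the boxes, while the physical action in step 3 sits after it.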

In the open-source prisoner's dilemma, I want to say that there is an abstract proof of cooperation that comes causally before the actual program traces of the agents.

All of this is pointing in the same direction: We need to have coarse abstract versions of structures come at a different time than more refined versions of the same structure. Maybe when we correctly allow for different levels of description having different links in the causal chain, we can unravel all of the time loops.

 

But How?

Unfortunately, our best understanding of time is Pearlian causality, and Pearlian causality does not do great with abstraction.

Pearl has Bayes nets with a bunch of variables, but when some of those variables are coarse abstract versions of other variables, then we have to allow for determinism, since some of our variables will be deterministic functions of each other; and the best parts of Pearl do not do well with determinism. 
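One possible reading of the trouble with determinism, offered as an illustration rather than as the author's specific concern: when a coarse variable is a deterministic function of a refined one, the distribution contains conditional independences that the graph does not predict, so independence-based reasoning about the graph becomes unreliable. The sketch below uses a hypothetical chain x -> c -> y in which c is a coarsening of x and y is a noisy copy of c; all numbers and helper names are assumptions.

```python
from itertools import product
from math import log2

def joint():
    """Joint distribution over (x, c, y) for the toy chain x -> c -> y."""
    dist = {}
    for x, y in product(range(4), range(2)):
        c = x // 2                    # coarse variable: deterministic function of x
        p_y = 0.9 if y == c else 0.1  # y is a noisy copy of c
        dist[(x, c, y)] = 0.25 * p_y  # x is uniform over four values
    return dist

def marginal(dist, idx):
    """Marginal distribution over the coordinates listed in idx."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_mutual_info(dist, a, b, given=()):
    """I(A; B | G) in bits; a, b, given are tuples of coordinate indices."""
    p_abg = marginal(dist, a + b + given)
    p_ag = marginal(dist, a + given)
    p_bg = marginal(dist, b + given)
    p_g = marginal(dist, given)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abg.items():
        av, bv, gv = key[:na], key[na:na + nb], key[na + nb:]
        total += p * log2(p * p_g[gv] / (p_ag[av + gv] * p_bg[bv + gv]))
    return total

X, C, Y = (0,), (1,), (2,)
dist = joint()
print(cond_mutual_info(dist, C, Y))     # positive: c really does influence y
print(cond_mutual_info(dist, X, Y, C))  # zero, as the chain x -> c -> y predicts
print(cond_mutual_info(dist, C, Y, X))  # also zero, which the graph does not predict
```

The last line is the awkward one: conditioning on x makes y independent of its own direct parent c, purely because c carries no information beyond x.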

But the problem runs deeper than that. If we draw an arrow in the direction of the deterministic function, we will be drawing an arrow of time from the more refined version of the structure to the coarser version of that structure, which is in the opposite direction of all of our examples.

Maybe we could avoid drawing this arrow from the more refined node to the coarser node, and instead have a path from the coarser node to the refined node. But then we could just make another copy of the coarser node that is deterministically downstream of the more refined node, adding no new degrees of freedom. What is then stopping us from swapping the two copies of the coarser node?
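The swapping worry can be seen in a small simulation, sketched here under hypothetical choices of `refine` and `coarsen`: sample a coarse node, refine it, then recompute the coarse node from the refined one. The upstream and downstream copies always agree, so nothing in the sampled data distinguishes one from the other.

```python
import random

def refine(coarse, rng):
    """Sample a fine-grained state consistent with the given coarse state."""
    return 2 * coarse + rng.randint(0, 1)

def coarsen(fine):
    """Deterministic abstraction: forget the low-order detail."""
    return fine // 2

def sample(rng):
    upstream = rng.randint(0, 1)  # coarse node placed before the refined node
    fine = refine(upstream, rng)  # refined node
    downstream = coarsen(fine)    # copy of the coarse node placed after it
    return upstream, fine, downstream

rng = random.Random(0)
samples = [sample(rng) for _ in range(1000)]

# Exchange the roles of the two coarse copies in every sample.
swapped = [(down, fine, up) for up, fine, down in samples]

print(samples == swapped)  # the data cannot tell the two copies apart
```

Whatever distinguishes the two copies therefore has to come from somewhere other than the joint distribution over these three variables.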

Overall, it seems to me that Pearl is not ready for some of the nodes to be abstract versions of other nodes, which I think needs to be fixed in order to save time.

16 comments

agency is time travel

Since time is the direction of increased entropy, this feels like it has some deep connection to the notion of agents as things that reduce entropy (only locally, obviously) to achieve their preferences: https://www.lesswrong.com/posts/Q4hLMDrFd8fbteeZ8/measuring-optimization-power (I'm not sure this is 100% on-point to what I mean, but it's the closest thing I could find.)

Since agents can only decrease entropy locally, not globally, I wonder if we could similarly say that they can "reverse the arrow of time" only in some local sense. (Omega can predict me, but I can't predict Omega. And even Omega can't predict everything, because something something second law of thermodynamics.)

Since time is the direction of increased entropy, this feels like it has some deep connection to the notion of agents as things that reduce entropy (only locally, obviously) to achieve their preferences

Reminded me of Utility Maximization = Description Length Minimization.

An agent is a mechanism through which the future is able to affect the past

I love this!

My impression is that all this time business in decision making is more of an artifact of computing solutions to constraint problems (unlike in physics, where it's actually an important concept). There is a process of computation that works with things such as propositions about the world, which are sometimes events in the physical sense, and the process goes through these events in the world in some order, often against physical time. But it's more like construction of Kleene fixpoints or some more elaborate thing like tracing statements in a control flow graph or Abstracting Abstract Machines, a particular way of solving a constraint problem that describes the situation, than anything characterising the phenomenon of finding solutions in general. Or perhaps just going up in some domain in something like Scott semantics of a computation, for whatever reason, getting more detailed information about its behavior. "The order in domains" seems like the most relevant meaning for time in decision making, which isn't a whole lot like time.

The primary confusing thing about Newcomb's problem is that we want to think of our decision as coming “before” the filling of the boxes, in spite of the fact that it physically comes after.

 

I've been puzzling in my amateurish way over Newcomb's problem a bit. The way I think the causal flow goes is:

T0: Omega accurately simulates the agent at T1-T2, and fills the boxes accordingly.

T1: The agent deliberates about whether to one-box or two-box.

T2: The agent irrevocably commits to one-boxing or two-boxing.

The agent thinks there's a paradox, because it feels like they're making a choice at T1. In fact, they are not. To Omega, their behavior is as predictable as a one-line computer program. The "agent" does not choose to one-box or two-box. They are fated to one-box or two-box.

Understood this way, I don't think the problem violates causality.

The problem that's normally focused on is that we want to think our decision is independent, and not as predictable as stated in the problem.  Once you get over that, there's no more riddle, it's just math.

Your causality diagram should start at T0: the configuration of the universe is such that there is no freedom at T2, and Omega knows enough about it to predict what will happen.  And you're correct, the problem doesn't violate causality, it violates the free-will assumption behind the common versions of CDT.

Note: it's just a thought experiment, and we need to be careful about updating on fiction.  It doesn't say anything about whether a human decision can be known by a real Omega, only that IF it could, that implies the decision isn't made when we think it is.

Your causal description is incomplete; the loopy part requires expanding T1:

T0: Omega accurately simulates the agent at T1-T2, determines that the agent will one-box, and puts money in both of the boxes. Omega's brain/processor contains a (near) copy of the part of the causal diagram at T1 and T2.

T1: The agent deliberates about whether to one-box or two-box. She draws a causal diagram on a piece of paper. It does not contain T1, because it isn't really useful for her to model her own deliberation as she deliberates. But it does contain T2, and a shallow copy of T0, including the copy of T2 inside T0.

T2: The agent irrevocably commits to one-boxing.

The loopy part is at T1. Forward arrows mean "physically causes", and backwards arrows mean "logically causes, via one part of the causal diagram being copied into another part".

An agent is a mechanism through which the future is able to affect the past. An agent models the future consequences of its actions, and chooses actions on the basis of those consequences. In that sense, the consequence causes the action, in spite of the fact that the action comes earlier in the standard physical sense.

TL;DR:

This model sounds wrong (if still useful) in a way you are probably already aware of.


The agent chooses to take an action based on its calculations and its model of the consequences of that action. The agent can be wrong about the future, so what causes the action is not the consequence (in the real future) but an error, or the past state of the environment (specifically, the agent's observations and records of it).

An attempt at a concrete example: imagine bots trading on the stock market. Do you expect those bots to always predict the future accurately? (Even without cosmic radiation or mundane technological failures, I expect someone to write a program that (eventually) compiles, but has a 'wrong line' or uses the wrong formula, etc. Not 'inverting the utility function' bad, unless there are a lot of traders, but maybe something analogous to 'P(A or B) = P(A) + P(B) + P(A&B)'*.)

*The last term should be subtracted, not added: P(A or B) = P(A) + P(B) - P(A&B).

Great post, a lot of food for thought, thanks!

IIRC, in "Good and Real" Gary Drescher suggests first considering a modified version of Newcomb's Problem in which both boxes are transparent. He then goes on to propose a solution in which the agent precommits to one-boxing before being presented with the problem in the first place. This way, as I understand it, the causal diagram would first feature a node where the agent chooses to precommit, and this node deterministically causes both their later action of one-boxing and Omega's action of putting $1,000,000 in the larger box. This initial node for choosing to precommit does look like an agent's abstract model of their own and Omega's actions.

Alternatively, in this paper, "Unboxing the Concepts in Newcomb’s Paradox: Causation, Prediction, Decision in Causal Knowledge Patterns", Roland Poellinger suggests augmenting ordinary causal networks with a new type of undirected edge, which he calls an "epistemic contour". In his setup, this edge connects the agent's action of selecting either one or two boxes with Omega's prediction. This edge is not cut when performing the do(A) operation on the causal graph, but the information is passed backwards in time, thus formalizing the notion of "prediction".

I think Newcomb's problem can be resolved by throwing out the idea that a decision is being made at any point. We're mixing levels, or engaging in a kind of category error: there is a chain of three deterministic events that lead to an outcome, rather than a decision point that can go either way. The nature of the person receiving the boxes is analyzed, that determines the contents of the box, and then one or two boxes are taken as a consequence of the determined nature of the box taker, not of a free choice or even of the contents of the boxes. There is no decision.

The essential problem is that people conceptualize this problem in terms of subjective decision-making under limited knowledge, which generates the illusion of choice, and then erroneously try to reconcile that with an objectively deterministic universe, Heisenberg uncertainty notwithstanding.

An additional point of subjective confusion comes from the fact that knowledge of the nature of the problem affects the state of the box taker and may change whether one or two boxes are taken. That creates the subjective experience of a decision, but objectively it's simply another factor contributing to the cause-and-effect chain.

But the problem runs deeper than that. If we draw an arrow in the direction of the deterministic function, we will be drawing an arrow of time from the more refined version of the structure to the coarser version of that structure, which is in the opposite direction of all of our examples.

As I currently understand this after thinking about it for a bit, we are talking about the coarseness of the model from the perspective of the model in the time frame that it is in, not the time frame that we are in. It would make sense for our predictions of the model to become coarser with each step forward in time if we are predicting from a certain time into the future. I don't know if this makes sense, but I would be grateful for a clarification!

What is then stopping us from swapping the two copies of the coarser node?

Isn't it precisely that they're playing different roles in an abstracted model of reality? Though alternatively, you can just throw more logical nodes at the problem and create a common logical cause for both.

Also, would you say what you have in mind is built out of augmenting a collection of causal graphs with logical nodes, or do you have something incompatible in mind?

Firstly, we don't understand where this logical time might come from, or how to learn it

Okay, you can't write a sentence like that and expect me not to say that it's another manifestation of the problem of the criterion.

Yes, I realize this is not the problem you're interested in, but it's one I'm interested in, so this seems like a good opportunity to think about it anyway.

The issue seems to be that we don't have a good way to ground the order on world states (or, subjectively speaking if we want to be maximally cautious here, experience moments) since we only ever are experiencing one moment at a time and any evidence we have about previous (or future) moments is something encoded within the present moment, say as a thought. So we don't have a fully justifiable notion of what it means for one moment to come before or after another since any evidence I try to collect about it is at some level indistinguishable from the situation where I'm a Boltzmann brain that exists for only one moment and then vanishes.

Of course we can be pragmatic about it, since that's really the only option if we want to do stuff, and we certainly are, hence why we have theories of time or causality at all. So ultimately I guess I agree with you there's not much to say here about this first problem, since at some point it becomes an unresolvable question of metaphysics, and if we build a robust enough model of time then the metaphysical question is of no practical importance anyway for the level of abstraction at which we are operating.

Time is also crucial for thinking about agency. My best short-phrase definition of agency is that agency is time travel. An agent is a mechanism through which the future is able to affect the past.

Yeahssss...except that there is no need to take that literally...agents build models of future states and act on them, even though they are approximate. If agents could act on the actual future, no one would be killed in an accident, or lose on the stock market.

The primary confusing thing about Newcomb’s problem is that we want to think of our decision as coming “before” the filling of the boxes, in spite of the fact that it physically comes after.

Newcomb's problem isn't a fact...it's not an empirical problem to be solved. You should not be inferring "how time works" from it.

This is hinting that maybe we want to understand some other “logical” time in addition to the time of physics

Ok...

Maybe the solution here is to think of there being many different types of “before” and “after,” “cause” and “effect,” etc. For example, we could say that X is before Y from an agent-first perspective, but Y is before X from a physics-first perspective

If an agent is complex enough to build multiple models, or simulations, they can run some virtual time forwards or backwards, or whatever... in a virtual sense, within the simulation. The important point is that a realistic agent loses information every time they go up another level of virtuality.

Newcomb's problem isn't a fact...it's not an empirical problem to be solved. You should not be inferring "how time works" from it.

Yes, it's not empirical (currently). It's a thought experiment, which is mentioned a lot because it's counter-intuitive. Arguably 'being counter intuitive' is an indicator that it is unlikely, i.e. obtaining enough information to create such a simulation and pay the energy cost to run it, and finish the computation before the simulated person dies is hard.


If an agent is complex enough to build multiple models, or simulations, they can run some virtual time forwards or backwards, or whatever... in a virtual sense, within the simulation. The important point is that a realistic agent loses information every time they go up another level of virtuality.

What is a level of virtuality?

What is a level of virtuality?

A simulation is level 1, a simulation of a simulation is level 2, etc.

obtaining enough information to create such a simulation and pay the energy cost to run it, and finish the computation before the simulated person dies is hard

For once, computational complexity isn't the main problem. The main problem is that to mechanise Newcomb, you still need to make assumptions about time and causality...so a mechanisation of Newcomb is not going to tell you anything new about time and causality, only echo the assumptions it's based on.

But time and causality are worth explaining because we have evidence of them.