Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Against Time in Agent Models

27Vladimir_Nesov

5abramdemski

2Vladimir_Nesov

8Gordon Seidoh Worley

7AlexMennen

2Vladimir_Nesov

6Gurkenglas

2johnswentworth

2Gurkenglas

4Donald Hobson

3Leon Lang

3Ramana Kumar

3Maxwell Clarke

New Comment

I like specialization preorder as a setting for formulating these concepts. In a topological space, point y is *stronger* (more specialized) than point x iff all opens containing x also contain y. If opens are thought of as propositions, and specialization order as a kind of ("logical") time, with stronger points being in the future of weaker points, then this says that propositions must be valid with respect to time (that is, we want to only allow propositions that don't get invalidated). This setting motivates thinking of points not as objects of study, but as partial observations of objects of study, their shadows that develop according to specialization preorder. If a proposition is true about some partial observation of an object (a point of the space), it remains true when it develops further (in the future, for stronger points). The best we can capture objects of study is with neighborhood filters, but the conceptual distinction suggests that even in a sober space the objects of study are not necessarily points, they are merely observed through points.

This is just what Scott domains or more generally algebraic dcpos with Scott topology talk about, when we start with a poset of finite observations (about computations, the elusive objects of study), which is the specialization preorder of its Alexandrov topology, which then becomes Scott topology after soberification, adding points for partial observations that can be expressed in terms of Alexandrov opens on finite observations. Specialization order follows a computation, and opens formulate semidecidable properties. There are two different ways in which a computation is approximated: with a weaker observation/point, and with a weaker specification/proposition/open. One nice thing here is that we can recover points from opens, and then finite observations from the specialization poset of all partial observations/theories/ideals (as compact elements of a poset). So the different concepts fit together well, the rhetoric of observations and logical time has technical content that can be extracted just from the opens/propositions. This becomes even more concrete for coherent spaces, where finite observations are finite cliques on webs (of "atomic observations").

(This is mostly a keyword dump, pointing to standard theory that offers a way of making sense of logical time. The interesting questions are about how to make use of this to formulate FDT and avoid spurious proofs, possibly by reasoning at a particular point of a space, the logical moment of decision, without making use of its future. A distinction this point of view enforces that is usually missing in discussions of decision theory or reasoning about programs is between approximation with weaker observations vs. with weaker propositions. This is the distinction between different logical times and different states of knowledge about a computation.)

If opens are thought of as propositions, and specialization order as a kind of ("logical") time,

Up to here made sense.

with stronger points being in the future of weaker points, then this says that propositions must be valid with respect to time (that is, we want to only allow propositions that don't get invalidated).

After here I was lost. Which propositions are valid with respect to time? How can we only allow propositions which don't get invalidated (EG if we don't know yet which will and will not be), and also, why do we want that?

This setting motivates thinking of points not as objects of study, but as partial observations of objects of study, their shadows that develop according to specialization preorder. [...] The best we can capture objects of study is with neighborhood filters, [...] start with a poset of finite observations (about computations, the elusive objects of study),

You're saying a lot about what the "objects of study" are and aren't, but not very concretely, and I'm not getting the intuition for why this is important. I'm used to the idea that the points aren't really the objects of study in topology; the opens are the more central structure.

But the important question for a proposed modeling language is how well it models what we're after.

This is mostly a keyword dump, pointing to standard theory that offers a way of making sense of logical time.

It seems like you are trying to do something similar to what cartesian frames and finite factored sets are doing, when they reconstruct time-like relationships from other (purportedly more basic) terms. Would you care to compare the reconstructions of time you're gesturing at to those provided by cartesian frames and/or finite factored sets?

Which propositions are valid with respect to time? How can we only allow propositions which don't get invalidated (EG if we don't know yet which will and will not be), and also, why do we want that?

This was just defining/motivating terms (including "validity") for this context, the technical answer is to look at the definition of specialization preorder, when it's being suggestively called "logical time". If an open is a "proposition", and a point being contained in an open is "proposition is true at that point", and a point stronger in specialization order than another point is "in the future of the other point", then in these terms we can say that "if a proposition is true at a point, it's also true at a future point", or that "propositions are valid with respect to time going forward", in the sense that their truth is preserved when moving from a point to a future point.

Logical time is intended to capture decision making, with future decisions advancing the agent's point of view in logical time. So if an agent reasons only in terms of propositions valid with respect to advancement of logical time, then any knowledge it accumulated remains valid as it makes decisions, that's some of the motivation for looking into reasoning in terms of such propositions.

You're saying a lot about what the "objects of study" are and aren't, but not very concretely, and I'm not getting the intuition for why this is important.

This is mostly about how domain theory describes computations, the interesting thing is how the computations are not necessarily in the domains at all, they only leave observations there, and it's the observations that the opens are ostensibly talking about, yet the goal might be to understand the computations, not just the observations (in program semantics, the goal is often to understand just the observations though, and a computation might be defined to only be its observed behavior). So one point I wanted to make is to push against the perspective where points of a space are what the logic of opens is intended to reason about, when the topology is not Frechet (has nontrivial specialization preorder).

But the important question for a proposed modeling language is how well it models what we're after.

Yeah, I've got nothing, just a sense of direction and a lot of theory to study, or else there would've been a post, not just a comment triggered by something on a vaguely similar topic. So this thread is in the same spirit as a comment I left a few months ago to a crackpot post, but that one was even more speculative, somewhat appropriate in a place like that...

For what it's worth, I think this is trying to get at the same insight as logical time but via a different path.

For the curious reader, this is also the same reason we use vector clocks to build distributed systems when we can't synchronize the clocks very well.

And there's something quite interesting about computation as a partial order. It might seem that this only comes up when you have a "distributed" system, but actually you need partial orders to reason about unitary programs when they are non-deterministic (any program with loops and conditionals that can't be unrolled because they depend on inputs not known before runtime are non-deterministic in this sense). For this reason, partial orders are the bread-and-butter of program verification.

This seems related in spirit to the fact that time is only partially ordered in physics as well. You could even use special relativity to make a model for concurrency ambiguity in parallel computing: each processor is a parallel worldline, detecting and sending signals at points in spacetime that are spacelike-separated from when the other processors are doing these things. The database follows some unknown worldline, continuously broadcasts its contents, and updates its contents when it receives instructions to do so. The set of possible ways that the processors and database end up interacting should match the parallel computation model. This makes me think that intuitions about time that were developed to be consistent with special relativity should be fine to also use for computation.

If you mark something like causally inescapable subsets of spacetime (not sure how this should be called), which are something like all unions of future lightcones, as open sets, then specialization preorder on spacetime points will agree with time. This topology on spacetime is non-Frechet (has nontrivial specialization preorder), while the relative topologies it gives on space-like subspaces (loci of states of the world "at a given time" in a loose sense) are Hausdorff, the standard way of giving a topology for such spaces. This seems like the most straightforward setting for treating physical time as logical time.

On a prediction market platform, consider the graph of accounts and money flows between them. Some metrics take the form of adding up how much money flows across a boundary, subtracting backwards flows: How much currency is ever minted and paid out to users; how much currency is ever removed from circulation via fees. But for measuring how well the platform funnels money into smarter hands, it's not clear how to draw a boundary around "the smart accounts". Do you agree that analogous generalizations of Markov blankets should be applicable for both problems?

I don't think that requires a generalization of Markov blankets, I think plain old Markov blankets should just work. The fact that it's not clear how to draw the boundary corresponds to the difficulty of figuring out *which* blanket to use; the answer should be a Markov blanket.

Point-estimated Drake equation parameters cause the Fermi paradox, point-estimated utility functions cause paperclips and I don't expect a point-estimated Markov blanket to go well here either.

This fails if there are closed timelike curves around.

There is of course a very general formalism, whereby inputs and outputs are combined into aputs. Physical laws of causality, and restrictions like running on a reversible computer are just restrictions on the subsets of aputs accepted.

**Edit**: It now seems likely I simply misunderstood parts of the post and that John actually *does* propose to model the situation with a causal model in such a way that there is no dependency between the noise and other things happening in the model. He just doesn't propose *how* to do that.

This is my first encounter with your more technical writing, so I may lack some of the context to make sense of all of this. Nevertheless, here's my unfiltered reaction:

This article does seem to contain useful intuitions, but some of the concrete formalizations that you propose seem wrong from my current understanding.

For example, you want to formalize the situation with a computation DAG, which you claim is the same as a causal model. However, the main issue you use to motivate the formalization — namely, that stuff happening at a different time changes the resulting computations — is, as far as I know, *not* present in the usual formalization of causal models. Namely, the local functions in causal models only depend on the values of the parents and some “noise” that is thought to be independent of anything else going on in the model. In particular, the “noise” would then not depend on the order in which the operations happen.

I think for the same reason, the visualization of the “Cartesian boundary” as being comprised of the nodes visually interfacing between agent and environment makes little sense. For causal models, this would be correct, but if we assume that the order of computations actually **does** matter and that local computations can produce “write”-statements from which arbitrary different nodes can read — in particular, nodes from the environment can read nodes from the agent and vice versa — then we cannot so easily screen off the agent from the environment.

It's possible that reality is even *worse* than this post suggests, from the perspective of someone keen on using models with an intuitive treatment of time. I'm thinking of things like "relaxed-memory concurrency" (or "weak memory models") where there is no sequentially consistent ordering of events. The classic example is where these two programs run in parallel, with X and Y initially both holding 0, [write 1 to X; read Y into R1] || [write 1 to Y; read X into R2], and after both programs finish both R1 and R2 contain 0. What's going on here is that the level of abstraction matters: writing and reading from registers are not atomic operations, but if you thought they were you're gonna get confused if you expect sequential consistency.

- Total ordering: there's only one possible ordering of all operations, and everyone knows it. (or there's just one agent in a cybernetic interaction loop.)
- Sequential consistency: everyone knows the order of their own operations, but not how they are interleaved with others' operations (as in this post)
- Weak memory: everyone knows the order of their own operations, but others' operations may be doing stuff to shared resources that aren't compatible with any interleaving of the operations

See e.g., https://www.cl.cam.ac.uk/~pes20/papers/topics.html#relaxed or this blog for more https://preshing.com/20120930/weak-vs-strong-memory-models/.

(Edited a lot from when originally posted)

(For more info on consistency see the diagram here: https://jepsen.io/consistency )

I think that the prompt to think about partially ordered time naturally leads one to think about consistency levels - but when thinking about agency, I think it makes more sense to just think about DAGs of events, not reads and writes. Low-level reality doesn't really have anything that looks like key-value memory. (Although maybe brains do?) And I think there's no maintaining of invariants in low-level reality, just cause and effect.

Maintaining invariants under eventual (or causal?) consistency might be an interesting way to think about minds. In particular, I think making minds and alignment strategies work under "causal consistency" (which is the strongest consistency level that can be maintained under latency / partitions between replicas), is an important thing to do. It *might* happen naturally though, if an agent is trained in a distributed environment.

So I think "strong eventual consistency" (CRDTs) and causal consistency are probably more interesting consistency levels to think about in this context than the really weak ones.

When programming distributed systems, we always have many computations running in parallel. Our servers handle multiple requests in parallel, perform read and write operations on the database in parallel, etc.

The prototypical headaches of distributed programming involve multiple processes running in parallel, each performing multiple read/write operations on the same database fields. Maybe some database field says “foo”, and process 1 overwrites it with “bar”. Process 2 reads the field - depending on the timing, it may see either “foo” or “bar”. Then process 2 does some computation and writes another field - for instance, maybe it sees “foo” and writes {“most_recent_value”: “foo”} to a cache. Meanwhile, process 1 overwrote “foo” with “bar”, so it also overwrites the cache with {“most_recent_value”: “bar”}. But these two processes are running in parallel, so these operations could happen in any order - including interleaving. For instance, the order could be:

… and now the cached value no longer matches the value in the database; our cache is broken.

One of the main heuristics for thinking about this sort of problem in distributed programming is:

there is no synchronous time. What does that mean?Well, in programming we often picture a “state-update” model: the system has some state, and at each timestep the state is updated. The update rule is a well-defined function of the state; every update happens at a well-defined time. This is how each of the individual processes works in our example: each executes two steps in a well-defined order, and each step changes the state of the system

But with multiple processes in parallel, this state-update model no longer works. In our example, we can diagram our two processes like this:

Each process has its own internal “time”: the database read/write happens first, and the cache overwrite happens second. But

betweenprocesses, there is no guaranteed time-ordering. For instance, the first step of process 1 could happen before all of process 2, in between the steps of process 2, or after all of process 2.We cannot accurately represent this system as executing along one single time-dimension. Proof:

isunambiguouslyafterstep 1 of process 1 in time, so the two steps can’t happen at the same time.In order to accurately represent this sort of thing, it has to be possible for one step to be unambiguously after another, even though both of them are neither before nor after some third step.

The “most general” data structure to represent such a relationship is not a one-dimension “timeline” (i.e. total order), but rather a directed acyclic graph (i.e. partial order). That’s how time works in distributed systems: it’s a partial order, not a total order. A DAG, not a timeline. That DAG goes by many different names - including computation DAG, computation circuit, or causal model.

## Beyond Distributed Programming

The same basic idea carries over to distributed systems more generally - i.e. any system physically spread out in space, with lots of different stuff going on in parallel. In a distributed system, “time” is a partial order, not a total order.

In the context of

embedded agents: we want to model agenty systems which are “made of parts”, i.e. the agent is itself a system physically spread out in space with lots of different stuff going on in parallel. Likewise, the environment is made of parts. Both are distributed systems.This is in contrast to state-update models of agency. In a state-update model, the environment has some state, the agent has some state, and at each timestep their states update. The update rule is a well-defined function of state; every update happens at a well-defined time.

Instead of the state-update picture, I usually picture an agent and its environment as a computation DAG (aka circuit aka causal model), where each node is a self-contained local computation. We carve off some chunk of this DAG to call “the agent”.

The obvious “Cartesian boundary” - i.e. the interface between agent and environment - is just a Markov blanket in the DAG (i.e. a cut which breaks the graph into two parts). That turns out to be not-quite-right, but it’s a good conceptual starting point.

Main takeaway: computational DAGs let us talk about agents without imposing a synchronous notion of "time" or "state updates", so we can play well with distributed systems.