# Ω 12

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

TLDR: I sketch a language for describing (either Bayesian or infra-Bayesian) hypotheses about computations (i.e. computable[1] logical/analytic facts). This is achieved via a novel (AFAIK) branch of automata theory, which generalizes both ordinary and tree automata by recasting them in category-theoretic language. The resulting theory can be equipped with a intriguing self-referential feature: automata that "build" and run new automata in "runtime".

Epistemic status: this is an "extended shortform" i.e. the explanations are terse, the math is a sketch (in particular the proofs are not spelled out) and there might be errors (but probably not fatal errors). I made it a top-level post because it seemed worth reporting and was too long for a shortform.

## Motivation

Finite-state automata are a convenient tool for constructing hypotheses classes that admit efficient learning. A classical example is MDPs in reinforcement learning. Ordinary automata accept strings in input. This is convenient when the input is a time sequence. However, in Turing Reinforcement Learning and infra-Bayesian physicalism we want to equip our agent with beliefs about computations, which don't form a time sequence in the ordinary sense.

Computations are often represented as trees: e.g. syntax trees of lambda calculus, or any other functional programming language, or expressions defining recursive functions. Hence, the obvious guess is using tree automata. Specifically, we will focus on bottom-up tree automata which are more powerful and more convenient for other reasons as well.

One limitation of this is that hypotheses described using (infra-)probabilistic tree automata are missing correlations between computations. Even if two programs contain a literally identical subroutine, or a single program uses multiple copies of an identical subroutine, the uncertainty is about the different copies is uncorrelated.

Another limitation is that sophisticated automata might require exponentially many states, meaning that a straightforward (tabular) description of the transition rules is exponentially large. Instead, we would like a convenient compositional language for constructing such automata (analogously[2] to the compositional language I proposed for MDPs).

Therefore, a generalization is called for.

Bottom-up deterministic tree automata are algebras over the operad of trees[3] (modulo accepting states, which are not encoded in the algebra structure). More precisely, they are algebras over the free operad generated by the input alphabet of the tree automaton, where the arity of each symbol in the alphabet becomes the operadic arity of the corresponding generator. The transition rules of the tree automaton are the operations in the algebra.

So, we can now define a deterministic automaton over a (colored) operad to be an -algebra. The automaton is "finite" when the colors are a finite set, and the algebra elements of each color are a finite set. An example application is: take the operad of lambda terms, quotient by -reduction and -conversion, and add the renaming of free variables as additional operations into the operad. This leads to hypotheses-about-lambda-expressions which have a better inherent understanding of the semantics. On the other hand, the finiteness condition seems too restrictive now. But, maybe we can get away with, finite number of orbits under the action of the variable-renaming group. More generally, it is not clear which operads lead to a computationally-convenient notion of automaton (similarly how many problems are computationally/statistically tractable for ordinary automata): this seems like a good problem to investigate.

The automata-theoretic notion of "transducer" also has a natural operadic analogue. However, in full generality, this analogue turns out to be much more powerful than classical transducers, even if the underlying operad is just the operad of the free monoid over some alphabet (i.e. in the case where operadic automata are just ordinary automata)[4]: ordinary transducers can only add symbols to the end of the output as they process new input symbols, whereas operadic transducers can e.g. add symbols to the beginning, or write into two "working tapes" s.t. the output is their concatenation.

Given an operad , we define the associated "state operad"[5] as follows. Define "-signature" to be a pair of (i) a list of -colors (the input types) and (ii) an -color (the output type).

• A color of is a set (the "state space"), and for each , a finite tuple of -signatures (the "variables" defined at the given state).
• An operation of is a mapping (where are the the input colors and is the output color) and, for each and , an operation of signature in the operad freely generated by and generators corresponding to the components of for all .

Given operads and respective colors, a determinsitic operadic transducer from to can now be defined as an operad morphism from to s.t. for any , the tuple starts with a -ary signature whose output color is . Given such , we can transform any -ary operation with output color to a -ary operation with output color (exercise for the reader).

It is also possible to make transducers process morphisms of general signature. We leave out the details, since the notation in the operad setting is unwieldy. We do explain it for cartesian transducers below.

## Deterministic Cartesian Automata/Transducers

It is often convenient to represent programs as directed acyclic graphs (DAGs) rather than trees. Indeed, we want to be able to define a term once and use it multiple types later. Hence, it is natural to look for a flavor of automaton that can process DAGs. This is achieved by replacing operads with cartesian monoidal categories and algebras by strong monoidal functors from the category to . DAGs are thought of as morphisms in a free cartesian category, just like trees are operations in a free operad.

Given a cartesian category , we can define the "state cartesian category" as follows. Here, a ""-signature" is just a pair of objects (source and target).

• An object of is a set (the "state space"), and for each , a finite tuple of -signatures (the "variables" defined at the given state).
• A morphism from to in is a mapping and, for each and , a morphism of signature in the cartesian category freely generated by and generators corresponding to the components of .

Given objects of , we have and is the concatenation of and .

A determinsitic cartesian transducer is now defined as consisting of:

• Input cartesian category and output cartesian category
• Primary input signature
• Auxiliary input signatures
• Input state set
• Output signatures
• Output state set
• A cartesian functor from to s.t. (i) and for any , is a prefix of (ii) and for any , is a prefix of

A transducer can accept an element of and -morphisms of signatures and output an element of and and -morphisms of signatures .

## Compositional language

We can now start defining our compositional language for constructing automata. This language describes[6] automata/transducers as string diagrams. There are two types of edges: edges labeled with sets, and edges labeled with a cartesian category and some -signature. Vertices are labeled by transducers. Each vertex has incoming edges with labels corresponding to the transducer's input state space and input signatures, and outgoing edges with labels corresponding to the transducer's out state space and output signatures. The edges should be thought of as "wires" that conduct set elements or morphisms of prescribed signature respectively.

There is an interesting phenomenon here: string diagrams are in themselves morphisms of some cartesian category (with signature corresponding to labels of the diagram's external edges)! This allows us to get self-referential, and add a special type of vertex in our diagram, which has one input which is a diagram of the same type (with prescribed external edge labels) and some number of other inputs, and whose semantics is applying the diagram to the inputs (where the outputs and other inputs have labels matching the diagram type).

I think that without further constraints, the resulting language might be Turing complete: at the very least, it is possible to create an infinite loop by coming up with an input and a diagram s.t. contains a transducer which transforms into , and runs its input through and runs another copy of the input through the diagram outputs. [EDIT: Actually, this requires to produce a diagram which contains . Unless we allow this explicitly, it shouldn't be possible with "plain" cartesian automata. However, I think it is possible with closed cartesian automata (see below and comment).]

To avoid infinite loops, we can add a counter for the "level of meta" (i.e. the meta-vertex in diagrams of level can only accept diagrams of level lower than ). It is also possible to make this counter an ordinal, by representing ordinals in some notation that itself naturally takes the form of trees/DAGs. Although to get a tractable language, we need to impose some strong constraints: even without the meta-vertex, we can get super-exponential blowup because each transducer can increase the size of the content of a variable by a constant factor for every input symbol. Possibly we should require our category to be equipped with some way to measure the "size" of morphisms (i.e. a functor to , or maybe a lax functor to regarded as a 2-category using its order), and most transducers to e.g. only increase size by .

So far, everything was determinsitic. But, we want our hypotheses to have probabilistic and maybe also "infra-probabilistic" uncertainty. The obvious way to define probabilistic automata is to replace cartesian categories with Markov categories and replace with e.g. (the category of measurable spaces with Markov kernels as morphisms). However, this is poorly suited for our purpose, because (i) we a joint distribution over the outputs of all programs, and (ii) if an identical subroutine is defined multiple times, we want our uncertainty about those copies to be strongly correlated. Intuitively, want we want is allow all automata read from an additional "tape" filled with random bits. This can be naturally expressed using monads.

Fix a monad [7] that is equipped with "lax cartesian structure"[5:1]. By this we mean that it has lax monoidal structure w.r.t. the monoidal structure of given by products, and this structure satisfies that for any sets and mappings and , we have

Here, stands for the diagonal mapping.

Also, we require that is a strength of , where is the monad unit. However, we require neither that arises from this strength nor that it makes commutative (typically neither holds!) We call such a monad "memo monad".

The prototypical example is when is the set of binary trees whose leaves are labeled by elements of [8]. We think of such a tree as a way to select an element of by reading a stream of bits. Monad multiplication works by attaching the label-trees to the "parent" tree at the corresponding leaves. The monoidal structure works by taking the union of two trees, where each tree is regarded as the subset of corresponding to the bitstrings that describe the paths from the root to each vertex (imagine placing one tree such that it overlaps the other), and propagating the labels to the leaves in the obvious way.

Another example is binary trees whose non-leaf nodes are labeled by symbols from some fixed alphabet . We think of each element of as corresponding to its own stream can be be queried indepedently. We won't spell out the monoidal structure, but imagine running two subroutines one after the other while memoizing all the queries the first subroutine made from the benefit of the second. As opposed to the first example, this monoidal structure is not symmetric.

We also need to be equipped with a strong monad morphism to the monad defined as follows. For any set ,

Here, is the set and is the distribution monad.

We think of these as distributions over a list of outcomes of size , with each outcome labeled by an element of . Monad multiplication works by combining concatenation of strings with multiplication of probabilities in the natural way. Importantly, there are canonical monad morphisms from to both the free monoid monad and the distribution monad .

The morphism from to means that automatically acquires monad morphisms to (we will denote it ) and (we will denote it ). In the case of binary trees, this morphism is defined by listing the leaves of the tree in horizontal order and assigning probabilities by postulating a fair coin at each non-leaf vertex.

To define automata, we require that our cartesian categories are equipped with a memo monad. Given a cartesian category with memo monad , a -automaton is a strong monoidal functor together with a natural isomorhism between and which is (i) strong monoidal (ii) makes a morphism in the 2-category of monads on arbitrary categories (we call this a "memo functor"). Given objects in and a morphism, is a Markov kernel from (the initial state of the automaton) to (the final state of the automaton).

Notice that, given and , has the marginals , but the corresponding random variables are not independent. (Because of how interacts with the monoidal structure of .) This means we get a non-trivial joint distribution over all morphisms, which is why we introduced monads in the first place.

This formalism has an infra-Bayesian analogue as well: it requires replacing by a the monad (and we know it is a monad) and replacing by the corresponding infra-Bayesian monad. We won't spell out the details.

Monadic cartesian transducers can be defined, but we won't spell out the definition. The definition is spelled out for the case of closed cartesian transducers (see below), where the notation is a little simpler.

## Processing DAGs with MCAs

As we discussed, programs are DAGs therefore DAGs are the starting point of any chain of transducers we might want to build. DAGs can be naturall regarded as morphisms in a free cartesian category. But how to embed them into a free cartesian category with memo monad? We do it as follows:

• Assume any "generator" vertex has one outgoing edge (and any number of incoming edge). Such a vertex corresponds to a generator morphism with signature , where the and are generator objects and is the memo monad.
• A generator morphism can be transformed to a morphism as follows. To get , we compose (i) the monoidal structure for (to get from to (ii) (to get from to ) (iii) the monad multiplication of (to get from to ).
• The morphisms of form can be vertically and horizontally composed with each other in the usual way to evaluate the DAG. In particular, every edge corresponds to for some generator object .
• Any external incoming edge labeled by a generator objected is interpreted as the monad unit on in order to get from to .

It would be nice to formulate this process as evaluating a string diagram in the corresponding Kleisli category, but for a memo monad the Kleisli category is not even monoidal. Maybe there is some way to make it work, but atm I don't know how.

It is convenient to require that our cartesian categories are closed, since it allows simplifying the definition of the "state category" 2-functor and improve its properties. We will also require the category to have arbitrary (not just finite) products. Indeed, let be such a category. Then we define the as follows:

• An object of is a set (the "state space"), and for each , an object of (the "variables" defined at the given state).
• A morphism in is a mapping and, for each , a morphism .

Given objects of , we have

Moreover,

So far, we could avoid the infinite products by requiring that is finite. But, when is equipped with a memo monad , we also want to have a memo monad . This will require infinite sets. We define as follows:

is indeed a monad, but to define the memo (monoidal) structure on it, we need additional structure on . Let and be two sets and consider the following two mappings :

In general, this diagram doesn't commute, i.e. . But, we require that it "lax commutes" in the following sense.

For any set , can be regarded as a category, in which a morphism from to is a mapping s.t. the following conditions hold:

• For any , .

This allows us to regard and as functors, and we postulate a natural transformation . We also postulate another natural transformation related in the same way to (it is separate since is not symmetric). Since and are actually natural transformations in terms of their dependence on , we require that and are 2-natural transformations. Possibly there are more conditions we need to satisfy but we won't try to specify them precisely. In the case of trees/queries, are defined using the observation that, knowing the result of making more queries is sufficient to predict the result of making less queries.

We can now define the monoidal structure on given the monoidal structure on . Consider some objects in . We define

As before, a (monadic) closed cartesian automaton is a (memo) cartesian closed functor from to . A (monadic) closed cartesian transducer from to is a (memo) cartesian closed functor from to . Given an object of , a transducer can be regarded as taking morphisms as inputs producing and as outputs, where:

Notice that these automata are in general not "finite" in any sense. To have a notion of "finite" automaton/transducer, we need to have a fully faithfull cartesian closed subcategory of representing "finite" objects, which (in some sense?) generates all of if arbitrary products are allowed. We can then define to consist of s.t. is a finite set and for every , is in . A finite automaton is a functor s.t. is a finite set whenever is a finite object, and a finite transducer is a functor s.t. is a finite object whenever is. We should probably also require that, if an infinite product is a finite object, then all of are finite objects and all but a finite number of them are isomorphic to the terminal object . Notice that the later condition is consistent with the way we defined .

Curiously, the state category 2-functor seems to be a "commutative"[9] 2-monad. Indeed, the 2-monad multiplication corresponds to the functor defined as follows. Given an object of :

The 2-monoidal structure corresponds to the functor defined as follows. Given and object of and an object of :

This means that transducers are 1-morphisms in the corresponding Kleisli 2-category. In particular, composition of two transducers is a transducer. Moreover, an automaton is just a transducer from to the terminal closed cartesian category. And, transducer string diagrams are essentially just string diagrams in the Kleisli 2-category.

1. We can use the same methods to talk about arbitrary describable facts, but I'm content with computable. ↩︎

2. The analogy is very rough, it is possible that there is some stronger compositional language that contains both as segments. ↩︎

3. This is obvious, but I haven't seen it in the literature. ↩︎

4. I don't know whether even this special case is already known under some other name in automata theory. ↩︎

5. I don't know whether there is a standard name for this in category theory. ↩︎ ↩︎

6. In full generality, I'm not sure the composition of transducers yields transducers (however, it certainly does for closed cartesian automata, see below), so this language might define a broader class of "automata" rather than just describing them. ↩︎

7. Instead of , we can use a different cartesian category for the state spaces of our automata, e.g. compact Polish spaces. But, we will continue the explanation using , for the sake of simplicity. ↩︎

8. Equivalently, is the free magma on . In fact, algebras over are precisely magmas. ↩︎

9. Not sure what is the right term for 2-monads. ↩︎

# Ω 12

New Comment

Can you provide some simple or not-so-simple example automata in that language?

Good idea!

## Example 1

Fix some alphabet . Here's how you make an automaton that checks that the input sequence (an element of ) is a subsequence of some infinite periodic sequence with period . For every in , let be an automaton that checks whether the symbols in the input sequences at places s.t. are all equal (its number of states is ). We can modify it to make a transducer that produces its unmodified input sequence if the test passes and if the test fails. It also produces when the input is . We then chain and get the desired automaton. Alternatively, we can connect the in parallel and then add an automaton with boolean inputs that acts as an AND gate. is a valid multi-input automaton in our language because AND is associative and commutative (so we indeed get a functor on the product category).

Notice that the description size of this automaton in our language is polynomial in . On the other hand, a tabular description would be exponential in (the full automaton has exponentially many states). Moreover, I think that any regular expression for this language is also exponentially large.

## Example 2

We only talked about describing deterministic (or probabilistic, or monadic) automata. What about nondeterministic? Here is how you can implement a nondeterministic automaton in the same language, without incurring the exponential penalty of determinization, assuming non-free categories are allowed.

Let be some category that contains an object and a morphism s.t. and . For example it can be the closed cartesian category freely generated by this datum (which I think is well-defined). Then, we can simulate a non-deterministic automaton on category by a deterministic transducer from to :

• The state set is always the one element set (or, it can be two elements: "accept" and "reject").
• For every state of , we have a variable of signature . This variable is intended to hold when the state is achievable with the current input and otherwise.
• We use the fact that composition of endomorophisms of behaves as an "OR" operator to implement the transition rules of .

However, this trick doesn't give us nondeterministic transducers. A completely different way to get nondeterminism, which works for transducers as well, is using infra-Bayesian Knightian uncertainty to represent it. We can do via the memo monad approach, but then we get the constraint that nondeterministic transitions happen identically in e.g. identical subgraphs. This doesn't matter for ordinary automata (that process words) but it matters for tree/graph automata. If we don't want this constraint, we can probably do it in a framework that uses "infra-Markov" categories (which I don't know how to define atm, but it seems likely to be possible).

Together with Example 1, this implies that (this version of) our language is strictly more powerful than regular expressions.

## Example 3

Suppose we want to take two input sequences (elements of ) and check them for equality. This is actually impossible without the meta-vertex, because of the commutativity properties of category products. With the meta-vertex (the result is not an automaton anymore, we can call it a "string machine"), here is how we can do it. Denote the input sequences and . We apply a transducer to that transforms it into a string diagram. Each symbol in is replaced by a transducer that checks whether the first symbol of its input is and outputs the same sequence with the first symbol removed. These transducers are chained together. In the end of the chain we add a final transducer that checks whether its input is empty. Finally, we use the meta-vertex to feed into this chain of transducers.

Notice that this requires only one level of meta: the string diagram we generate doesn't contain any meta-vertices.

## Example 4

More generally, we can simulate any multi-tape automaton using a succinct string machine with one level of meta: First, there is a transducer that takes the tapes as separate inputs and simulates a single step of the multi-tape automaton. Here, moving the tape head forward corresponds to outputting a sequence with the first symbol omitted. Second, there is a transducer that takes the inputs and produces a string diagram that consists of copies of chained together (this transducer just adds another to the chain per every input symbol). Finally, the desired string machine runs and then uses a meta-vertex to apply the diagram produces to the input.

## Example 5

A string machine with unbounded meta can simulate a 1D cellular automaton, and hence any Turing machine: First, there is a transducer which simulates a single step of the automaton (this is a classical transducer, not even the stronger version we allow). Second, we already know there is an automaton that tests for equality. Third, fix some automaton whose job is to read the desired output off the final state of the cellular automaton. We modify to make a transducer , which (i) when the inputs are equal, produces a string diagram that consists only of (ii) when the inputs are different, produces this string machine. This string macine applies to the input , then applies to and , then uses the meta-vertex to apply to ...

Except that I cheated here because the description of references a string machine that contains . However, with closed machines such quining can be implemented: instead of producing a string diagram, the transducer produces an morphism that converts transducers to string diagrams, and then we feed the same transducer composed in the same way with itself into this morphism.