Goal-directedness: relativising complexity

Morgan_Rogers

This is the fourth post in my Effective-Altruism-funded project aiming to deconfuse goal-directedness. Comments are welcomed. All opinions expressed are my own, and do not reflect the attitudes of any member of the body sponsoring me. The funding has come to an end, but I expect to finish off this project as a hobby in the coming months.

My previous post was all about complexity, and ended with an examination of the complexity of functions. In principle, I should now be equipped to explicitly formalise the criteria I came up with for evaluating goal-based explanations. However, several of the structures whose complexity I need to measure take the form of transformations from one structure of a given type to another. This brings us into the domain of relative complexity. I have chosen to present this in a separate post, not just because of the daunting length of my last post, but also because I think this is a subject which this community would benefit from some explicit clarification of. In future posts I shall finally put all of the pieces together and get to the bottom of this goal-directedness business.

Recall that last time that I described complexity as a quantity assigned to a structure that measures how many simple pieces are needed to construct it. The general concept which is under scrutiny this time is a variant of the above, where we consider instead a quantity assigned to a structure that measures how many simple pieces are needed to construct it from a given resource or starting point. A source of confusion lies in the fact that there are multiple ways to relativise complexity, which is why a closer examination is needed.

In this post I'll be discussing a few implementations on relativised notions of complexity.

Relativising Kolmogorov complexity

Conditional complexity

In this subsection I'll be using standard notation and definitions from Ming Li and Paul Vitányi's textbook,^[1] which I will subsequently extend somewhat. They use the convention that the complexity measure is indexed by a partial computable function rather than a given Turing machine which computes it, which makes sense, since the details of the Turing machine are irrelevant in the computation of complexity beyond its input-output behaviour.

The definition of K-complexity for individual strings which we saw last time involved a choice of universal Turing machine $U$ , computing partial function $ϕ : {0, 1}^{*} \to {0, 1}^{*}$ , say, and measured the length of the shortest input $p$ (encoding a program) such that $ϕ (p) = s$ . This gave the plain Kolmogorov complexity, which I'll now denote as $C_{ϕ} (s) := min {ℓ (p) ∣ ϕ (p) = s}$ .

Conditional Kolmogorov complexity allows an auxiliary string $y$ to be passed as a "free" input to the program, so that the corresponding complexity is $C_{ϕ} (s ∣ y) := min {ℓ (p) : ϕ (⟨ y, p ⟩) = s}$ , where $⟨ y, p ⟩$ is the string obtained from $y$ and $p$ by concatenating them and adding a prefix consisting of $ℓ (y)$ copies of $1$ followed by a $0$ (so that the Turing machine encoding $ϕ$ can reliably separate the concatenated inputs). For conciseness, I'll denote $ϕ (⟨ y, p ⟩)$ as $(y) p$ , interpreting $y$ as an input to the program $p$ (with $ϕ$ playing the role of an interpreter or compiler).

This notion of conditional complexity contains some distinct possibilities which I alluded to in the introduction and would like to separate out; hopefully the algebraic example in a later section will help to clarify the distinct cases I'll highlight here.

Consider a situation where $s = y$ . For many such cases, the shortest program $p$ producing $s$ given $y$ is $(y) p$ := print $(y)$ . Similarly, if $s$ contains many copies of $y$ , we can use "print(-)" to insert those copies into the output being produced. On the other hand, if $s$ has nothing to do with $y$ , the most efficient program will be one that simply ignores the extra input and produces $s$ directly. In this way, the input $y$ can be thought of as a resource for obtaining the output $s$ .

Extension complexity

A special case of the above is when $y$ is a prefix of $s$ , but where the remainder of $s$ is not easily compressible^[2] using $y$ . As long as $y$ has non-trivial length, the most efficient program for producing $s$ might be of the form "print $(y)$ and then append $() q$ ", where $q$ is a program producing the suffix of $s$ with no input. In this situation we are using $y$ as a starting point for obtaining the output $s$ . I can refine this into an independent definition by removing the cost of the initial print command: $C_{ϕ}^{'} (s ∣ y) := min {ℓ (q) : y; ϕ (q) = s}$ , where $y; ϕ (q)$ is the concatenation of $y$ and the output of $ϕ$ at $q$ . With this (non-standard) definition we can no longer use $y$ as a resource, and we can only obtain strings which extend $y$ (all other strings having infinite complexity). I hence call this the extension complexity.^[3]

Modification complexity

An alternative option, for a set-up where we work with Turing machines having separate input and output tapes, say, is to initialize the output tape with a given string $y$ . We could also impose further constraints on the model, such as proposing that the machine cannot read from the output tape, only write to it, so that the initial output cannot be used as a resource in the sense discussed above. I'll choose a universal machine set up this way, $U$ , and write $U_{y}$ to indicate that this machine is initialised with $y$ in the output tape. Then we get a notion which I'll call modification complexity, $C_{U} (s \leftarrow y) := min {ℓ (p) : U_{y} (p) = s}$ , which measures the shortest program for generating the string $s$ with initial output $y$ .

Assuming that the machine can also delete symbols in the output, all strings have a finite modification complexity, so this is a genuinely different measure than the extension complexity.

Oracular complexity

I explained last time that we can equip a Turing machine with an oracle, which can perform some decision problems "for free", or rather in a single step of computation. The details are a little technical. Given a subset $A \subseteq {0, 1}^{*}$ , we can define a Turing machine with a reference tape which includes an oracle query state which writes a $1$ to the output tape if the content of the reference tape is a member of $A$ , and writes a $0$ otherwise. Naming the oracle after the subset defining it, we obtain a class of $A$ -computable partial functions, and corresponding $A$ -universal Turing machines, with a necessarily different enumeration of the programs than for ordinary Turing machines. Nonetheless, given the partial function $ϕ$ computed by such a universal machine, we can duplicate the definitions given above with all of the same conclusions.

Comparing complexities arising from different oracles might be interesting, but it's beyond the scope of this post.

What is relativisation really?

We saw last time that even after fixing a collection of objects containing a particular object, there may still be many choices to make before a complexity measure is fully specified over that collection (and hence on that object). On the other hand, we also saw that some choices are natural to some extent, in that they themselves minimize some higher measure of complexity. Whether "natural" or not, there may be a default choice of complexity measure for some collection of objects. Given such a situation, the various types of relative complexity are ways of expressing modifications to the default complexity measure, using all of the features involved (implicitly or explicitly) in the definition of the default complexity measure. We can interpret the above examples in these terms:

The conditional complexity $C (s ∣ y)$ provides $y$ as a basic building block, at a cost which is typically lower than its complexity, the difference being that between generating $y$ and simply reading $y$ .^[4] Since the operations available to a Turing machine are very expressive, the default is to have no building blocks available.
The extension complexity $C^{'} (s ∣ y)$ effectively reduces the collection of objects under scrutiny to those which have $y$ as a prefix (all others having infinite complexity). The default domain for K-complexity is the set of all strings formed from a given character set; this is a natural variant of that complexity measure for that restricted domain.^[5]
The modification complexity $C_{U} (s \leftarrow y)$ takes $y$ , rather than the empty string, as a starting point. Here the default is to start from the empty string.^[6]
Adding an oracle amounts to changing the allowed basic operations and hence changing complexity measures, but in a way which is (or may be) hidden from the final definition of the complexity of a string. The default operations which an oracle-free Turing machine can perform vary, but they always fall into a class of basic "effective" arithmetic operations; a typical oracle adds an operation which is either not effectively computable, or not efficiently computable, to the default operations.

It's worth noting that the cost of any of these contributions to complexity can vary, either because of the explicit implementation of using the resources or operations being provided, or by artificially attaching a cost to such operations. I explored variations in costs last time. The takeaway is that I'm only scratching the surface of the range of notions of relative complexity that are possible. It's time for some more instances in a different setting!

An algebraic example

Relative complexity of group elements

Recall the word length complexity measure on elements in a group which I presented last time. Explicitly, I am given some generators $a_{1}, \dots, a_{n}$ of a group $G$ ; the word length complexity $C_{G} (w)$ of an element $w \in G$ , is the length of a shortest presentation of $w$ as a product of the generators.^[7]

In this and other constructions of elements in algebras, we take the "empty" construction as the starting point by default, such that the identity element always has a complexity of 0, being identified with the empty product of generators. This is a natural choice insofar as it is the only choice which can be made independently of any further details of the structure of the group. However, such a choice may not always be available. To illustrate this, I'll remind you how algebraic objects can act on other mathematical objects, including sets, spaces or other algebraic objects.

For concreteness, I'll specifically talk about (right^[8]) group actions. Suppose I have a group $G$ as above acting on a set $A$ , which formally means that each element $a \in A$ and $g \in G$ determines an element $a \cdot g \in A$ in such a way that $a \cdot (g h) = (a \cdot g) \cdot h$ for all elements $h \in G$ . Given two elements $a, b \in A$ , we can measure the complexity of "getting to $b$ from $a$ ," denoted $C_{A} (b | a)$ , to be the complexity of the (unique) element $h$ such that $b = a \cdot h$ , if it exists, and define the complexity to be infinite otherwise.

The crucial feature of this situation which differs from the complexity measures we saw last time is that we must choose not one but two elements of $A$ , and there is in general no distinguished choice for either element which might allow us to eliminate one of the choices. A sensible reaction to this example is to entertain also a relative version of complexity of elements in an algebra, where we may choose the starting point for the construction to be something other than the identity element. This is justified in this particular case by the fact that the group $G$ acts on its underlying set, which I'll denote by $[G]$ , by multiplication, and hence as a special case of the above we can define the complexity of "getting to $w$ from $g$ ," denoted $C_{G} (w \leftarrow g)$ , to be the complexity of the (unique) element $h$ such that $w = g h$ ; in this case such an element always exists, and is equal to $g^{- 1} w$ .

Generalizing to other algebraic contexts, we may no longer have uniqueness or existence of the element $h$ . This may be the case in a monoid, for example. In that case, we must minimize the complexity across elements satisfying the equation, which makes this relativized complexity measure more interesting.

I have chosen notation consistent with that for modification complexity introduced above, since one way of interpreting $C_{G} (w \leftarrow g)$ is as "the cost of the most efficient way to modify $g$ to obtain $w$ ", and because it amounts to choosing a starting point for the construction other than the default of the identity element. We evidently have $C_{G} (w \leftarrow 0) = C_{G} (w)$ . As such, for any third element $h \in G$ , we have a general inequality $C_{G} (w \leftarrow g) + C_{G} (g \leftarrow h) \geq C_{G} (w \leftarrow h)$ , which is a directed version of the triangle inequality, and taking $h$ to be the identity element we get the special case, $C_{G} (w \leftarrow g) + C_{G} (g) \geq C_{G} (w)$ .

The above is the discrete case, but it works just as well for the continuous case: the topological group $S O (2)$ of rotations in the plane is homeomorphic as a space to the circle $S^{1}$ . A conventional identification of $S O (2)$ with the unit circle in the plane sends the identity element ( $0$ rotation) with the point $(1, 0)$ , but any other choice would be valid. If we measure the complexity of a rotation as the size of the smallest angle of rotation from the identity producing it, the relative complexity $C_{S O (2)} (r_{1} \leftarrow r_{2})$ of two rotations is the smaller angle separating them as points on a (uniformly parametrized) circle.

Conditional complexity

Alternatively, we could consider the "cost of building $w$ using $g$ ", in the sense of adding $g$ to the set of generators. The resulting complexity measure depends on how much each copy of $g$ costs^[9]. If the cost is greater than or equal to $C_{G} (g)$ , the complexity measure collapses to the original complexity measure in terms of the generators $a_{1}, \dots, a_{n}$ . At the other extreme, if the cost is zero, the complexity is a measure of how close $w$ is to a power of $g$ ; we shall return to this case below. Between these is the case where the cost is one, where we recover the ordinary complexity of $w$ after adding $g$ to the set of generators. As ad hoc notation, if the cost of each copy of $g$ is $c$ , I'll write $E_{g : c} (w)$ for the corresponding conditional complexity of $w$ . For any value of $c$ , we have $E_{g : c} (w) \leq C_{G} (w)$ .

If the relative complexity $C_{G} (w \leftarrow g)$ treats $g$ as a new starting point for constructions, the conditional complexity treats $g$ as an extra building block which we can use in the construction, without shifting the starting point; it's altering the default choice of generators. Unlike the default of starting from the identity, there may be no default choice of generators for a given group, so remember that this notation assumes a choice has been established previously. Again, relative complexity measures are really just a way of expressing complexity measures by comparison with some pre-existing (default) choice.

Relativising to subsets

Rather than individual elements, we could consider subsets which we are allowed to freely choose from. Let $S \subseteq G$ be a subset; then we can define the relative complexity $C_{G} (w | S)$ to be $min {C_{G} (h) ∣ \exists g \in S, w = g h}$ . As particular cases relating to the previous definitions, we have that $C_{G} (w | {g}) = C_{G} (w | g)$ , and if $⟨ g ⟩$ is the subgroup of $G$ generated by $g$ , then $C_{G} (w | ⟨ g ⟩) = E_{g : 0} (w)$ .

We can also define the extension complexity $E_{S : c} (w)$ in the evident way. Indeed, if $S$ is actually a subgroup of $G$ , or if we replace $S$ with the subgroup it generates, the $0$ -cost version of the extension complexity coincides with the relative complexity in the quotient right $G$ -action $S ∖ G$ , or as an equation, $E_{S : 0} (w) = C_{S ∖ G} (S w | S)$ .

I'm hoping that if you have some experience in algebra, you might be able to come up with some further examples of relativised complexity measures for yourself, or to identify how this example can be extended by analogy to another relativised version of Kolmogorov complexity.

Conclusions

If the equations and inequalities I've sprinkled through this post aren't enlightening, the general takeaway from them is that relative notions of complexity allow us to interpolate between complexity measures, or see one type of complexity measure as an extreme case of another. They also allow us to acknowledge and adjust parameters which a default point of view on a setting would take for granted.

Mine is not a conventional take on complexity. Complexity seems too often to be assumed as an intrinsic property of an object of study, with various desirata attached regarding how this quantity should behave. I hope that I have managed to illustrate through sheer breadth of possibility that this is a reductive point of view. There is a vast web of notions of complexity and a deep foundation of assumptions underlying any given complexity measure. Understanding the greater structure should promote understanding of any particular choice of complexity one selects.

^{^}
Do not mistake this for an endorsement of said textbook. The second chapter opens with the sentence, "The most natural approach to defining the quantity of information is clearly to define it in relation to the individual object [...] rather than in relation to a set of objects from which the individual object may be selected," a sentiment which I am strongly opposed to (as my exposition in the present and previous post hopefully make apparent). Having expressed this opinion, however, the authors go on in a later paragraph to set up their notation with the sentence, "Denote the set of objects by $S$ , and assume some standard enumeration of objects $x$ be natural numbers $n (x)$ ," in blatant opposition to their "natural approach"...
^{^}
I didn't introduce the notion of compressibility last time, but for this conditional case it amounts to saying that having $y$ at our disposal does not shorten the length of the program required to output the suffix of $s$ .
^{^}
I would have called this "prefix complexity", but that is already the name of a variant of Kolmogorov complexity explained in Chapter 3 of Li and Vitányi's textbook, involving prefix-free codes.
^{^}
It's hard to make the cost reduction precise, although I could probably provide some bounds on it. The recursive nature of Turing computation means that any string produced in the running of the program may subsequently be used by the program an effectively unlimited number of times with a corresponding similarly reduced complexity cost; providing $y$ as a basic building block only negates the initial cost of generating the string.
^{^}
Digital files contain a header specifying some parameters of the file before the actual body of the file (the data you see when you open it) begins. If you write a program that outputs a file, that header will usually take a default value, so that the length of your programme is an upper bound on some instance of extension complexity.
^{^}
Continuing the above example, you could write a program which produces a file starting from a template file, at which point the length of your program is an upper bound on the modification complexity from the template to the final file produced, now ignoring the file header.
^{^}
I'm aware that this is not great notation for the complexity, since it clashes with the notation conventionally used for centralizers in group theory. Please invent original notation if you use this concept in your own work!
^{^}
If we instead used left actions, we would get a dual definition of relative complexity later on.
^{^}
For the purposes of this post, I will assume that the costs are constant, although I mentioned in the last post that this need not be the case. Indeed, if we allow any function of the number of instances of $g$ in the construction, we can also express the relative complexity $C_{G} (w \leftarrow g)$ as an instance of the resulting more general type of complexity measure.
^{^}
This example is hiding some subtleties of formalization. I'm implicitly computing the relative complexity of the graphs as the minimum number of some basic operations (which include edge deletion and duplication) required to produce an output graph isomorphic to the target graph. The transformation I describe is hence not a graph homomorphism, and if I were to enforce the rule that transformations should be graph homomorphisms, then the resulting composite would in fact not by the identity homomorphism and hence would carry complexity greater than 0.