# Ω 16

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

# The Problem

If someone asked you and me to each write a Python program to solve some novel problem, it wouldn't be surprising if our solutions looked quite different. If we then tried explaining to each other how our own programs worked, one thing that might happen is that we would realize that, in spite of surface differences in how we wrote them, the solutions were actually quite similar. We might have called variables different names, done some steps in a different order and so on, but both programs worked for the same reasons.

Another thing that might happen is that even after really understanding how the other person's program worked, the solutions would still seem different. Maybe each program would seem to rely on different mathematical facts. Maybe you would realize that the way I saw the problem was fundamentally different from the way you saw it, that you'd never thought about it like that before, and so the approaches I considered were in a totally different regime from the ones you considered. Yours might run asymptotically faster than mine, such that even if I tried to cut out redundancies and optimize my algorithm, I could never make it run as fast as yours without fundamentally changing how it worked. Mine might generalize better in some direction, whereas yours might not generalize as well, or might generalize in a different direction.

This is the kind of intuition that a formal theory of algorithmic similarity should seek to capture. Given two Turing machines that compute the same function, it should be able to tell us to what degree they use the same or different algorithms. Exactly how coarse- or fine-grained the classification should be is unclear, and maybe different versions could work at different levels of detail, but it should capture some of our intuitive notions of algorithmic similarity, e.g. that changing the name of a variable generally doesn't change the algorithm, whereas there are fundamental differences between the sieve of Eratosthenes and the AKS primality test.
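To make that intuition concrete, here is an illustrative sketch (my own, not from any formal theory): two primality tests that differ only in variable names, which intuitively run "the same algorithm", next to a sieve-based test that intuitively does not.

```python
def is_prime_a(n):
    # Trial division: test every candidate divisor up to sqrt(n).
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_prime_b(m):
    # The same trial division with renamed variables -- intuitively
    # the same algorithm as is_prime_a.
    if m < 2:
        return False
    k = 2
    while k * k <= m:
        if m % k == 0:
            return False
        k += 1
    return True

def is_prime_sieve(n):
    # Sieve of Eratosthenes: builds a table of all primes up to n
    # rather than testing divisors of n directly -- intuitively a
    # fundamentally different algorithm for the same function.
    if n < 2:
        return False
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, n + 1, p):
                sieve[q] = False
    return sieve[n]
```

All three compute the same function, so any classification that separates them has to look past input-output behavior.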

A related question, which may or may not turn out to be the same problem, is whether one program uses/contains another program.

# A Variant

Consider a simple program that outputs 1 if x > y and otherwise outputs 0. Call this program F1. And let's suppose that I know how to compute x - y when x > y or x = y. Then I might write a program that computes the absolute difference of two natural numbers x, y in the following way:

```python
def P1(x, y):
    if F1(x, y) == 1:
        return x - y
    if F1(x, y) == 0:
        return y - x
```

This specific program clearly makes use of F1. But let's say we had a different program F2 that computed the same function as F1 but in a fundamentally different way. Then we could define:

```python
def P2(x, y):
    if F2(x, y) == 1:
        return x - y
    if F2(x, y) == 0:
        return y - x
```
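For concreteness, here is one hypothetical way F1 and F2 might compute the same greater-than function by quite different means (illustrative sketches; the post doesn't specify their internals):

```python
def F1(x, y):
    # Decrement both natural numbers in lockstep; whoever is left
    # nonzero when the other reaches zero is the larger one.
    while x != 0 and y != 0:
        x -= 1
        y -= 1
    return 1 if x != 0 else 0

def F2(x, y):
    # Compare binary representations instead: a longer representation
    # wins; equal-length representations are compared lexicographically.
    bx, by = bin(x)[2:], bin(y)[2:]
    if len(bx) != len(by):
        return 1 if len(bx) > len(by) else 0
    return 1 if bx > by else 0
```

Both return 1 exactly when x > y, but one works by counting down and the other by inspecting digits.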

Now, in some ways P1 and P2 look the same, since they both follow the same general outline:

```
if x > y:
    output x - y
else:
    output y - x
```

Yet P1 contains F1 whereas P2 does not. Intuitively, if it turned out that the algorithm in F1 didn't actually work as we thought and so didn't compute the function we thought it did, then P2 would still work the same but P1 would be broken.

The general question we would like the theory to answer is thus: given two algorithms, determine whether (and maybe to what degree) one contains the other.

Notice that even if an algorithm contains F1 this might be much harder to spot than in the above example of P1, especially if we allow it to use trivial variations of F1 and still have it count. When humans write programs they explicitly write them in a modular form so that it is easy to see how they work and what components they contain, but in cases such as algorithms created by gradient descent, figuring out how they work and which components they contain can be much harder.

Notice also that this is a question at the level of algorithms, not functions. You can't just say that absolute difference contains F1 whereas addition does not, even if the most natural absolute-difference algorithm contains F1 and most addition algorithms really have no need of F1.

Finally, though P1 and P2 used different algorithms to compute the greater-than function, they both still used some algorithm to compute it, and so we might be interested in the more general question of whether a given program contains the greater-than function at all, irrespective of whether it uses F1, F2 or something else entirely.
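As a toy illustration of how murky this gets, consider a hypothetical P3 that computes the absolute difference without any explicit greater-than test; whether it still implicitly "contains" the greater-than function somewhere is exactly the kind of question the theory would need to settle:

```python
def P3(x, y):
    # Decrement both natural numbers in lockstep until one hits zero;
    # whatever remains of the other is |x - y|. No explicit
    # greater-than comparison is ever performed.
    while x != 0 and y != 0:
        x -= 1
        y -= 1
    return x + y
```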

In a similar vein, and more related to the first question of algorithmic similarity, even though at some level of resolution P1 and P2 are different, since one uses F1 and the other F2, we would still like our theory to recognize that the overall strategies of P1 and P2 are the same, and to differentiate that strategy from other programs that might use completely different strategies, even ones that also compute the absolute difference.

So this is all well and good and technical. But of course the real problem is about philosophy and metaphysics.

# The Real Problem

Mathematics is uncannily useful at describing the world. In fact, a lot of the great successes of science have been finding good ways of describing parts of the world using math, whether it be general relativity describing gravity, information theory giving the woolly concept of information a precise meaning, or encoding complicated real-world planning problems into number-crunching problems that we can just feed into a computer to get the optimal answer out.

And even if you won't go as far as proclaiming that the world simply is math, you might still believe that there is nothing in the world that can't be explained by math: that there exists a complete mathematical description of the universe such that for every property of the real world there is a corresponding mathematical property of this description of the world.

What does this have to do with algorithmic similarity? Assume the strong Church-Turing thesis, i.e. that the whole universe is computable. Then a natural question to ask is: which Turing machines simulate the universe and which don't? This question seems like it should have an answer. Some Turing machines just run to the left as fast as possible; those don't seem to simulate the universe. Others might encode the state of the universe on their tape and then gradually change the tape contents according to the laws of physics (it would probably have been more natural to think in terms of cellular automata). And yet this raises the question of algorithmic similarity. Two Turing machines might use very different encodings of the world state; how do I tell that they are really computing the same universe? And on the other hand, one Turing machine might just be doing some arbitrary thing that looks complicated but is actually meaningless; how do I rule out the claim that it is just simulating the universe in a non-obvious way?

Consider now the question of whether there is diamond in the universe. For a given Turing machine simulating a universe similar to ours, this question should have an answer (an answer for every time step, or something like that). Yet in general, this seems very hard to answer. Though diamond is a natural category in the world, it might not be a very natural category in a specific encoding of the world, so even though a Turing machine simulating the universe must contain a piece that is simulating diamond, having a theory that identifies that subcomponent and differentiates it from simulations without that component looks like quite the challenge.

But even now that you know the real question is about cool philosophy stuff, you might still want to ask: why should I care?

# Why You Should Care

Reason 1: Annoying waterfall apologists

Someone comes to you, shows you a random waterfall, and says: this waterfall is implementing a conscious human being.

What do you say to this person? They are clearly crazy, but how do you convince them of their error, or at least make their position look indefensible to bystanders watching the argument?

I can imagine someone understanding minds so well on a technical level that they could build a computer that completely emulates a human mind. And maybe someone skilled enough in the art of hydrocomputation could even do the same with a waterfall. But there is just no way that an arbitrary naturally occurring waterfall is running a human mind, and yet here we are, with waterfall apologists taking over key positions in government and academia, all because we don't have the theory of algorithmic similarity that could have prevented this tragedy.

Reason 2: What is a world model/What does it mean to have true beliefs

True belief is when your map matches the territory. These are words we all hear every morning when we recite our morning prayer to rationality. But what does it actually mean? Intuitively, there is some thing in the world, and there is some thing in your brain, and then there is some relation between the thing in the world and the thing in your brain such that the thing in your brain is "like" the thing in the world, and you can make predictions about how the thing in the world would react merely by doing stuff to the thing in your brain. To explain what that relation is and what "like" means, we need a theory of algorithmic similarity. (I know the point of The Simple Truth was not to ask too many questions, but honestly, how are the pebbles "like" the sheep??)

Reason 3: We need it for this very unrealistic AI design

Consider this very unrealistic AI design. It's got a great model of the world inside its head. It considers sequences of actions; for each sequence it computes the different states of the world that could follow from that sequence and how likely each is. Then it runs a pre-programmed function on the hypothetical world states to compute the amount of value, e.g. diamonds, in each state of the world, weights each world by its likelihood, and chooses the action sequence that would in expectation lead to the most value.
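The decision procedure just described can be sketched roughly as follows (all names hypothetical; `world_model` is assumed to map an action sequence to a list of (probability, world-state) pairs):

```python
def choose_actions(world_model, candidate_sequences, value_of):
    # Pick the action sequence with the highest expected value:
    # sum the value of each possible resulting world, weighted by
    # the likelihood the model assigns it.
    best_seq, best_ev = None, float("-inf")
    for seq in candidate_sequences:
        expected_value = sum(p * value_of(world)
                             for p, world in world_model(seq))
        if expected_value > best_ev:
            best_seq, best_ev = seq, expected_value
    return best_seq
```

The whole design hinges on `value_of`: a function that has to count diamonds inside whatever representation of a world state the model happens to use.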

Forgetting the complexity of human values for the moment, how does the function count the amount of value in the representation of a world state in the AI's head? Algorithmic similarity. But wait, you might say, what if we fix the way the AI represents world states so that it's easy to look inside? This might fix part of the problem (see Ontological Identification), but we might not want to fix the AI's representation like that; maybe we want it to learn the best world model it can without constraints, but in that case we probably need a better understanding of how to look inside of computational objects.

Even disregarding values, when the AI has found a world model that yields good predictions we want it to be able to look inside that model and do stuff like count the amount of diamonds in it, or find out other physical facts about it, even if it found that model just by looking for computational objects that predicted its sensory input.

Reason 4: It could give us a better agent/optimizer criterion

In Risks From Learned Optimization the authors write: "whether a system is an optimizer is a property of its internal structure—what algorithm it is physically implementing—and not a property of its input-output behavior". In this case the usefulness of a theory of algorithmic similarity is obvious. Algorithmic similarity could give us a framework with which to talk about classifications of exactly this nature.

Reason 5: Because FDT needs it

FDT reasons by imagining that the mathematical function making its decisions has different outputs. But the question then arises of how FDT determines what things in the universe are implementing her decision algorithm. This seems like an ideal situation for a theory of algorithmic similarity to bring some clarity. We need a theory of how similar two algorithms should be for FDT to change both of them, of what happens when one algorithm contains FDT's decision algorithm so that changing her own means changing part of the other, and of what it means for part of the universe to implement a decision algorithm at all.

Reason 6: Deep insights into the nature of reality/Giving computational functionalism the comeback it deserves!

Self-explanatory. (see Computational Functionalism, these guys need help)

Now that you understand the problem and care deeply about solving it the only reasonable thing to do would be to end this blog post and get to work, right?...

# Why The Question Doesn't Make Sense And We Shouldn't Expect There To Be A Satisfying Answer

I've been telling you tales about all the nice properties this theory will have and all the confusing situations that it will explain in a perfectly satisfying manner, but all this time I was really selling you a mirage: we don't have the thing yet, and so we run the risk that the question itself just doesn't make sense to ask. I think this is a reasonable thing to be afraid of. Intuitions can be misleading, and when the thing you are looking for is this great, the reason might be that it is just a magical dream.

I don't believe so, but other smart people do. And even if you think the criticisms in the end will turn out to be invalid, they're still important to be aware of now so we know the extent of the challenges that a theory like this would somehow have to overcome.

This will not be an exhaustive list of the problems you could have with the views presented so far. In fact, there will only be one entry on the list, and I will definitely not present the strongest case for it. Treat it as an appetizer. I won't even respond to the criticism; hopefully I will get back to that some time, and maybe someone else will as well. I don't believe I have any knock-down arguments, that might require the full theory, but I still think there is something to be said in response.

Criticism 1: It's Trivial

Let's say you have some system A; it might be an algorithm, some causal graph, a mathematical description of the human mind, or maybe the whole universe. I will give an intuitive argument that basically any other physical system can be seen as implementing this system.

First we need to identify the states that A can be in. Let's suppose A only has finitely many states; this should normally be enough, since real things are generally thought to be finite, but much of the same argument would work with infinitely many states. Since this is a deterministic system, for each state there will be exactly one state that follows it. You can imagine drawing this out as a huge diagram. If you also did this with something like a bucket of water, then because of the astronomical number of molecules in the bucket there would be an almost inconceivable number of states it could be in, with all sorts of complicated connections between them.

Now the heuristic argument is that there would almost certainly be a way to identify sets of states in the bucket system such that the resulting graph is identical to the graph of system A. This might be a very unnatural coarse-graining; you might end up associating very different physical states in the bucket system and saying that they are the same. But on the other hand, when you say that a physical calculator is implementing addition, you are also bundling together a bunch of distinct physical states to make that argument. And in any case, there should definitely be something disturbing about the statement that if you just look at a bucket of water the right way, it can be seen to implement pretty much anything, including a human brain.
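A toy version of this argument (my own illustrative sketch): even a bare counter that just ticks 0, 1, 2, ... can be "seen as implementing" any small deterministic system A, if we allow a sufficiently gerrymandered mapping from counter states to A-states.

```python
def run_system(transition, start, steps):
    # Unroll a deterministic system: each state has exactly one successor.
    state, history = start, [start]
    for _ in range(steps):
        state = transition[state]
        history.append(state)
    return history

# System A: a trivial two-state blinker.
A = {"on": "off", "off": "on"}
a_history = run_system(A, "on", 4)

# The "bucket": here, just a clock whose physical state at time t is t.
clock_history = list(range(5))

# Gerrymandered decoding: label clock state t with whatever state
# A happens to be in at time t.
decoding = {t: a_history[t] for t in clock_history}

# Under this decoding, the clock's evolution reproduces A's exactly.
assert [decoding[t] for t in clock_history] == a_history
```

The decoding does all the work; nothing about the clock itself resembles A. Ruling out mappings like this, without also ruling out legitimate ones, is the challenge.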

You could try to make rules for which ways of associating states are allowed and which aren't, but this seems somewhat unnatural, and in any case it seems hard to make up a set of rules such that you could never game the definition with something like this.

There are better versions of this argument, versions where you actually construct a trivial physical system that implements A in this way without resorting to "almost certainly", but I won't go into them here. For now I just want to remark that triviality results like these will have to be dealt with in some way for any good theory of algorithmic similarity to work, especially if we want it to have all the nice philosophical implications.

# Conclusion

If you only remember one thing from this post, let it be this: algorithmic similarity is a cool problem and by thinking about it you can be cool too.

In terms of grand strategies for tackling the problem, well, those will be in the next post, I swear. For now, I think the main thing is to focus on the technical problem. It's interesting, it's pretty concrete and hopefully the philosophical stuff will just fall out of it when we've got the formal theory down.

In terms of work to build on, there should be plenty. For one thing, it seems like some people in proof theory are interested in similar questions; the way they frame it is "when are different proofs really the same?", and there are reasons to believe that the questions might be intimately related (see the Curry-Howard correspondence).

There is also the debate about computational functionalism in the philosophy literature, which doesn't look at the technical side of the problem as much, at least not in the way I presented it here (or so I believe), but has considered a lot of the philosophical implications.

Finally, computational complexity theory has obviously done a lot of work that seems to be in this area, since complexity classes are examples of classifications of programs that distinguish between different programs which compute the same function.

My impression is that a surprising amount of people and areas of thought have been in contact with pieces of this problem, and I think it would be great to try and gather as much as possible of their work in one place. I would like to compile a giant reading list/archive of stuff related to algorithmic similarity, and I hope that some of you reading this post will help by contributing links to work or ideas that you have stumbled upon and think might be relevant.

Another thing that might be useful is to compile a list of problems where, for each problem, there are multiple algorithms that solve it in what at least initially seem to be different ways. I don't have many concrete examples yet, but it should be relatively easy to find some. Whether or not people can agree on whether two algorithms are "actually" different or "really" just doing the same thing is another question; I predict that there might be quite a lot of disagreement on that, but that discussion might itself be useful for clarifying some intuitions. Especially interesting would be examples of algorithms with the same asymptotic run-time that are nevertheless clearly different algorithms.

Here is the link to the reading list. I have some stuff in it already, and if you post a suggestion in the comments for something that you think should be in there, I will add it:


• Regarding algorithmic similarity. This is an idea I just thought of this moment, so I'm not sure how solid it is. Given Turing machines M1 and M2 that compute the same function, we want to say whether in some sense they do it "in the same way". Let's consider, for any input x, the entire histories of intermediate states of the computations M1(x) and M2(x). Call them h1(x) and h2(x). We then say that M1 and M2 are "algorithmically equivalent" when there is a low complexity algorithm T that, given access to h1(x), can produce any given part of h2(x), and vice versa. In particular, the complexity of T must be much lower than the complexity of running the machines from the beginning. Here, it seems useful to play with different types of complexity bounds (including time/space, for example).

• Regarding waterfalls and human beings. I think that a waterfall is not simulating a human being, because there is no algorithm of simultaneously low description complexity and low computational complexity that can decode a human being from a waterfall. Ofc it is not a binary distinction but a fuzzy one (the simpler the decoding algorithm is, the more reasonable it is to say a human being is there).

• Regarding diamond optimizers. I think that the right way to design such an optimizer would be using an instrumental reward function. We then remain with the problem of how to specify this function. We could start with some ontology or class of ontologies that can be reasonably said to contain diamonds, and for which we can define the reward function unambiguously. These ontologies are then mapped into the space of instrumental states, giving us a partial specification of the instrumental reward function (it is specified on the affine span of the images of the ontologies). Then, there is the question of how to extend the reward function to the entire instrumental state space. I wrote a few thoughts about that in the linked essay, but another approach we can take is considering all extensions that have the same range of values. These form a convex set, which can be interpreted as Knightian uncertainty regarding the reward function. We can then consider maximin policies for this set to be "diamond maximizers". In other words, we want the maximizer to be cautious/conservative about judging the number of diamonds in states that lie outside the ontologies.

I definitely think the computational complexity approach is worth looking into, though I think computational complexity behaves kind of weirdly at low complexities.

I like the view that waterfalls are at least a bit conscious! Definitely goes against my own intuition.

I'm a bit worried that whether or not there is a low-description-complexity, low-computational-complexity algorithm that decodes a human from a waterfall might depend heavily on how we encode the waterfall as a mathematical object, and that although for "natural" encodings it would be clear that the waterfall was unlike a human, we might need a theory to tell us which encodings are natural and which are not.

Not sure what you mean by "computational complexity behaves kind of weirdly at low complexities"? In this case, I would be tempted to try the complexity class L (logarithmic space complexity).

The most natural encoding is your "qualia", your raw sense data. This still leaves some freedom for how you represent it, but this freedom has only a very minor effect.

Thanks, this is a good write-up!

Many years ago I wrote my undergraduate thesis on the waterfall problem (though it went by another name to me). Basically, I painstakingly and laboriously transformed an arbitrary human into an arbitrary rock of sufficient size, via a series of imperceptibly tiny steps none of which can be felt by the human. (I did this in imagination, not in reality, to be clear) The point was to see if any of the steps seemed like good places to draw a line and say "Here, consciousness is starting to go out; the system is starting to be less of a person." As a result I became fairly convinced that there aren't any good places to draw the line. So I guess I'm a waterfall apologist now!

Thanks a lot for the link. I'll put it in the reading list (if you don't mind).

I would be interested to hear what you think about the more technical version of the problem. Do you also think that that can have no good solution, or do you think that a solution just won't have the nice philosophical consequences?

Also, I'm excited to know a smart waterfall apologist and if you're up for it I would really like to talk more with you about the argument in your thesis when I have thought about it a bit more.

I'm glad you are interested, and I'd love to hear your thoughts on the paper if you read it. I'd love to talk with you too; just send me an email when you'd like and we can skype or something.

What do you mean by "the more technical version of the problem" exactly?

My take right now is that algorithmic similarity (and instantiation), at least the versions of it relevant for consciousness and decision theory and epistemology, will have to be either a brute empirical fact about the world, or a subjective fact about the mind of the agent reasoning about it (like priors and utility functions). What it will not be is some reasonably non-arbitrary property/relation with interesting and useful properties (like Nash equilibria, centers of mass, and temperature).

Hey, this is well written.

Out of curiosity, how do you feel about Rice's Theorem?

Thank you!

I hadn't thought about Rice's Theorem in this context before but it makes a lot of sense.

I guess I would say that Rice's Theorem tells us that you can't computably categorize Turing machines based on the functions they describe, but since algorithmic similarity calls for a much finer classification I don't immediately see how it would apply.

And even if we had an impossibility result of this kind, I don't think it would actually be a deal breaker, since we don't need the classification to be computable in general to be enlightening.

Another reason why algorithmic similarity would be useful relates to a line of thinking that I've explored recently. Specifically, the question is how we could regularize neural networks in order to make their computations more interpretable. The reason why a theory of algorithmic similarity would help is that we could apply a penalty to a neural network whose internal operations are too dissimilar to some understandable algorithm. This would encourage the neural network to mirror an interpretable computation, which makes it easier for us to look inside and see what it's doing.

Ideally, this would provide us the performance gains of neural networks while keeping the interpretability of GOFAI algorithms, like tree search.
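A minimal sketch of the kind of penalty this comment gestures at (all names hypothetical, and the dissimilarity measure here is a crude stand-in for a real theory of algorithmic similarity):

```python
import numpy as np

def interpretability_penalty(net_trace, ref_trace, lam=0.1):
    # Crude stand-in for algorithmic dissimilarity: mean squared
    # distance between the network's intermediate activations and the
    # intermediate values of a known, interpretable reference
    # algorithm. Assumes both traces are arrays of the same shape;
    # a real theory would have to relax that assumption.
    net = np.asarray(net_trace, dtype=float)
    ref = np.asarray(ref_trace, dtype=float)
    return lam * float(np.mean((net - ref) ** 2))

# A training loss might then look like:
#   total_loss = task_loss + interpretability_penalty(net_trace, ref_trace)
```

The hard part, of course, is replacing the mean-squared-error stand-in with a measure that tolerates reparameterizations of the same algorithm while still penalizing genuinely different ones.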