I'd like to introduce a mathematical definition that tries to capture some of our intuitive notion of 'complexity'. People use this word in lots of ways. What I mean by this word is the sort of 'complexity' that seems to have been increasing over the history of our planet for the past few billion years. By example:

--Simple self-replicators are more complex than a random sea of chemicals

--Single-celled bacteria are more complex than simple replicators

--Multicellular life is more complex than single-celled life

--Hunter-gatherer bands are more complex than most other life forms

--Modern civilizations are more complex than hunter-gatherer bands

This definition would be in terms of the underlying computer program of physics, and in the style of Kolmogorov complexity. Kolmogorov complexity itself doesn't work, because it seems that our entire universe can be generated by a short computer program, namely the laws of physics running on some simple initial state. (Yes, our universe is in fact quantum-mechanical and hence randomized. But it's still seems likely that we can sample from the distribution of possible worlds with a short program) Since our universe can be generated by a short program, it seems that any sort of 'complexity' that can be found must lie in something besides the underlying program itself.

Now, let's try to construct a formal model. We'll consider some sort of computable dynamical system with size N. For concreteness let's imagine it's a (randomized) cellular automaton. Then running it for T steps should take O(NT) steps of computation.

We're trying to predict the evolution of the system with limited computational resources. We'll build a randomized Turing machine and only let it run for m timesteps. We'll score our predictor on the summed KL divergence of its predictions at each timestep and the true distribution at that step.

Suppose we are using the best possible predictor for a given runtime. The entire history of the universe can be run in time NT simply by running physics. Then our KL divergence at each timestep will be zero.

Conversely, if we are given no time at all the best we can do is output the uniform distribution at each timestep. the KL divergence will be 1 - o(T), pretty close to 1 at each timestep. (Assuming the system continues to change at least somewhat over the course of the evolution)

In between, how will our predictors fare? That depends on the structure of the underlying program, the initial state, and the details of the trajectory.

At one extreme, imagine a dynamical system that is completely pseudorandom. There is no way to predict the output of the system short of running it. Then our predictors wouldn't start to perform very well until their runtime was actually on the order of NT:

At another extreme, picture an extremely simple system. For example, perhaps the initial state is such that subsequent states simply alternate in a cycle of length 2 forever. Then we can get perfect accuracy with programs of runtime just O(log T + N)

Something like our world seems a bit subtler. Initially you can model things pretty well with some sort of simple chemistry simulation. Getting into the finer details of the movement of the chemicals, there isn't really any structure to it, you just have to follow each individual particle. Later, life arises, and you can start to get a bit more detail. You can predict that fit organisms will survive and those that are unfit will die, and can predict the behavior of each organism. When humans arise, you can start to predict things based on consequentialism and social dynamics, which are more subtle. And so on:

Here, then, is my candidate definition of complexity. It is how broadly spread out the 'dips' in the graphs above are. How many layers of structure can be uncovered by 'thinking' longer and longer about the evolution. In both the pseudorandom and simple cases, there is really only one layer of structure to be considered. But in complex time-histories, like that of our world, there are many layers, like an onion. Further, the number of layers that can be usefully considered will increase with time.

Formally, the negative derivative of the divergence defines a probability distribution over the interval [O(N), O(NT)]. Let's say that the complexity is the mean absolute difference of this distribution(perhaps with the x-axis scaled logarithmically).

I realize that it is not obvious that this definition would in fact predict higher complexity for the example cases I talked about earlier. However, I think it's plausible that it would, and it at least is not obviously ill-fitted for this purpose like Kolmogorov complexity.

There are some subtleties to address. It seems like it might be necessary to add some constraint on the program length to prevent simply encoding the entire evolution in the body of the program.

The way I understand it: complexity is the number of distinctive levels, on which we could create better and better model of the phenomenon. Is it right?

Neat idea! Even if we don't try to call this a necessary and sufficient definition of complexity (which is something of a mug's game), it's definitely something comlexity-ish that we don't have a good formalization for. I promise to comment with something more interesting tomorrow.

I would like to question whether the intuitive concept of complexity makes sense. In what ways is human civilization more complex than a puddle of water? Aliens watching us through their telescopes may observe our effects on the planet and find them fairly simple and predictable in terms of the ways we affect the atmosphere as we climb our tech tree, or even in terms of how we will affect the solar system in the sci-fi future. At the same time, a scientist looking at a dozen molecules of water might find their movement highly complex and explainable by beautiful abstractions. I propose that the real reason we find people more complex than water is because we happen to care about people.

It's not that there's some true essence of complexity, which humans are failing to reach due to provincial interests. It's that we define the word "complexity" the way we do because of those provincial interests and pragmatic concerns, and there's no true essence at all.

If aliens use the word "xlorp," and say that humans and puddles of water are about equally xlorp, and suppose we translate "xlorp" as "complex", then what's happening is not that aliens have a different perspective on the true essence of complexity, it's that humans and aliens are using the word complexity to refer to different things.

That seems to be a rather general response that doesn't feel very relevant to my point. Anyway, if you agree that the human intuition of complexity depends on "provincial interests" I was trying to point out, then you should also agree that OP doesn't reflect those interests in his complexity measure, right?

Also, some concepts are more natural than others. If we agree that the intuitive complexity is not very natural, we may still want to model it for some purposes, but it also makes sense to abandon it in favor of a more natural concept.

One related idea of complexity is averaged fractal dimension over some range of scales. We might measure a 2d figure's complexity (embedded in 3d) by how its box-counting dimension is greater than 2 over some relevant range of scales. But extending this analogy to models might require a better understanding of the average case of runtime / storage of the model vs. predictive accuracy in various environments (significantly better than that contained in this comment).

If we think of the environment as a turing machine of length (but not K-complexity, because simple models provide a path to reduce K-complexity) E, and the model as a turing machine of length M, M<E, then the average case is actually pretty simple, because things are not on average compressible. The average best model accuracy will (I think) just be around M/E. Things are modelable to the the extent that you can push your accuracy curve above this line. But this seems a little too centered on agent perception to be what you want.

Another thought is that one of your desiderata for application appears to be an interaction with local complexity - the universe may be K-simple, but parts of the universe will be K-complex. But it seems like you somehow want to start with the description of a universe and immediately have some quantification of how K-complex parts of it will be.

(Epistemic status: gut reaction to the beginning of the post; thought out during writing)

It seems that a useful measure of complexity might arise from thinking of a phenomenon as a causal graph; specifically, how complex a phenomenon is can be described as "What fraction of causal components of the phenomenon need to break before the phenomenon ceases to occur, penalized by the complexity of the components."

This has the benefit of being able to talk about a phenomenon at multiple levels of precision (with its components at multiple levels of abstraction) and still get useful bounds on the "complexity" of the phenomenon. It also has the benefit of matching the following intuitions:

It also has the following less intuitive properties:

Is there prior work on such a model of complexity?

Promoted to the frontpage