My work happens to consist of two things: writing code and doing math. That means that periodically I produce a very abstract thing, and then observe reality agreeing with its predictions. While satisfying, this has a common adverse effect: finding oneself in a deep philosophical confusion. An effect so common that there is even a paper about the "Unreasonable Effectiveness of Mathematics."
For some reason, noticing this gap is not something I can forget about and unthink back into non-existence. That leads to not-so-productive time, spent roughly from 1 to 5 AM, on questions even more abstract and philosophical than the ones I typically work on. Sometimes it even results in something satisfying enough to go to sleep. This post is an example of such a result. Maybe, if you also tend to circle around the philosophical question of why the bits follow the math, and why reality agrees with the bits, you'll find it useful.
The question I inadvertently posed to myself last time was about predictive power. There are things which have it, namely: models, physical laws, theories, etc. They have something in common:
They obviously predict things.
They contain pretty abstract premises/definitions, which are used for derivations.
They don't quite "instantiate" the reality (at least to the extent I can see it).
The third point in this list is what gave me an intellectual itch over the terminology. We have quite an extensive vocabulary around things which satisfy the first and second points. Formal systems. Models. We can describe relationships between them - we call it isomorphism. We have a pretty good idea of what it means to create an instance of something defined by an abstract system.
But the theories and laws don't quite instantiate the things they apply to. Those things just exist. The rocks just "happen to have the measurable non-zero property which corresponds to what we define in a formal system as mass at rest." The apples just happen to "be countable with the Peano axioms." To the extent of my knowledge, apples were countable before Peano, and fell to Earth before Newton.
I am sane enough not to try to comprehend fully what "reality" is, but let's look at this practically. I would appreciate having at least a term for describing the correspondence of formal systems to "reality," whatever "reality" is.
My problem with describing it as "isomorphism to a formal system of reality" is that it requires too much from reality. With all respect to Newton and Peano, I don't think there is an underlying system that cares about their descriptions holding true in all cases. Moreover - sometimes the descriptions don't hold. We later had to extend mass to photons, which have no rest mass, and then extend it again with particle-wave duality, etc. Previous descriptions became outdated.
This "correspondence but not instantiation" has a sibling in software engineering called "duck typing." If something sounds like a duck, behaves like a duck, and has all properties of a duck, we can consider it a duck from external perspective. But duck typing has a descriptive nature, not a formalism or definition of what type the duck typing itself belongs to.
So I spent quite a bit of time thinking about vocabulary which would describe this relationship, which is almost the inverse of instantiation. Call it professional deformation: the urge to find a common abstraction over things with similar properties.
Let me pose the question in more grounded terms, like "specification" or "blueprint".
We can define a blueprint of a tunnel and build a tunnel according to it. The tunnel will be an instance of this specification. Or its realization.
We can define a blueprint of a tunnel and another blueprint which is a scaled copy of it. Those two will be isomorphic.
But what if we define a blueprint and just happen to find a tunnel which satisfies it, accidentally? What is this relationship? We have no formal isomorphism, and no form of causality.
Well, after crunching through ~30 terms, I can't give a better answer than the tunnel being an example of the blueprint, and the relationship thus being exemplification. If you think it was dumb to spend a few hours finding the most obvious word - it probably was, and that's exactly how I felt afterwards.
But it rewarded me with some beautiful properties. If we define a formal system such as:
There exists a formal system with the full set of properties of formal systems: symbols, formulas, axioms, and inference rules. And for each symbol there exists an exemplification class, which contains the members that satisfy the predicate of behaving in accordance with the rules of the formal system.
Or more formally:
Let F be a formal system with:
A set of symbols Σ
A set of well-formed formulas W
A set of axioms A ⊆ W
Inference rules R
We extend F with an exemplification mapping E: Σ → P(X), where X is some external domain and P(X) is its power set. For each symbol s ∈ Σ, the exemplification class E(s) contains all members x ∈ X that satisfy the predicate of behaving in accordance with the operators defined on s in F. (You may notice that I don't specify how to test the predicate. From a practical perspective I will just observe that if we decide to check specific apples for "countability," we can do it. So I assume the predicate is given in computable form as part of the definition, and skip the implementation details of what this predicate is.)
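To make this concrete, here is a minimal sketch in Python. Everything in it is my own illustration: the Symbol and FormalSystem classes and the toy "countability" predicate are hypothetical names, and the predicate is deliberately naive - it only stands in for whatever computable check the definition assumes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A symbol of the formal system F, paired with a computable predicate
# that decides membership in its exemplification class E(s).
@dataclass
class Symbol:
    name: str
    predicate: Callable[[Any], bool]

    def exemplified_by(self, x: Any) -> bool:
        """True if x behaves in accordance with the rules attached to this symbol."""
        return self.predicate(x)

@dataclass
class FormalSystem:
    symbols: list[Symbol] = field(default_factory=list)

    def exemplification_class(self, symbol_name: str, domain: list[Any]) -> list[Any]:
        """E(s): the members of an external domain X that satisfy the symbol's predicate."""
        s = next(sym for sym in self.symbols if sym.name == symbol_name)
        return [x for x in domain if s.exemplified_by(x)]

# A toy predicate: "countable like a natural number" means the thing reports
# a whole, non-negative quantity when we ask how many of it there are.
def countable(x: Any) -> bool:
    try:
        n = x["count"]
        return isinstance(n, int) and n >= 0
    except (TypeError, KeyError):
        return False

peano_like = FormalSystem(symbols=[Symbol("natural_number", countable)])

domain = [{"kind": "apple", "count": 3},
          {"kind": "apple slice", "count": 2.5},   # fails the predicate
          {"kind": "rock", "count": 1}]

print(peano_like.exemplification_class("natural_number", domain))
# -> the apples and the rock; the sliced apple is not an example of "natural_number"
```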
Immediately after this move we can formalize a lot of useful things for free:
More complex systems, with more simultaneously applied rules, have applicability less than or equal to that of the simpler ones (trivial, because fewer or equal objects would satisfy the combined predicates) - which has the spirit of Occam's razor. (A small sketch of this follows the list.)
Popper's criterion can be mapped to finding exceptions/inconsistencies in exemplification, without challenging the formal system itself. We don't need to assume the formalism is wrong; we just find inconsistencies in exemplification relationships, e.g., unmapped operations.
Russell's teapot is just a trivial system which has a single example, and thus can't be falsified by any test except the example's existence. And the predicate for testing that existence is designed in a way that makes it hard to compute.
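The first item - the Occam's-razor flavor - is easy to sanity-check mechanically. A tiny sketch, with hypothetical predicates standing in for the rules of two systems: conjoining another rule can only shrink the exemplification class, never grow it.

```python
import random

# Hypothetical predicates standing in for the rules of two formal systems.
def simple_rule(x: int) -> bool:           # a system with one rule
    return x % 2 == 0

def richer_rule(x: int) -> bool:           # the same rule plus one more, applied simultaneously
    return x % 2 == 0 and x % 3 == 0

domain = [random.randint(0, 1000) for _ in range(10_000)]

simple_class = [x for x in domain if simple_rule(x)]
richer_class = [x for x in domain if richer_rule(x)]

# The system with more simultaneously applied rules never has more examples.
assert len(richer_class) <= len(simple_class)
print(len(simple_class), len(richer_class))
```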
This idea requires much less from the universe than following some formal system rules, while keeping the effectiveness of mathematics. The apples, and other things, happen to be countable - nothing more. They quack like countable things and fly like things with measurable mass, if we use the duck typing analogy.
Why can we find such classes, and why are they so broad? Well, even if the universe's algorithm is not explicitly defined, and the observable universe is just a temporary structure that emerged over random noise - examples of structures matching simpler predicates will be more common. If the universe has an algorithm unknown to us generating examples - there are still more algorithms generating output that matches simpler predicates. Even if we assume God exists - well, he only had seven days, no time for complex detailed specifications under such tight deadlines.
If you think that at this moment I was satisfied and my question was resolved... Well, yes, but then I reached an interesting conclusion: the introduction of exemplification and predicates challenges the definition of complexity as naive description length. Here is what I mean:
Let's take numbers as an example. Natural numbers are described with fewer axioms than complex numbers. If we look at generative program length, we might assume that finding examples of complex numbers is harder. But if we look at the exemplification predicates, the classes nest the other way around:

$$E(\mathbb{N}) \subset E(\mathbb{Q}) \subset E(\mathbb{R}) \subset E(\mathbb{C})$$
Or, in common words: if an example satisfies Peano, or the rationals, or the reals, it can be expressed in the formal system of complex numbers; and additionally, complex numbers have their own examples which can't be expressed in the simpler, less powerful formal systems.
Let's get back to our poor apples, which have suffered the most philosophical, theological, and scientific abuse in the history of fruit. We can observe that if we perform a ritual sacrifice of slicing one on our knowledge altar, we suddenly have trouble expressing such an operation in the Peano axioms. Did we get more apples? Still one? Where did the pieces come from? Did Peano "break"? Well, for other apples it still holds, and I think that saying this specific apple no longer satisfies being an example of the Peano specification is a reasonable compromise with reality. But we can reach for another blueprint, the rationals, which describes both untouched apples and the slices produced by our experiment. Then we can think about the population of apples produced by this apple, and suddenly we're in the land of differential analysis, oscillations, and complex numbers. But they still describe "basic apple counting" as well as the natural numbers do.
So, satisfying the predicate "Complex" is easier. You need to be either Natural, or any of infinitely many more examples. In other words: there are more programs which can generate something satisfying the predicate of complex numbers. And this happens despite ℂ being descriptively more complex. The idea that more complex formal descriptions can relax the constraints on finding examples for them had never crossed my mind before, despite being pretty obvious once said out loud. More permissive predicates - the ones simpler to satisfy - accept more generators.
Ok, now we have some sense of a "complexity of satisfying predicates" that is not the "descriptive complexity," which raises some important questions. Does this contradict Kolmogorov complexity and Solomonoff's theory of inductive inference? If not, how does it interact with them?
Well, I won't pretend I have a good, proven answer, but I have an observation which might be useful. When we do math, programming, and physics, instead of zeros and ones we regularly face a bunch of transcendental numbers that can only ever be approximated. The measure of "program length" assumes we know the correct encoding and have enough expressiveness. It doesn't look to me like the encodings we come up with are good for all the things they try to describe. If they were, I would expect π to be 3, or some other simple string. Instead, we have a pretty long program, which requires indefinite computation to approximate something we stumbled on after defining a very simple predicate. Our formal systems don't naturally express the properties associated with simple identities.
Take the circle. The most easily described predicate in geometry. It requires just one number - radius/diameter, one operation - equivalence, and one measure - distance:
$$\sqrt{x^{2} + y^{2}} = r$$
It's fully symmetrical. It's very compact. And it requires computing a transcendental number to know the length of the line formed by those points. A trivial predicate, and its property includes a transcendental number.
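To illustrate, a minimal sketch (my own addition): using nothing but the circle predicate above and uniform random points, π falls out of it - and only ever as an approximation.

```python
import random

def inside_circle(x: float, y: float, r: float = 1.0) -> bool:
    # The circle predicate: distance from the origin is at most r.
    return (x * x + y * y) ** 0.5 <= r

n = 1_000_000
hits = sum(inside_circle(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n))

# The fraction of random points satisfying the predicate approaches pi/4,
# but no finite number of samples pins pi down exactly.
print(4 * hits / n)  # ~3.14, and only ever "~"
```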
But an even more stunning example is e, which provides the symmetry for derivatives - the predicate is literally "the rate of growth of a quantity is the quantity itself":
$$\frac{d}{dx} e^{x} = e^{x}$$
The most "natural property" requires itself to write the statement needed to define itself. And to compute it we need an infinite series:
$$e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}$$
The simplest identities describing symmetries give us properties which require resorting to the concept of infinite operations to express them within the system itself. This doesn't sound like a natural way of describing simple properties of simple predicates. So why would we assume that the length/complexity of an expression in such a system tracks anything natural about its examples? Why wouldn't the unknown constant of the "encoding fine" dominate it?
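A quick sketch of that last point (the cutoff values are my own illustration): truncating the series for e at any finite depth gives only an approximation; the exact value needs the full infinite sum.

```python
import math

def e_partial(terms: int) -> float:
    """Partial sum of the series e = sum(1/n!) over a finite number of terms."""
    return sum(1 / math.factorial(n) for n in range(terms))

for terms in (2, 5, 10, 20):
    approx = e_partial(terms)
    print(f"{terms:2d} terms: {approx:.15f}  error={abs(math.e - approx):.2e}")

# The error shrinks fast, but never reaches zero for any finite number of terms.
```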
The simplest things are symmetric. Removing symmetry from a formal system doesn't make it easier to find an example for it. We would need to construct the predicate in a way that excludes the symmetry. The Peano axioms include the symmetry of addition and multiplication, but do not include the symmetry of growth or the symmetry of geometric distance. Would such a system be easier to exemplify from the perspective of, say, a random noise generator?
The easiest thing to do with random noise is to aggregate it. This gives us a Gaussian via the CLT. And interestingly, this object is symmetrical in both geometry and growth:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
We didn't require anything except throwing dice enough times and aggregating the results. Our generative process, despite being this minimalistic, produces all the symmetries we described. How much mathematical acrobatics do we need to convert this structure into 0, 1, 2, 3? Even getting to a straight line is not that trivial from continuous random dice rolls.
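A minimal sketch of the claim, assuming nothing beyond the standard library: sums of plain dice rolls already land on the Gaussian predictions.

```python
import random
import statistics

def aggregated_roll(num_dice: int = 30) -> float:
    # Aggregate raw noise: just sum independent dice rolls.
    return sum(random.randint(1, 6) for _ in range(num_dice))

samples = [aggregated_roll() for _ in range(100_000)]

mean = statistics.fmean(samples)
stdev = statistics.pstdev(samples)

# CLT expectation for 30 dice: mean = 30 * 3.5 = 105, variance = 30 * 35/12.
print(f"mean  ~ {mean:.2f} (expected 105.00)")
print(f"stdev ~ {stdev:.2f} (expected {(30 * 35 / 12) ** 0.5:.2f})")

# Symmetry check: roughly equal probability mass one sigma below and above the mean.
below = sum(s < mean - stdev for s in samples)
above = sum(s > mean + stdev for s in samples)
print(f"mass below: {below / len(samples):.3f}, mass above: {above / len(samples):.3f}")
```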
Since we discovered earlier that complex numbers define the least restrictive predicate, we could conclude that the most common dice rolls would happen over the complex plane. The complex Gaussian gains an additional symmetry the real one lacks: full rotational invariance. The real Gaussian is symmetric only under reflection around its mean; the complex Gaussian is symmetric under arbitrary rotation. Its density depends only on the magnitude |z|, not the phase:
$$f(z) = \frac{1}{\pi\sigma^{2}}\, e^{-\frac{|z|^{2}}{\sigma^{2}}}$$
The real and imaginary components are independent, and each is a Gaussian itself. By relaxing our predicate from real to complex, we didn't add constraints - we removed one, and a new symmetry emerged. How do we even reach the "natural numbers" from this starting point?
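And a small sketch of that rotational invariance (again, just an illustration): multiplying complex Gaussian samples by an arbitrary phase leaves their statistics untouched.

```python
import cmath
import random
import statistics

def complex_gaussian(n: int) -> list[complex]:
    # Independent Gaussian real and imaginary components.
    return [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

samples = complex_gaussian(100_000)
theta = 1.234                                    # an arbitrary rotation angle
rotated = [z * cmath.exp(1j * theta) for z in samples]

# The density depends only on |z|, so rotating the whole cloud changes nothing statistically.
print(statistics.fmean(abs(z) for z in samples))
print(statistics.fmean(abs(z) for z in rotated))         # same, up to sampling noise
print(statistics.fmean(z.real ** 2 for z in samples))
print(statistics.fmean(z.real ** 2 for z in rotated))    # also same: rotation mixes two identical Gaussians
```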
Our null hypothesis should be that things are random, unrelated, and that operations are basic. The existence of relationships, the dependence, the intricacy of operations - these are the things requiring justification. The null hypothesis should be symmetric, and symmetry is not something to prove. It's something you get for free from dice rolls and simple transformations.
Less restrictive predicates are the ones allowing for more symmetries. And this actually mirrors the evolution of our theories and formal systems - the symmetries we discover allow us to extend the exemplification classes.
And symmetries are not interchangeable. We can't say π = 1. If we did, we would lose either the symmetry of the half-turn plane rotation or the symmetry of the multiplication operation. The same way, 1 ≠ −1 ≠ √−1 ≠ 0 ≠ e. These are constants tied to symmetries of different operations. And at the same time they allow us to define those operations in a compact way. We just introduced phase rotation into our random sampling, and it somehow required i. It wouldn't work without it. And all the other constants have stayed in place.
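One well-known identity (my addition here, not part of the argument above) shows how tightly these particular constants and their operations interlock: the rotation constant, the growth constant, the imaginary unit, and the additive and multiplicative identities all meet in a single statement:

$$e^{i\pi} + 1 = 0$$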
This is a strange property for "just a constant" - to be fundamentally linked to a class of operations. I honestly struggle to explain why it happens. I can observe that constants tend to "pop up" in identities, and I would assume identities are linked to symmetries. But that's a bit speculative without further development.
Speaking of speculative things, I find it a bit suspicious that the question "what is the simplest thing emerging from random noise?" produces something with operation symmetries similar to quantum mechanics (at least I can see traces of all the familiar constants), which should lead to similar exemplification complexity. While the full descriptions of the formal systems might differ, their exemplification "classes" should have similar power. And it's remarkably sane - quantum mechanics being one of the simplest things to exemplify is the result I would expect from a framework that estimates complexity in a way related to reality. The perceived complexity we associate with it is largely an encoding artifact.
You may think that after covering definitions of formal systems, complexity, symmetry, fundamental constants, and probability distributions, introducing a few new terms, and accidentally stumbling into quantum mechanics territory, we'd have a great resolution of the great buildup. Sorry, we won't. Sometimes the questions remain unanswered. Most of the time they are not even properly stated. Maybe I'll be able to write them up more properly next time.
Nevertheless, I found this way of thinking about predictive frameworks useful for me. It doesn't require reality to "follow formal rules," which I assume it won't do.
It had never occurred to me that less restrictive predicates, allowing more symmetries, would naturally have more examples. I never thought that random noise over the complex plane could emerge as a "natural," most permissive structure. Or that symmetries are the default, and asymmetry is the assumption which should be justified.
Intuitively, we would think that "no structure" is equivalent to... nothing. But if "is nothing" is a predicate - it's one of the most complex to satisfy. There is only one state which is "nothing". There are infinitely many states which are "arbitrary". The predicate "just true" removes the comparison operation itself. And imagine how big the constant factor would be if we modeled an analog of Kolmogorov complexity over programs on randomly initialized tapes instead of empty ones...
I was always puzzled why abstractions that are complex to describe can be the ones it's easier to find examples for. Why don't we see those simple formalisms everywhere in physics? The best answer I can offer now is: those lengthy descriptions are the scaffolding for removing artificial constraints. They uncover the symmetries hidden by our imperfect formal systems. We extend them so they become easier to exemplify. And reality rewards us by providing the example.
In the "Unreasonable Effectiveness of Mathematics" the unreasonable part was always us.