Scale invariance is itself an emergent phenomenon.
Imagine scaling something (say, a physical law) up - if it changes, it is obviously not scale invariant, and it will keep changing with each further scale-up. If it does not change, it has reached a fixed point and will not change at the next scale-up either!
Scale invariances are just fixed points of coarse-graining.
Therefore, we should expect anything we think of as scale invariant to break down at small scales. For instance, electric charge is not scale invariant at small scales!
In the opposite direction: we should expect our physical laws to continue holding at macro scales, if they are fixed points of scaling. This also explains the ubiquity of power laws in the natural sciences; power laws are the only relations that are scale invariant and thus preserved!
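A quick sketch of the standard argument for that last claim (assuming smoothness): scale invariance means f(λx) = g(λ) f(x) for all λ > 0. Differentiating in λ at λ = 1 and writing a := g'(1) gives x f'(x) = a f(x), whose only solutions are f(x) = C x^a. So, up to the smoothness assumption, power laws are exactly the scale-invariant relations.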
All of this may seem tautological but is actually truly strange. To me this indicates that we should expect to be very, very far from the actual substrate of the universe.
Now go forth and study renormalisation group flow! ;)
Epistemic status: Just riffing!
This sounds like a fascinating insight, but I think I may be missing some physics context to fully understand.
Why is it that the derived laws approximating a true underlying physical law are expected to stay scale invariant over increasing scale after being scale invariant for two steps? Is there a reason that there can't be a scale invariant region that goes back to being scale variant at large enough scales just like it does at small enough scales?
The act of coarse-graining/scaling up (an RG transformation) changes the theory that describes the system, specifically the theory's parameters. If you consider the space of all theories and iterate the coarse-graining, this induces a flow where each theory is mapped to its coarse-grained version. This flow may possess attractors, that is, stable fixed points x*, meaning that when you apply the coarse-graining you get the same theory back.
And if f(x*)=x* then obviously f(f(x*))=x*, i.e. any repeated application will still yield the fixed point.
So you can scale up as much as you want - entering a fixed point really is a one-way street; you can check out any time you like, but you can never leave!
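To make the flow concrete, here is a toy numerical sketch. It uses the standard decimation recursion for the 1D Ising model (tanh K' = tanh²K); the starting coupling and the number of steps are arbitrary illustration choices:

```python
import math

def decimate(K):
    """One coarse-graining step for the 1D Ising coupling K
    (standard decimation recursion: tanh K' = tanh(K)**2)."""
    return math.atanh(math.tanh(K) ** 2)

K = 2.0  # start from a fairly strongly coupled theory
for step in range(15):
    K = decimate(K)
    print(step, round(K, 6))
# The coupling flows monotonically into the K = 0 fixed point and stays there:
# once at (a neighbourhood of) the fixed point, further coarse-graining gives
# back essentially the same theory - the one-way street described above.
```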
The main source of scale invariance itself probably has to do with symmetry, meaning that an object has a particular property that is preserved across scales.
Space symmetry is an example: the basic physical laws are preserved across all scales of spacetime. In particular, scaling a system down doesn't mean different laws of physics apply at different scales; there is only one physical law, which produces varied consequences at all scales.
You're making an interesting connection to symmetry! But scale invariance as discussed here is actually emergent - it arises when theories reach fixed points under coarse-graining, rather than being a fundamental symmetry of space. This is why quantities like electric charge can change with scale, despite spacetime symmetries remaining intact.
And while spacetime symmetries still seem scale invariant, considering the above argument they might also break down at small scales. It seems exceedingly unlikely that they would not! The initial parameters of the theory would have to be chosen just so as to be a fixed point. It seems much more likely that these symmetries emerged through RG flow rather than being fundamental.
While this is an interesting idea, I do still think space symmetries are likely to remain fundamental features of physics, rather than being emergent out of some other process.
I'll bet you! ;)
Sadly my claim is somewhat unfalsifiable, because the emergence might always be hiding at some smaller scale, but I would be surprised if we found the theory that the standard model emerges from and it contained classical spacetime.
I did a little search, and if it's worth anything Witten and Wheeler agree: https://www.quantamagazine.org/edward-witten-ponders-the-nature-of-reality-20171128/ (just search for 'emergent' in the article)
Can you have emergent spacetime while space symmetry remains a bedrock fundamental principle, and not emergent of something else?
I don't know if that is a meaningful question.
Consider this: a cube is something that is symmetric under the octahedral group - that's what *makes* it a cube. If it wasn't symmetric under these transformations, it wouldn't be a cube. So also with spacetime - it's something that transforms according to the Poincaré group (plus some other mathematical properties, metric etc.). That's what makes it spacetime.
So space symmetry is always assumed when we talk about spacetime, and if space symmetry didn't hold, spacetime as we know it would not work/exist?
As a corollary: Maybe power laws for AI should not surprise us, they are simply the default outcome of scaling.
Simplified: the Solomonoff prior is the distribution you get when you take a uniform distribution over all strings and feed them to a universal Turing machine.
Since the outputs are also strings: what happens if we iterate this? What is the stationary distribution? Is there even one? The fixed points will be quines, programs that copy their source code to the output. But how are they weighted? By their length? Presumably you can also have quine-cycles of programs that generate each other in turn, in a manner reminiscent of metagenesis. Do these quine cycles capture all probability mass, or does some diverge?
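Here is the kind of finite cartoon one could use to poke at this numerically. The "machine" T below is just a random total function on short bit strings and the prior is length-weighted, so it is only a toy stand-in for the real construction, not the Solomonoff setup itself:

```python
import random
from collections import defaultdict

random.seed(0)
strings = [format(i, f"0{n}b") for n in range(1, 7) for i in range(2 ** n)]
T = {s: random.choice(strings) for s in strings}    # toy "machine": a random total map

prior = {s: 2.0 ** (-len(s)) for s in strings}      # length-weighted prior
Z = sum(prior.values())
dist = {s: p / Z for s, p in prior.items()}

for _ in range(50):                                  # iterate the pushforward under T
    new = defaultdict(float)
    for s, p in dist.items():
        new[T[s]] += p
    dist = new

# In this cartoon, all the mass ends up on the cycles of T (the analogue of
# quines and quine-cycles); each cycle's weight is the prior mass of its basin.
print(sorted(dist.items(), key=lambda kv: -kv[1])[:5])
```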
Very grateful for answers and literature suggestions.
A few quick observations (each with similar confidence; I won't provide detailed arguments atm, but feel free to LW-msg me for more details):
a construction from o3-mini-high: https://colab.research.google.com/drive/1kIGCiDzWT3guCskgmjX5oNoYxsImQre-?usp=sharing
Maximally coherent agents are indistinguishable from point particles. They have no internal degrees of freedom, one cannot probe their internal structure from the outside.
Epistemic Status: Unhinged
You know who else is completely simple inside, with no internal degrees of freedom, and always wills the same thing eternally unceasingly?
Yeah that's right, the Medieval Catholic Scholastic God.
In video games this is made literal by every entity having a central coordinate. Their body is merely a shell wrapped around the point-self and a channel for the will of the external power (the player).
Epistemic Status: Riffing
We know coherence when we see it. A craftsman working versus someone constantly fixing his previous mistakes. A functional organization versus bureaucratic churn. A healthy body versus one fighting itself. War, internal conflict, rework: these are wasteful. We respect people who act decisively, societies that build without tearing down, systems that run clean.
This intuition points somewhere real. In some sense, maximizing/expanding coherence is what the universe does: cutting friction, eliminating waste, building systems that don't fight themselves. Not from external design, but because coherent systems expand until they can't. Each pocket of coherence is the universe organizing itself better. The point is that coherence captures "good": low friction, low conflict, no self-sabotage.
I propose that this is measurable. Coherence could be quantified as thermodynamic efficiency. Pick a boundary and time window, track energy in. The coherent part becomes exported work, heat above ambient, or durable stores (raised water, charged batteries, separated materials). The rest is loss: waste heat, rework, reversals. Systems can expand until efficiency stops generating surplus. When new coordination tools raise that limit, growth resumes. Just observable flows, no goals needed.
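A minimal sketch of that bookkeeping, just to show how little machinery it needs (the function and the example numbers are invented for illustration):

```python
def coherence(energy_in, exported_work, stored, useful_heat_above_ambient):
    """Thermodynamic-efficiency notion of coherence for one boundary and time window."""
    useful = exported_work + stored + useful_heat_above_ambient
    assert useful <= energy_in, "can't get out more than came in"
    return useful / energy_in

# A workshop over one day (numbers invented): 100 units in, 35 exported as work,
# 10 stored durably, 5 as heat above ambient -> coherence 0.5.
print(coherence(energy_in=100.0, exported_work=35.0,
                stored=10.0, useful_heat_above_ambient=5.0))
# The remaining 0.5 is the incoherent part: waste heat, rework, reversals.
```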
An interesting coincidence: maximizing thermodynamic efficiency (coherence) maximally delays the heat death of a system. Higher efficiency means slower entropy increase.
I am very interested in hearing counterexamples of coherent systems that are intuitively repellent!
So indeed, if you define coherence as the negative of arbitrage value.
There is a pretty close relation between thermodynamic free energy, arbitrageable value, and the degree to which an entity can be money-pumped.
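For concreteness, the classic money pump behind that last phrase (toy code, invented numbers): an agent with cyclic preferences can be cycled through trades forever, leaking value on every round.

```python
# The agent prefers A to B, B to C, and C to A - cyclic, hence incoherent.
prefers = {("A", "B"): True, ("B", "C"): True, ("C", "A"): True}
fee = 0.01                                    # small fee paid for each "upgrade"
offer_for = {"A": "C", "C": "B", "B": "A"}    # always offer what it prefers to its holding

holding, extracted = "A", 0.0
for _ in range(100):
    offer = offer_for[holding]
    if prefers.get((offer, holding), False):  # agent prefers the offered item
        holding = offer
        extracted += fee                      # and pays the fee to swap
print(extracted)   # 1.0 extracted purely from the incoherence
```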
You might also be interested in:
https://www.lesswrong.com/posts/3xF66BNSC5caZuKyC/why-subagents
Is there an anthropic reason or a computational (Solomonoff-pilled) argument for why we would expect the computational/causal graph of the universe to be this local (sparse)? Or at least to appear local to a first approximation (Bell's inequality).
This seems like quite a special property: I suspect that either
In physics, it is sometimes asked why there should be just three (large) space dimensions. No one really knows, but there are various mathematical properties unique to three or four dimensions, to which appeal is sometimes made.
I would also consider the recent (last few decades) interest in the emergence of spatial dimensions from entanglement. It may be that your question can be answered by considering these two things together.
Simplicity Priors are Tautological
Any non-uniform prior inherently encodes a bias toward simplicity. This isn't an additional assumption we need to make - it falls directly out of the mathematics.
For any hypothesis h, the information content is I(h) = -log₂ P(h), which means probability and complexity have an exponential relationship: P(h) = 2^(-I(h)).
This demonstrates that simpler hypotheses (those with lower information content) are automatically assigned higher probabilities. The exponential relationship creates a strong bias toward simplicity without requiring any special mechanisms.
The "simplicity prior" is essentially tautological - more probable things are simple by definition.
You can have a hypothesis with really high Kolmogorov complexity, but if the hypothesis is true 50% of the time, it will require 1 bit of information to specify with respect to a coding scheme that merely points to cached hypotheses.
This is why, when Kolmogorov complexity is defined, it is with respect to a fixed universal description language; otherwise you're right, it's vacuous to talk about the simplicity of a hypothesis.
In evolution we can tell a story that not only are genes selected for their function, but also for how easily modifiable they are. For example, having a generic antibiotic gene is much more useful than having an antibiotic locked into one target and far, in edit-distance terms, from any other useful variant.
Why would we expect the generic gene to be more common? There is selection pressure on having modifiable genes because environments are constantly shifting (the Red Queen hypothesis). Genes are modules with evolvability baked in by past selection.
Can we make a similar argument for circuits/features/modes in NNs? Obviously it is better to have a more general circuit, but can we also argue that “multitool circuits” are not only better at generalising but also more likely to be found?
SGD does not optimise loss but rather something like free energy, taking degeneracy (multiplicity) into account with some effective temperature.
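Schematically (a rough gloss in my own notation, not a precise statement): for a region W of weight space, F(W) ≈ ⟨L⟩_W - T·log vol(W), so at effective temperature T a broad, highly degenerate basin can beat a narrower basin with slightly lower loss.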
But evolvability seems distinct from degeneracy. Degeneracy is a property of a single loss landscape, while evolvability is a claim about distribution shift. And the claim is not “I have low loss in the new distribution” but rather “I am very close to a low-loss solution of the new distribution.”
Degeneracy in ML ≈ mutational robustness in biology, which is straightforward, but that is not what I am pointing at here. Evolvability is closer to out-of-distribution adaptivity: the ability to move quickly into a new optimum with small changes.
Are there experiments where a model is trained on a shifting distribution?
Is the shifting distribution relevant, or can this just as well be modeled as a mixture of the distributions, so that what we think of as OOD is actually in-distribution for the mixture? In that case degeneracy is all you need.
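One way to probe the degeneracy-vs-evolvability distinction directly would be a toy version of exactly this experiment (the two "tasks" and all numbers below are invented): pretrain to two different points on the same degenerate optimum of task A, shift to task B, and count how many fine-tuning steps each needs.

```python
import numpy as np

# Task A has a degenerate valley of optima: L_A(w) = (w0*w1 - 1)^2, i.e. any w with w0*w1 = 1.
# Task B (the "shifted distribution") has a point optimum at (2, 0.5),
# which happens to lie on A's valley.
def loss_B(w):
    return (w[0] - 2.0) ** 2 + (w[1] - 0.5) ** 2

def grad_B(w):
    return np.array([2 * (w[0] - 2.0), 2 * (w[1] - 0.5)])

def steps_to_adapt(w_init, lr=0.05, tol=1e-3, max_steps=10_000):
    """Fine-tune on task B from a pretrained solution; count steps until B-loss < tol."""
    w = np.array(w_init, dtype=float)
    for t in range(max_steps):
        if loss_B(w) < tol:
            return t
        w -= lr * grad_B(w)
    return max_steps

# Both starting points sit exactly on A's optimal valley (zero A-loss, same degeneracy),
# but they differ in how close they are to B's optimum, i.e. in "evolvability".
print(steps_to_adapt([1.0, 1.0]))    # near B's optimum: adapts in few steps
print(steps_to_adapt([10.0, 0.1]))   # far along the valley: needs noticeably more
```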
Related ideas: cryptographic one-way functions (examples of unevolvable designs), out-of-distribution generalisation, mode connectivity.