Matthias Dellago's Shortform

by Matt Dellago
18th Dec 2024
1 min read
[-]Matt Dellago10mo*121

Scale invariance is itself an emergent phenomenon. 

Imagine scaling something up, say a physical law: if it changes, it is obviously not scale invariant, and it will keep changing with each further scale-up. If it does not change, it has reached a fixed point and will not change on the next scale-up either!
Scale invariances are just fixed points of coarse-graining.
Therefore, we should expect anything we think of as scale invariant to break down at small scales. For instance, electric charge is not scale invariant at small scales!
In the opposite direction: we should expect our physical laws to continue holding at macro scales, if they are fixed points of scaling. This also explains the ubiquity of power laws in the natural sciences; power laws are the only relations that are scale invariant and thus preserved!
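(A standard quick argument for the power-law claim: suppose a relation is scale covariant, i.e. $f(\lambda x) = g(\lambda)\,f(x)$ for all $\lambda > 0$. Differentiating in $\lambda$ at $\lambda = 1$ gives

$$x f'(x) = g'(1)\,f(x) \quad\Longrightarrow\quad f(x) = C\,x^{k}, \qquad k = g'(1),$$

so, at least for differentiable relations, power laws are exactly the ones with no preferred scale.)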
All of this may seem tautological but is actually truly strange. To me this indicates that we should expect to be very, very far from the actual substrate of the universe. 

Now go forth and study renormalisation group flow! ;)

Epistemic status: Just riffing!

[-]Nathan Helm-Burger10mo30

This sounds like a fascinating insight, but I think I may be missing some physics context to fully understand.

Why is it that the derived laws approximating a true underlying physical law are expected to stay scale invariant over increasing scale after being scale invariant for two steps? Is there a reason that there can't be a scale invariant region that goes back to being scale variant at large enough scales just like it does at small enough scales?

[-]Matt Dellago10mo40

The act of coarse-graining/scaling up (an RG transformation) changes the theory that describes the system, specifically the theory's parameters. If you consider the space of all theories and iterate the coarse-graining, this induces a flow in which each theory is mapped to its coarse-grained version. This flow may possess attractors, that is, stable fixed points x*, meaning that when you apply the coarse-graining you get the same theory back.

And if f(x*) = x* then obviously f(f(x*)) = x*, i.e. any repeated application will still yield the fixed point.

So you can scale up as much as you want - entering a fixed point really is a one-way street: you can check out any time you like, but you can never leave!
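A toy illustration, using the textbook decimation step for the 1D Ising chain: summing out every other spin maps the coupling K to K' = ½ ln cosh(2K), and iterating this drives any finite coupling to the fixed point K* = 0.

```python
import math

def decimate(K: float) -> float:
    """One coarse-graining (RG) step for the 1D Ising chain:
    summing out every other spin gives K' = 0.5 * ln(cosh(2K))."""
    return 0.5 * math.log(math.cosh(2.0 * K))

K = 1.5  # starting coupling, i.e. a point in "theory space"
for step in range(10):
    K = decimate(K)
    print(step, round(K, 6))

# The flow converges to the fixed point K* = 0 (a free, scale-invariant theory),
# and decimate(0.0) == 0.0: once at a fixed point, f(x*) = x* implies
# f(f(x*)) = x*, so further coarse-graining changes nothing.
```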

[-]Noosphere8910mo30

The main source of scale invariance itself probably has to do with symmetry: an object has a particular property that is preserved across scales.

Space symmetry is an example, where the basic physical laws are preserved across all scales of spacetime. In particular, scaling a system down doesn't mean different laws of physics apply at different scales; there is only one physical law, which produces varied consequences at all scales.

[-]Matt Dellago10mo10

You're making an interesting connection to symmetry! But scale invariance as discussed here is actually emergent - it arises when theories reach fixed points under coarse-graining, rather than being a fundamental symmetry of space. This is why quantities like electric charge can change with scale, despite spacetime symmetries remaining intact.
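(For reference, the textbook one-loop result for how the measured charge runs with the probe energy $Q$:

$$\alpha_{\mathrm{eff}}(Q^2) \;\approx\; \frac{\alpha}{1 - \frac{\alpha}{3\pi}\ln\!\left(Q^2/m_e^2\right)}, \qquad Q \gg m_e,$$

so the effective coupling grows as you probe shorter distances.)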

And while spacetime symmetries still seem scale invariant, considering the above argument they might also break down at small scales. It seems exceedingly unlikely that they would not! The initial parameters of the theory would have to be chosen just so as to be a fixed point. It seems much more likely that these symmetries emerged through RG flow rather than being fundamental.

[-]Noosphere8910mo30

And while spacetime symmetries still seem scale invariant, considering the above argument they might also break down at small scales. It seems exceedingly unlikely that they would not! The initial parameters of the theory would have to be chosen just so as to be a fixed point. It seems much more likely that these symmetries emerged through RG flow rather than being fundamental.

While this is an interesting idea, I do still think space symmetries are likely to remain fundamental features of physics, rather than being emergent out of some other process.

[-]Matt Dellago10mo10

I'll bet you! ;)

Sadly my claim is somewhat unfalsifiable, because the emergence might always be hiding at some smaller scale, but I would be surprised if we find the theory that the Standard Model emerges from and it contains classical spacetime.

I did a little search, and if it's worth anything Witten and Wheeler agree: https://www.quantamagazine.org/edward-witten-ponders-the-nature-of-reality-20171128/ (just search for 'emergent' in the article)

[-]Noosphere8910mo30

Can you have emergent spacetime while space symmetry remains a bedrock fundamental principle, and not emergent of something else?

[-]Matt Dellago10mo10

I don't know if that is a meaningful question.
Consider this: a cube is something that is symmetric under the octahedral group - that's what *makes* it a cube. If it wasn't symmetric under these transformations, it wouldn't be a cube. So also with spacetime - it's something that transforms according to the Poincaré group (plus some other mathematical properties, metric etc.). That's what makes it spacetime.

[-]Noosphere8910mo20

So space symmetry is always assumed when we talk about spacetime, and if space symmetry didn't hold, spacetime as we know it would not work/exist?

[-]Matt Dellago10mo*30

As a corollary: maybe power laws for AI should not surprise us; they are simply the default outcome of scaling.

[-]Matt Dellago8mo115

Simplified: the Solomonoff prior is the distribution you get when you take a uniform distribution over all strings and feed them to a universal Turing machine.

Since the outputs are also strings: what happens if we iterate this? What is the stationary distribution? Is there even one? The fixed points will be quines, programs that copy their source code to the output. But how are they weighted? By their length? Presumably you can also have quine-cycles of programs that generate each other in turn, in a manner reminiscent of metagenesis. Do these quine-cycles capture all probability mass, or does some diverge?
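A finite toy version of the iteration, as a sketch (the hard-coded map below merely stands in for a universal machine, which it obviously is not):

```python
from collections import defaultdict

# Toy "machine": a fixed string-to-string map standing in for running a
# program on a universal TM. "aa" and "bb" are quines (fixed points);
# "ab" and "ba" form a quine-cycle of length 2.
machine = {"a": "aa", "b": "ba", "aa": "aa", "bb": "bb", "ab": "ba", "ba": "ab"}

# Length-weighted prior over inputs, restricted to this toy domain.
weights = {s: 2.0 ** -len(s) for s in machine}
total = sum(weights.values())
dist = {s: w / total for s, w in weights.items()}

def pushforward(dist):
    """One iteration: run every string through the machine, collect output mass."""
    out = defaultdict(float)
    for s, p in dist.items():
        out[machine[s]] += p
    return dict(out)

for step in range(6):
    dist = pushforward(dist)
    print(step, {s: round(p, 3) for s, p in sorted(dist.items())})

# All mass ends up on the quines "aa" and "bb" plus the 2-cycle {"ab", "ba"},
# whose two probabilities keep swapping each step -- so there is no pointwise
# convergence, only convergence onto the recurrent (quine/cycle) structure.
```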

Very grateful for answers and literature suggestions.

[-]Kaarel8mo*61

A few quick observations (each with like 90% confidence; I won't provide detailed arguments atm, but feel free to LW-msg me for more details):

  • Any finite number of iterates just gives you the solomonoff distribution up to at most a const multiplicative difference (with the const depending on how many iterates you do). My other points will be about the limit as we iterate many times.
  • The quines will have mass at least their prior, upweighted by some const because of programs which do not produce an infinite output string. They will generally have more mass than that, and some will gain mass by a larger multiplicative factor than others, but idk how to say something nice about this further.
  • Yes, you can have quine-cycles. Relevant tho not exactly this: https://github.com/mame/quine-relay
  • As you do more and more iterates, there's not convergence to a stationary distribution, at least in total variation distance. One reason is that you can write a quine which adds a string to itself (and then adds the same string again next time, and so on)[1], creating "a way for a finite chunk of probability to escape to infinity". So yes, some mass diverges.
  • Quine-cycles imply (or at least very strongly suggest) probabilities also do not converge pointwise.
  • What about pointwise convergence when we also average over the number of iterates? It seems plausible you get convergence then, but not sure (and not sure if this would be an interesting claim). It would be true if we could somehow think of the problem as living on a directed graph with countably many vertices, but idk how to do that atm.
  • There are many different stationary distributions — e.g. you could choose any distribution on the quines.

  1. a construction from o3-mini-high: https://colab.research.google.com/drive/1kIGCiDzWT3guCskgmjX5oNoYxsImQre-?usp=sharing ↩︎

[-]TsviBT8mo30

Very relevant: https://web.archive.org/web/20090608111223/http://www.paul-almond.com/WhatIsALowLevelLanguage.htm

[-]Matt Dellago8mo10

Thank you! I'll have a look!

[-]Matt Dellago2d100

Maximally coherent agents are indistinguishable from point particles. They have no internal degrees of freedom; one cannot probe their internal structure from the outside.

Epistemic Status: Unhinged

[-]1a3orn2d101

You know who else is completely simple inside, with no internal degrees of freedom, and always wills the same thing eternally unceasingly?

Yeah that's right, the Medieval Catholic Scholastic God.

[-]Jesper L.2d20

This is not a trivial point to make. Good callout.

[-]testingthewaters2d40

In video games this is made literal by every entity having a central coordinate. Their body is merely a shell wrapped around the point-self and a channel for the will of the external power (the player).

[-]Matt Dellago10d*30

Coherence as Purpose

Epistemic Status: Riffing

We know coherence when we see it. A craftsman working versus someone constantly fixing his previous mistakes. A functional organization versus bureaucratic churn. A healthy body versus one fighting itself. War, internal conflict, rework: these are wasteful. We respect people who act decisively, societies that build without tearing down, systems that run clean.

This intuition points somewhere real. In some sense, maximizing/expanding coherence is what the universe does: cutting friction, eliminating waste, building systems that don't fight themselves. Not from external design, but because coherent systems expand until they can't. Each pocket of coherence is the universe organizing itself better. The point is that coherence captures "good": low friction, low conflict, no self-sabotage.

I propose that this is measurable. Coherence could be quantified as thermodynamic efficiency. Pick a boundary and time window, track energy in. The coherent part becomes exported work, heat above ambient, or durable stores (raised water, charged batteries, separated materials). The rest is loss: waste heat, rework, reversals. Systems can expand until efficiency stops generating surplus. When new coordination tools raise that limit, growth resumes. Just observable flows, no goals needed.
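A minimal sketch of that accounting, assuming the flows across the chosen boundary can be metered (the function and numbers below are purely illustrative):

```python
def coherence_efficiency(energy_in: float,
                         work_exported: float,
                         heat_above_ambient: float,
                         durable_stores: float) -> float:
    """Toy coherence metric for a bounded system over one time window:
    the fraction of incoming energy that ends up as exported work,
    heat above ambient, or durable stores, rather than as loss
    (waste heat, rework, reversals)."""
    coherent = work_exported + heat_above_ambient + durable_stores
    return coherent / energy_in

# Hypothetical numbers for some metered system (joules over one window).
print(coherence_efficiency(energy_in=1000.0,
                           work_exported=450.0,
                           heat_above_ambient=50.0,
                           durable_stores=200.0))  # 0.7
```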

An interesting coincidence: maximizing thermodynamic efficiency (coherence) maximally delays the heat death of a system. Higher efficiency means slower entropy increase.

I am very interested in hearing counterexamples of coherent systems that are intuitively repellent!

[-]Alexander Gietelink Oldenziel10d40

So indeed, if you define coherence as the negative of arbitrage value.

There is a pretty close relation between thermodynamic free energy, arbitrageable value, and the degree to which an entity can be money-pumped.

You might also be interested in :

https://www.lesswrong.com/posts/3xF66BNSC5caZuKyC/why-subagents

[-]Matt Dellago24d30

Is there an anthropic reason or a computational (Solomonoff-pilled) argument for why we would expect the computational/causal graph of the universe to be this local (sparse)? Or at least to appear local to a first approximation (Bell's inequality).

This seems like quite a special property: I suspect that either

  • it is not as rare in e.g. the Solomonoff prior as we might first intuit, or
  • we should expect this for anthropic reasons, e.g. it is really hard to develop intelligence/do predictions in nonlocal universes.
[-]Mitchell_Porter24d30

In physics, it is sometimes asked why there should be just three (large) space dimensions. No one really knows, but there are various mathematical properties unique to three or four dimensions, to which appeal is sometimes made. 

I would also consider the recent (last few decades) interest in the emergence of spatial dimensions from entanglement. It may be that your question can be answered by considering these two things together. 

[-]Matt Dellago7mo*20

Simplicity Priors are Tautological

Any non-uniform prior inherently encodes a bias toward simplicity. This isn't an additional assumption we need to make - it falls directly out of the mathematics.

For any hypothesis $h$, the information content is $I(h) = -\log P(h)$, which means probability and complexity have an exponential relationship: $P(h) = e^{-I(h)}$.

This demonstrates that simpler hypotheses (those with lower information content) are automatically assigned higher probabilities. The exponential relationship creates a strong bias toward simplicity without requiring any special mechanisms.
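A small numerical check of that exponential relationship (the prior below is made up purely for illustration):

```python
import math

# Hypothetical prior over three hypotheses.
prior = {"h_simple": 0.6, "h_medium": 0.3, "h_complex": 0.1}

# Information content I(h) = -ln P(h): the "description length" (in nats)
# that this prior implicitly assigns to each hypothesis.
info = {h: -math.log(p) for h, p in prior.items()}

for h, p in prior.items():
    # P(h) = exp(-I(h)) recovers the prior exactly: being more probable
    # and being "simpler" under the prior-derived code are the same fact.
    assert abs(p - math.exp(-info[h])) < 1e-12
    print(f"{h}: I = {info[h]:.3f} nats, P = {p}")
```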

The "simplicity prior" is essentially tautological - more probable things are simple by definition.

[-]Alex Gibson7mo20

You can have a hypothesis with really high Kolmogorov complexity, but if the hypothesis is true 50% of the time, it will require 1 bit of information to specify with respect to a coding scheme that merely points to cached hypotheses.

This is why Kolmogorov complexity is defined with respect to a fixed universal description language; otherwise you're right, it's vacuous to talk about the simplicity of a hypothesis.

[-]Matt Dellago2mo10

The Red Queen’s Race in Weight Space

In evolution we can tell a story that not only are genes selected for their function, but also for how easily modifiable they are. For example, having a generic antibiotic gene is much more useful than having an antibiotic locked into one target and far, in edit-distance terms, from any other useful variant.

Why would we expect the generic gene to be more common? There is selection pressure on having modifiable genes because environments are constantly shifting (the Red Queen hypothesis). Genes are modules with evolvability baked in by past selection.

Can we make a similar argument for circuits/features/modes in NNs? Obviously it is better to have a more general circuit, but can we also argue that “multitool circuits” are not only better at generalising but also more likely to be found?

SGD does not optimise loss but rather something like free energy, taking degeneracy (multiplicity) into account with some effective temperature.
But evolvability seems distinct from degeneracy. Degeneracy is a property of a single loss landscape, while evolvability is a claim about distribution shift. And the claim is not “I have low loss in the new distribution” but rather “I am very close to a low-loss solution of the new distribution.”
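(For reference, the singular-learning-theory form of that free-energy claim: the Bayesian free energy of a neighbourhood of a solution $w^*$ expands as

$$F_n \;\approx\; n\,L_n(w^*) + \lambda \log n,$$

where $L_n$ is the empirical loss and $\lambda$ is the local learning coefficient measuring degeneracy. Lower loss and higher degeneracy trade off against each other, but neither term says anything about how close $w^*$ sits to good solutions of a shifted distribution.)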

Degeneracy in ML ≈ mutational robustness in biology, which is straightforward, but that is not what I am pointing at here. Evolvability is closer to out-of-distribution adaptivity: the ability to move quickly into a new optimum with small changes.

Are there experiments where a model is trained on a shifting distribution?

Is the shifting distribution relevant or can this just as well be modeled as a mixture of the distributions, and what we think of as OOD is actually in the mixture distribution? In that case degeneracy is all you need.

Related ideas: cryptographic one-way functions (examples of unevolvable designs), out-of-distribution generalisation, mode connectivity.
