Matthias Dellago's Shortform

by Matt Dellago
18th Dec 2024
1 min read
[-]Matt Dellago10mo*121

Scale invariance is itself an emergent phenomenon. 

Imagine scaling something up, say a physical law: if it changes, it is obviously not scale invariant, and it will keep changing with each further scale-up. If it does not change, it has reached a fixed point and will not change on the next scale-up either!
Scale invariances are just fixed points of coarse-graining.
Therefore, we should expect anything we think of as scale invariant to break down at small scales. For instance, electric charge is not scale invariant at small scales!
In the opposite direction: we should expect our physical laws to continue holding at macro scales, if they are fixed points of scaling. This also explains the ubiquity of power laws in the natural sciences; power laws are the only relations that are scale invariant and thus preserved!
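(A standard quick argument for the power-law claim: suppose a relation is scale covariant, i.e. $f(\lambda x) = g(\lambda)\,f(x)$ for all $\lambda > 0$. Differentiating in $\lambda$ at $\lambda = 1$ gives

$$x f'(x) = g'(1)\,f(x) \quad\Longrightarrow\quad f(x) = C\,x^{k}, \qquad k = g'(1),$$

so, at least for differentiable relations, power laws are exactly the ones with no preferred scale.)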
All of this may seem tautological but is actually truly strange. To me this indicates that we should expect to be very, very far from the actual substrate of the universe. 

Now go forth and study renormalisation group flow! ;)

Epistemic status: Just riffing!

[-]Nathan Helm-Burger10mo30

This sounds like a fascinating insight, but I think I may be missing some physics context to fully understand.

Why is it that the derived laws approximating a true underlying physical law are expected to stay scale invariant over increasing scale after being scale invariant for two steps? Is there a reason that there can't be a scale invariant region that goes back to being scale variant at large enough scales just like it does at small enough scales?

[-]Matt Dellago10mo40

The act of coarse-graining/scaling up (an RG transformation) changes the theory that describes the system, specifically the theory's parameters. If you consider the space of all theories and iterate the coarse-graining, this induces a flow in which each theory is mapped to its coarse-grained version. This flow may possess attractors, that is, stable fixed points x*, meaning that when you apply the coarse-graining you get the same theory back.

And if f(x*) = x* then obviously f(f(x*)) = x*, i.e. any repeated application will still yield the fixed point.

So you can scale up as much as you want - entering a fixed point really is a one-way street: you can check out any time you like, but you can never leave!
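A toy illustration, using the textbook decimation step for the 1D Ising chain: summing out every other spin maps the coupling K to K' = ½ ln cosh(2K), and iterating this drives any finite coupling to the fixed point K* = 0.

```python
import math

def decimate(K: float) -> float:
    """One coarse-graining (RG) step for the 1D Ising chain:
    summing out every other spin gives K' = 0.5 * ln(cosh(2K))."""
    return 0.5 * math.log(math.cosh(2.0 * K))

K = 1.5  # starting coupling, i.e. a point in "theory space"
for step in range(10):
    K = decimate(K)
    print(step, round(K, 6))

# The flow converges to the fixed point K* = 0 (a free, scale-invariant theory),
# and decimate(0.0) == 0.0: once at a fixed point, f(x*) = x* implies
# f(f(x*)) = x*, so further coarse-graining changes nothing.
```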

[-]Noosphere8910mo30

The main source of scale invariance itself probably has to do with symmetry: an object has a particular property that is preserved across scales.

Space symmetry is an example, where the basic physical laws are preserved across all scales of spacetime. In particular, scaling a system down doesn't mean different laws of physics apply at different scales; there is only one physical law, which produces varied consequences at all scales.

[-]Matt Dellago10mo10

You're making an interesting connection to symmetry! But scale invariance as discussed here is actually emergent - it arises when theories reach fixed points under coarse-graining, rather than being a fundamental symmetry of space. This is why quantities like electric charge can change with scale, despite spacetime symmetries remaining intact.
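(For reference, the textbook one-loop result for how the measured charge runs with the probe energy $Q$:

$$\alpha_{\mathrm{eff}}(Q^2) \;\approx\; \frac{\alpha}{1 - \frac{\alpha}{3\pi}\ln\!\left(Q^2/m_e^2\right)}, \qquad Q \gg m_e,$$

so the effective coupling grows as you probe shorter distances.)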

And while spacetime symmetries still seem scale invariant, considering the above argument they might also break down at small scales. It seems exceedingly unlikely that they would not! The initial parameters of the theory would have to be chosen just so as to be a fixed point. It seems much more likely that these symmetries emerged through RG flow rather than being fundamental.

[-]Noosphere8910mo30

And while spacetime symmetries still seem scale invariant, considering the above argument they might also break down at small scales. It seems exceedingly unlikely that they would not! The initial parameters of the theory would have to be chosen just so as to be a fixed point. It seems much more likely that these symmetries emerged through RG flow rather than being fundamental.

While this is an interesting idea, I do still think space symmetries are likely to remain fundamental features of physics, rather than being emergent out of some other process.

[-]Matt Dellago10mo10

I'll bet you! ;)

Sadly my claim is somewhat unfalsifiable, because the emergence might always be hiding at some smaller scale, but I would be surprised if we find the theory that the Standard Model emerges from and it contains classical spacetime.

I did a little search, and if it's worth anything Witten and Wheeler agree: https://www.quantamagazine.org/edward-witten-ponders-the-nature-of-reality-20171128/ (just search for 'emergent' in the article)

[-]Noosphere8910mo30

Can you have emergent spacetime while space symmetry remains a bedrock fundamental principle, and not emergent of something else?

[-]Matt Dellago10mo10

I don't know if that is a meaningful question.
Consider this: a cube is something that is symmetric under the octahedral group - that's what *makes* it a cube. If it wasn't symmetric under these transformations, it wouldn't be a cube. So also with spacetime - it's something that transforms according to the Poincaré group (plus some other mathematical properties, metric etc.). That's what makes it spacetime.

[-]Noosphere8910mo20

So space symmetry is always assumed when we talk about spacetime, and if space symmetry didn't hold, spacetime as we know it would not work/exist?

[-]Matt Dellago10mo*30

As a corollary: maybe power laws for AI should not surprise us; they are simply the default outcome of scaling.

[-]Matt Dellago8mo115

Simplified: the Solomonoff prior is the distribution you get when you take a uniform distribution over all strings and feed them to a universal Turing machine.

Since the outputs are also strings: what happens if we iterate this? What is the stationary distribution? Is there even one? The fixed points will be quines, programs that copy their source code to the output. But how are they weighted? By their length? Presumably you can also have quine-cycles of programs that generate each other in turn, in a manner reminiscent of metagenesis. Do these quine-cycles capture all probability mass, or does some diverge?
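A finite toy version of the iteration, as a sketch (the hard-coded map below merely stands in for a universal machine, which it obviously is not):

```python
from collections import defaultdict

# Toy "machine": a fixed string-to-string map standing in for running a
# program on a universal TM. "aa" and "bb" are quines (fixed points);
# "ab" and "ba" form a quine-cycle of length 2.
machine = {"a": "aa", "b": "ba", "aa": "aa", "bb": "bb", "ab": "ba", "ba": "ab"}

# Length-weighted prior over inputs, restricted to this toy domain.
weights = {s: 2.0 ** -len(s) for s in machine}
total = sum(weights.values())
dist = {s: w / total for s, w in weights.items()}

def pushforward(dist):
    """One iteration: run every string through the machine, collect output mass."""
    out = defaultdict(float)
    for s, p in dist.items():
        out[machine[s]] += p
    return dict(out)

for step in range(6):
    dist = pushforward(dist)
    print(step, {s: round(p, 3) for s, p in sorted(dist.items())})

# All mass ends up on the quines "aa" and "bb" plus the 2-cycle {"ab", "ba"},
# whose two probabilities keep swapping each step -- so there is no pointwise
# convergence, only convergence onto the recurrent (quine/cycle) structure.
```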

Very grateful for answers and literature suggestions.

[-]Kaarel8mo*61

A few quick observations (each with like 90% confidence; I won't provide detailed arguments atm, but feel free to LW-msg me for more details):

  • Any finite number of iterates just gives you the solomonoff distribution up to at most a const multiplicative difference (with the const depending on how many iterates you do). My other points will be about the limit as we iterate many times.
  • The quines will have mass at least their prior, upweighted by some const because of programs which do not produce an infinite output string. They will generally have more mass than that, and some will gain mass by a larger multiplicative factor than others, but idk how to say something nice about this further.
  • Yes, you can have quine-cycles. Relevant tho not exactly this: https://github.com/mame/quine-relay
  • As you do more and more iterates, there's not convergence to a stationary distribution, at least in total variation distance. One reason is that you can write a quine which adds a string to itself (and then adds the same string again next time, and so on)[1], creating "a way for a finite chunk of probability to escape to infinity". So yes, some mass diverges.
  • Quine-cycles imply (or at least very strongly suggest) probabilities also do not converge pointwise.
  • What about pointwise convergence when we also average over the number of iterates? It seems plausible you get convergence then, but not sure (and not sure if this would be an interesting claim). It would be true if we could somehow think of the problem as living on a directed graph with countably many vertices, but idk how to do that atm.
  • There are many different stationary distributions — e.g. you could choose any distribution on the quines.

  1. a construction from o3-mini-high: https://colab.research.google.com/drive/1kIGCiDzWT3guCskgmjX5oNoYxsImQre-?usp=sharing ↩︎

[-]TsviBT8mo30

Very relevant: https://web.archive.org/web/20090608111223/http://www.paul-almond.com/WhatIsALowLevelLanguage.htm

[-]Matt Dellago8mo10

Thank you! I'll have a look!

[-]Matt Dellago2d100

Maximally coherent agents are indistinguishable from point particles. They have no internal degrees of freedom; one cannot probe their internal structure from the outside.

Epistemic Status: Unhinged

[-]1a3orn2d101

You know who else is completely simple inside, with no internal degrees of freedom, and always wills the same thing eternally unceasingly?

Yeah that's right, the Medieval Catholic Scholastic God.

[-]Jesper L.2d20

This is not a trivial point to make. Good callout.

[-]testingthewaters2d40

In video games this is made literal by every entity having a central coordinate. Their body is merely a shell wrapped around the point-self and a channel for the will of the external power (the player).

[-]Matt Dellago10d*30

Coherence as Purpose

Epistemic Status: Riffing

We know coherence when we see it. A craftsman working versus someone constantly fixing his previous mistakes. A functional organization versus bureaucratic churn. A healthy body versus one fighting itself. War, internal conflict, rework: these are wasteful. We respect people who act decisively, societies that build without tearing down, systems that run clean.

This intuition points somewhere real. In some sense, maximizing/expanding coherence is what the universe does: cutting friction, eliminating waste, building systems that don't fight themselves. Not from external design, but because coherent systems expand until they can't. Each pocket of coherence is the universe organizing itself better. The point is that coherence captures "good": low friction, low conflict, no self-sabotage.

I propose that this is measurable. Coherence could be quantified as thermodynamic efficiency. Pick a boundary and time window, track energy in. The coherent part becomes exported work, heat above ambient, or durable stores (raised water, charged batteries, separated materials). The rest is loss: waste heat, rework, reversals. Systems can expand until efficiency stops generating surplus. When new coordination tools raise that limit, growth resumes. Just observable flows, no goals needed.
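A minimal sketch of that accounting, assuming the flows across the chosen boundary can be metered (the function and numbers below are purely illustrative):

```python
def coherence_efficiency(energy_in: float,
                         work_exported: float,
                         heat_above_ambient: float,
                         durable_stores: float) -> float:
    """Toy coherence metric for a bounded system over one time window:
    the fraction of incoming energy that ends up as exported work,
    heat above ambient, or durable stores, rather than as loss
    (waste heat, rework, reversals)."""
    coherent = work_exported + heat_above_ambient + durable_stores
    return coherent / energy_in

# Hypothetical numbers for some metered system (joules over one window).
print(coherence_efficiency(energy_in=1000.0,
                           work_exported=450.0,
                           heat_above_ambient=50.0,
                           durable_stores=200.0))  # 0.7
```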

An interesting coincidence: maximizing thermodynamic efficiency (coherence) maximally delays the heat death of a system. Higher efficiency means slower entropy increase.

I am very interested in hearing counterexamples of coherent systems that are intuitively repellent!

[-]Alexander Gietelink Oldenziel10d40

So indeed, if you define coherence as the negative of arbitrage value.

There is a pretty close relation between thermodynamic free energy, arbitrageable value, and the degree to which an entity can be money-pumped.

You might also be interested in :

https://www.lesswrong.com/posts/3xF66BNSC5caZuKyC/why-subagents

[-]Matt Dellago24d30

Is there an anthropic reason or a computational (Solomonoff-pilled) argument for why we would expect the computational/causal graph of the universe to be this local (sparse)? Or at least to appear local to a first approximation (Bell's inequality).

This seems like quite a special property: I suspect that either

  • it is not as rare in e.g. the Solomonoff prior as we might first intuit, or
  • we should expect this for anthropic reasons, e.g. it is really hard to develop intelligence/do predictions in nonlocal universes.
[-]Mitchell_Porter24d30

In physics, it is sometimes asked why there should be just three (large) space dimensions. No one really knows, but there are various mathematical properties unique to three or four dimensions, to which appeal is sometimes made. 

I would also consider the recent (last few decades) interest in the emergence of spatial dimensions from entanglement. It may be that your question can be answered by considering these two things together. 

[-]Matt Dellago7mo*20

Simplicity Priors are Tautological

Any non-uniform prior inherently encodes a bias toward simplicity. This isn't an additional assumption we need to make - it falls directly out of the mathematics.

For any hypothesis $h$, the information content is $I(h) = -\log P(h)$, which means probability and complexity have an exponential relationship: $P(h) = e^{-I(h)}$.

This demonstrates that simpler hypotheses (those with lower information content) are automatically assigned higher probabilities. The exponential relationship creates a strong bias toward simplicity without requiring any special mechanisms.
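A small numerical check of that exponential relationship (the prior below is made up purely for illustration):

```python
import math

# Hypothetical prior over three hypotheses.
prior = {"h_simple": 0.6, "h_medium": 0.3, "h_complex": 0.1}

# Information content I(h) = -ln P(h): the "description length" (in nats)
# that this prior implicitly assigns to each hypothesis.
info = {h: -math.log(p) for h, p in prior.items()}

for h, p in prior.items():
    # P(h) = exp(-I(h)) recovers the prior exactly: being more probable
    # and being "simpler" under the prior-derived code are the same fact.
    assert abs(p - math.exp(-info[h])) < 1e-12
    print(f"{h}: I = {info[h]:.3f} nats, P = {p}")
```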

The "simplicity prior" is essentially tautological - more probable things are simple by definition.

[-]Alex Gibson7mo20

You can have a hypothesis with really high Kolmogorov complexity, but if the hypothesis is true 50% of the time, it will require 1 bit of information to specify with respect to a coding scheme that merely points to cached hypotheses.

This is why Kolmogorov complexity is defined with respect to a fixed universal description language; otherwise you're right, it's vacuous to talk about the simplicity of a hypothesis.

[-]Matt Dellago2mo10

The Red Queen’s Race in Weight Space

In evolution we can tell a story that not only are genes selected for their function, but also for how easily modifiable they are. For example, having a generic antibiotic gene is much more useful than having an antibiotic locked into one target and far, in edit-distance terms, from any other useful variant.

Why would we expect the generic gene to be more common? There is selection pressure on having modifiable genes because environments are constantly shifting (the Red Queen hypothesis). Genes are modules with evolvability baked in by past selection.

Can we make a similar argument for circuits/features/modes in NNs? Obviously it is better to have a more general circuit, but can we also argue that “multitool circuits” are not only better at generalising but also more likely to be found?

SGD does not optimise loss but rather something like free energy, taking degeneracy (multiplicity) into account with some effective temperature.
But evolvability seems distinct from degeneracy. Degeneracy is a property of a single loss landscape, while evolvability is a claim about distribution shift. And the claim is not “I have low loss in the new distribution” but rather “I am very close to a low-loss solution of the new distribution.”
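(For reference, the singular-learning-theory form of that free-energy claim: the Bayesian free energy of a neighbourhood of a solution $w^*$ expands as

$$F_n \;\approx\; n\,L_n(w^*) + \lambda \log n,$$

where $L_n$ is the empirical loss and $\lambda$ is the local learning coefficient measuring degeneracy. Lower loss and higher degeneracy trade off against each other, but neither term says anything about how close $w^*$ sits to good solutions of a shifted distribution.)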

Degeneracy in ML ≈ mutational robustness in biology, which is straightforward, but that is not what I am pointing at here. Evolvability is closer to out-of-distribution adaptivity: the ability to move quickly into a new optimum with small changes.

Are there experiments where a model is trained on a shifting distribution?

Is the shifting distribution relevant or can this just as well be modeled as a mixture of the distributions, and what we think of as OOD is actually in the mixture distribution? In that case degeneracy is all you need.

Related ideas: cryptographic one-way functions (examples of unevolvable designs), out-of-distribution generalisation, mode connectivity.
