LESSWRONG

Matt Dellago
1 · Matthias Dellago's Shortform
9mo
18
Matthias Dellago's Shortform
Matt Dellago · 4h

The Red Queen’s Race in Weight Space

In evolution, we can tell a story in which genes are selected not only for their function but also for how easily they can be modified. For example, a generic antibiotic gene is much more useful than an antibiotic locked into one target and far, in edit-distance terms, from any other useful variant.

Why would we expect the generic gene to be more common? Because environments are constantly shifting (the Red Queen hypothesis), there is selection pressure for genes that are easy to modify. Genes are modules with evolvability baked in by past selection.

Can we make a similar argument for circuits/features/modes in NNs? Obviously it is better to have a more general circuit, but can we also argue that “multitool circuits” are not only better at generalising but also more likely to be found?

SGD does not optimise loss but rather something like free energy, taking degeneracy (multiplicity) into account with some effective temperature.
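A toy illustration of the free-energy point, assuming a Gibbs-like weighting in which a basin's probability is proportional to its degeneracy N times exp(−L/T), i.e. exp(−F/T) with F = L − T ln N. The two basins, their losses, degeneracies and the temperatures are made-up numbers, and this is a cartoon of the idea rather than a model of SGD itself: at a higher effective temperature the broad but slightly worse basin wins, at a lower one the sharp basin does.

```python
import math

# Hypothetical basins: (name, loss at the bottom, degeneracy N = relative volume of near-optimal parameters)
BASINS = [("sharp", 0.10, 1.0), ("broad", 0.15, 1000.0)]


def free_energy(loss: float, degeneracy: float, temperature: float) -> float:
    """F = L - T * ln(N); lower free energy means higher probability."""
    return loss - temperature * math.log(degeneracy)


def basin_probabilities(temperature: float) -> dict:
    """Gibbs-like weights N * exp(-L / T) = exp(-F / T), normalised over the toy basins."""
    weights = {name: n * math.exp(-loss / temperature) for name, loss, n in BASINS}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


for t in (0.001, 0.01, 0.1):
    probs = basin_probabilities(t)
    best = min(BASINS, key=lambda b: free_energy(b[1], b[2], t))[0]
    print(f"T={t}: lowest free energy = {best}, P = {probs}")
```

The basin with the lowest free energy is always the most probable one; which basin that is depends on the temperature, not on loss alone.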
But evolvability seems distinct from degeneracy. Degeneracy is a property of a single loss landscape, while evolvability is a claim about distribution shift. And the claim is not “I have low loss in the new distribution” but rather “I am very close to a low-loss solution of the new distribution.”

Degeneracy in ML ≈ mutational robustness in biology, which is straightforward, but that is not what I am pointing at here. Evolvability is closer to out-of-distribution adaptivity: the ability to move quickly into a new optimum with small changes.
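To make the distinction concrete, here is an entirely hypothetical two-parameter toy. Task A only cares about w[0], so every point with w[0] = 1 is optimal and equally degenerate. A shifted task B additionally wants w[1] = 2. Two solutions with identical loss on A can still differ in how many gradient steps they need to reach low loss on B; that distance, not the loss or degeneracy on A, is the sense of "evolvability" meant here.

```python
# Toy model with two parameters. Task A ignores w[1], so the whole line w[0] == 1 is optimal.
def loss_a(w):
    return (w[0] - 1.0) ** 2


# Hypothetical shifted task B: additionally wants w[1] == 2.
def loss_b(w):
    return (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2


def grad_b(w):
    return (2.0 * (w[0] - 1.0), 2.0 * (w[1] - 2.0))


def steps_to_adapt(w, lr=0.1, tol=1e-3, max_steps=10_000):
    """Count plain gradient-descent steps on task B until its loss drops below tol."""
    steps = 0
    while loss_b(w) >= tol and steps < max_steps:
        g = grad_b(w)
        w = (w[0] - lr * g[0], w[1] - lr * g[1])
        steps += 1
    return steps


FAR = (1.0, 0.0)   # optimal for A, far from B's optimum
NEAR = (1.0, 1.9)  # equally optimal for A, close to B's optimum
print(loss_a(FAR), loss_a(NEAR))  # both 0.0
print(steps_to_adapt(FAR), steps_to_adapt(NEAR))
```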

Are there experiments where a model is trained on a shifting distribution?

Is the shift itself relevant, or can this just as well be modeled as a mixture of the distributions, so that what we think of as OOD is actually inside the mixture distribution? In that case degeneracy is all you need.

Related ideas: cryptographic one-way functions (examples of unevolvable designs), out-of-distribution generalisation, mode connectivity.

The Coding Theorem — A Link between Complexity and Probability
Matt Dellago · 1mo

Excellent! Great to have a cleanly formulated article to point people to!

Thermodynamic entropy = Kolmogorov complexity
Matt Dellago · 2mo

Good point! My intuition was that the Bekenstein bound (https://en.wikipedia.org/wiki/Bekenstein_bound) limits the amount of information in a volume (or, more precisely, the information enclosed by an area). Therefore the number of states in a finite volume is also finite.
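For a sense of scale: the Bekenstein bound for a sphere of radius R containing total energy E is I ≤ 2πRE/(ħc ln 2) bits. A small calculation, assuming a 1 kg mass inside a 1 m radius (numbers chosen purely for illustration), gives a huge but finite number of bits, hence at most 2^I distinguishable states.

```python
import math

HBAR = 1.054_571_817e-34  # reduced Planck constant, J*s
C = 299_792_458.0         # speed of light, m/s


def bekenstein_bits(radius_m: float, mass_kg: float) -> float:
    """Upper bound (in bits) on the information inside a sphere holding this mass-energy."""
    energy = mass_kg * C ** 2  # E = m c^2
    return 2 * math.pi * radius_m * energy / (HBAR * C * math.log(2))


bits = bekenstein_bits(1.0, 1.0)
print(f"{bits:.2e} bits")  # roughly 2.6e43 bits: large, but finite
```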

I must add: since writing this comment, a man called George pointed out to me that when modeling the universe as a computation, one must take care not to accidentally derive ontological claims from it.

So today I would have a more 'whatever works, works' attitude: UTMs and DFAs are both just models, and neither is likely to be ontologically true.

Mirror Organisms Are Not Immune to Predation
Matt Dellago · 4mo

Wow, thank you for the kind and thorough reply! Obviously there is much more to this; I'll have a look at the report.

Alexander Gietelink Oldenziel's Shortform
Matt Dellago · 5mo

I first heard this idea from Joscha Bach, and it is my favorite explanation of free will. I had not heard it called a 'predictive-generative gap' before, though, which is very well formulated, imo.

Matthias Dellago's Shortform
Matt Dellago · 6mo (edited)

Simplicity Priors are Tautological

Any non-uniform prior inherently encodes a bias toward simplicity. This isn't an additional assumption we need to make: it falls directly out of the mathematics.

For any hypothesis $h$, the information content is $I(h) = -\log P(h)$, which means probability and complexity have an exponential relationship: $P(h) = e^{-I(h)}$.

This demonstrates that simpler hypotheses (those with lower information content) are automatically assigned higher probabilities. The exponential relationship creates a strong bias toward simplicity without requiring any special mechanisms.

The "simplicity prior" is essentially tautological - more probable things are simple by definition.
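A quick check of the claim with a made-up four-hypothesis prior, measuring information in bits (base-2 logs, so P(h) = 2^(−I(h)) rather than the natural-log form): sorting hypotheses by probability is the same as sorting them by information content, and a uniform prior assigns every hypothesis the same information content, so it expresses no simplicity preference at all.

```python
import math

# Hypothetical non-uniform prior over four hypotheses
PRIOR = {"h1": 0.5, "h2": 0.25, "h3": 0.125, "h4": 0.125}


def information_content(p: float) -> float:
    """I(h) = -log2 P(h), measured in bits."""
    return -math.log2(p)


info = {h: information_content(p) for h, p in PRIOR.items()}

# Most probable first vs. lowest information content ("simplest") first
by_probability = sorted(PRIOR, key=PRIOR.get, reverse=True)
by_simplicity = sorted(info, key=info.get)
print(by_probability, by_simplicity)

# A uniform prior gives every hypothesis the same information content: no simplicity ordering
uniform = {h: 1 / len(PRIOR) for h in PRIOR}
uniform_info = {h: information_content(p) for h, p in uniform.items()}
print(uniform_info)
```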

Thermodynamic entropy = Kolmogorov complexity
Matt Dellago · 7mo

I would be interested in seeing those talks. Could you share links to the recordings?

Thermodynamic entropy = Kolmogorov complexity
Matt Dellago · 7mo

Very good work, thank you for sharing!

Intuitively speaking, the connection between physics and computability arises because the coarse-grained dynamics of our Universe are believed to have computational capabilities equivalent to a universal Turing machine [19–22].

I can see how this is a reasonable and useful assumption, but the universe seems to be finite in both space and time and therefore not a UTM. What convinced you otherwise?
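The finiteness argument can be made concrete: any deterministic system with finitely many states must revisit some state within that many steps (pigeonhole principle) and then loop forever, which is why a bounded system behaves like a finite automaton rather than a UTM. A sketch using a hypothetical 256-state "universe" and two arbitrary update rules:

```python
from typing import Callable, Hashable, Tuple


def find_cycle(step: Callable[[Hashable], Hashable], state: Hashable) -> Tuple[int, int]:
    """Iterate a deterministic update; return (transient length, cycle length)."""
    first_seen = {}
    t = 0
    while state not in first_seen:
        first_seen[state] = t
        state = step(state)
        t += 1
    return first_seen[state], t - first_seen[state]


# Hypothetical finite "universe" with 256 possible states
N_STATES = 256


def step(x: int) -> int:
    # An arbitrary fixed rule (a linear congruential map)
    return (5 * x + 3) % N_STATES


def square(x: int) -> int:
    # Another arbitrary rule, one that has a transient before settling into its cycle
    return (x * x) % N_STATES


for start in (0, 17, 200):
    transient, period = find_cycle(step, start)
    print(start, transient, period)  # transient + period never exceeds N_STATES
print(find_cycle(square, 3))
```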

Matthias Dellago's Shortform
Matt Dellago · 7mo

Thank you! I'll have a look!

Matthias Dellago's Shortform
Matt Dellago · 7mo

Simplified, the Solomonoff prior is the distribution you get when you feed uniformly random bits (fair coin flips) to a universal Turing machine and look at the outputs.

Since the outputs are also strings: what happens if we iterate this? What is the stationary distribution? Is there even one? The fixed points will be quines, programs that copy their source code to the output. But how are they weighted? By their length? Presumably you can also have quine cycles of programs that generate each other in turn, in a manner reminiscent of metagenesis. Do these quine cycles capture all probability mass, or does some of it diverge?

Very grateful for answers and literature suggestions.
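The real construction is uncomputable, so this is only a finite cartoon of the iteration: a hand-written, hypothetical "machine" table stands in for the UTM, and a uniform starting distribution stands in for the length-weighted one. Mass flows onto a quine, onto a 2-cycle, or is lost to a non-halting program. In this toy the mass on the 2-cycle keeps oscillating, so there is no stationary distribution in the strict sense, only a time-averaged one, and the lost mass never comes back.

```python
from fractions import Fraction

# Hypothetical toy "machine": maps each program to its output program, or None if it never halts.
TOY_MACHINE = {
    "a": "a",   # a quine: outputs itself
    "b": "c",   # b and c form a 2-cycle (a "quine cycle")
    "c": "b",
    "d": "a",   # d feeds into the quine a
    "e": None,  # e never halts: its mass is lost
    "f": "e",   # f feeds into the non-halting e
    "g": "b",   # g feeds into the cycle, unbalancing it
}


def push_forward(dist):
    """Run every program once; the mass of non-halting programs is dropped."""
    out = {}
    for prog, mass in dist.items():
        result = TOY_MACHINE[prog]
        if result is not None:
            out[result] = out.get(result, Fraction(0)) + mass
    return out


def iterate(dist, steps):
    history = [dist]
    for _ in range(steps):
        dist = push_forward(dist)
        history.append(dist)
    return history


start = {p: Fraction(1, len(TOY_MACHINE)) for p in TOY_MACHINE}
history = iterate(start, 6)
for t, d in enumerate(history):
    print(t, {k: str(v) for k, v in sorted(d.items())}, "total =", sum(d.values()))

# Averaging over the 2-cycle gives a well-defined time-averaged distribution
avg = {k: (history[-1][k] + history[-2][k]) / 2 for k in history[-1]}
print({k: str(v) for k, v in sorted(avg.items())})
```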

27 · Mirror Organisms Are Not Immune to Predation
4mo
5