Physicist ('symbol gremlin') by training, currently a MATS scholar studying the connection between the structure of natural data and the structure of learned computations.
Currently training away an aversion to sharing my writing/thoughts publicly - please modulate tone of comments accordingly :)
Yup I definitely agree there's no special role for unicellular attackers - I was eliding the complexity for brevity. I think the asymmetry still broadly holds meaningfully - e.g. multicellular parasites are very complex attackers but have much longer generation times (I assume?), so they too trade off online vs offline optimization bits. Nonetheless the host organism still has more complexity to draw on for most things with which the immune system is concerned.
Interesting to think about the pareto frontier of offline vs online optimization. The multicellular parasites and unicellular microbes would be paradigm examples. But the microbiome gives the lie to this idea - it is complex and organized but still highly adaptive, because selection can act on the lower level. Perhaps being ~commensal/mutual instead of adversarial is related? I don't know.
The linked Claude conversation doesn't share the markdown file unfortunately. Apologies. Here is a gdrive link: https://drive.google.com/file/d/1wpPGI7poP04ZMDoPU_lEh8D1CI8kOtic/view?usp=sharing. I read it and it was a good introduction, but it did a mediocre job of reframing things 'as an optimizer' or even 'as a control system'.
Note: Skimming, Claude hallucinates what Alon's 'periodic table of diseases' is. He has a pretty good youtube video on it you can watch instead. https://www.youtube.com/watch?v=ZMz_C778WMY&pp=ygUeYWxvbiBwZXJpb2RpYyB0YWJsZSBvZiBkaXNlYXNl
The Immune System as Anti-Optimizer
We have a short list of systems we like to call "optimizers" — the market, natural selection, human design, superintelligence. I think we ought to hold the immune system in comparable regard; I'm essentially ignorant of immunobiology beyond a few YouTube videos (perhaps a really fantastic LW sequence exists of which I am unaware), but here's why I am thinking this.
The immune system is the archetypal anti-optimizer: it defends a big multicellular organism from rapidly evolving microbiota. The key asymmetry, in short: the immune system embodies enough amortized optimization power to defend against online adversarial attacks by natural selection, because those attacks are constrained by the comparative simplicity of the attackers. One optimizer constrains another, faster and more adaptive optimizer, by having more resources.
What makes this especially interesting is that the immune system has no discernible volition. It is complex — probably far more so than I appreciate — but intuitively much more like a thermostat than a scheming eldritch god. It optimizes powerfully, within bounds that feel legible and non-agential.
I will not be so crass as to say "big if true for alignment", but you are permitted to infer this if it please you. I just think it's neat. Consider the mere phrase "semiotic immune system" (from, if I recall correctly, Charles Stross's Accelerando) — suggests a lot at once, eh?
I asked Claude to prepare the following tutorial - which I have not yet read (longa est vita, si uti bene scias: life is long, if you know how to use it well...) - developing this theme: https://claude.ai/share/67bb8de3-b73c-4a21-916b-70affba0da43
*Written with slight corrections for conciseness from Opus4.6. Ironically, the em-dashes are mine.
Something I think is highly relevant, and which might inform your GAN discussion, is the performance difference of the W-GAN: basically, if you train the GAN using an optimal-transport metric instead of an information-theoretic one, it seems to have much better robustness properties. This is probably because Shannon-style divergences don't respect the continuity of the underlying metric space - e.g. the KL divergence between Delta(x0) and Delta(x0 + epsilon) is infinite for any nonzero epsilon, so it doesn't capture 'closeness'. I don't yet know how I think this should tie into the high-probability latent manifold story you tell, but it seems like part of it.
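A quick numeric illustration of the continuity point (a toy sketch of mine, assuming scipy is available): two nearby point masses have infinite KL divergence but a tiny Wasserstein-1 distance.

```python
import numpy as np
from scipy.special import rel_entr
from scipy.stats import wasserstein_distance

# Two point masses on a 1-D grid, shifted by epsilon = 0.01: one-hot histograms.
grid = np.linspace(0.0, 1.0, 101)
p = np.zeros(101); p[50] = 1.0   # Delta at x0 = 0.50
q = np.zeros(101); q[51] = 1.0   # Delta at x0 + epsilon = 0.51

kl = rel_entr(p, q).sum()        # KL(p || q): infinite, p puts mass where q has none
w1 = wasserstein_distance(grid, grid, u_weights=p, v_weights=q)  # W1: just the shift

print(kl)   # inf
print(w1)   # ~0.01, i.e. epsilon
```

So the KL divergence blows up no matter how small the shift, while W1 shrinks linearly with it - exactly the "respects the metric" property the W-GAN loss exploits.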
I agree Thom's work is interesting and relevant here; I've seen it bruited about a lot, but likewise haven't gotten around to seriously studying it. From my perspective structural stability is extremely important, but I am curious whether there is any reason to identify this kind of stability with catastrophes specifically? I've always looked at the classification and figured it's just one example of this kind of structural stability - and not one I've really even seen much of in physics, compared to the close-but-not-exactly-the-same and more general ideas of critical phenomena and phase transitions. I'd be very interested if there were some sort of RG-based stability analysis.
The dynamical systems way of framing this I like the most is Koopman analysis; Steve Brunton has a great family of talks on YouTube: https://www.youtube.com/watch?v=J7s0XNT96ag
Tl;dr: the Koopman operator evolves functions of the dynamical variables. It's an infinite-dimensional linear operator, so it has eigenfunctions and eigenvalues. Conserved functions of state have eigenvalue zero; nearly conserved quantities have 'small' eigenvalues, corresponding to slower rates of decay. You can then characterise all functions by the decay rate of their self-correlation: the dominant Koopman eigenvalue of a function (not necessarily an eigenfunction) is a scalar measuring how "not conserved" it is.
You can also look at the Jordan blocks, which correspond to groups of functions which are predictively closed (you can predict their future values by knowing only the present values of those functions). These are some threads I've found interesting in thinking about natural ontology but I do not think they are sufficient - for example they give a very clear picture of intra-realization correlations, but do not have a good way of talking about inter-realization correlations or sensitivity to initial conditions.
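A minimal sketch of the conserved-vs-decaying picture, using dynamic mode decomposition (a standard finite-dimensional approximation of the Koopman operator). Note the convention shift: in discrete time a conserved direction has eigenvalue 1 rather than 0 (the continuous-time rate is the log of the discrete eigenvalue). The dynamics and numbers here are my own toy example.

```python
import numpy as np

# Toy discrete-time linear system: one conserved direction (eigenvalue 1.0)
# and one slowly decaying direction (eigenvalue 0.95).
A_true = np.diag([1.0, 0.95])

rng = np.random.default_rng(0)
xs = [rng.standard_normal(2)]
for _ in range(50):
    xs.append(A_true @ xs[-1])
X = np.array(xs)                 # snapshot matrix, rows are states x_t

# DMD: least-squares fit of x_{t+1} ~ A x_t from snapshot pairs.
# lstsq solves X[:-1] @ B = X[1:], so B.T approximates A_true.
B, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
eigvals = np.linalg.eigvals(B.T)

print(sorted(np.abs(eigvals)))   # approx [0.95, 1.0]
```

The recovered spectrum separates the conserved quantity (modulus ~1) from the nearly conserved one (modulus ~0.95), which is the "decay rate of self-correlation" story in miniature.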
strictly weaker
add "than Condorcet" in this sentence since it's only implied but not said
I think we agree modulo terminology, with respect to your remarks up to the part about the Krakovna paper, which I had to sit and think a little bit more about.
For the Krakovna paper, you're right that it has a different flavor than I remembered - it still seems, though, that the proof relies on having some ratio of recurrent vs. non-recurrent states. So if you did something like 1000x the number of terminal states, the reward function is 1000x less retargetable to recurrent states - and I think this remains true even if the new terminal states are entirely unreachable?
With respect to the CNN example I agree, at least at a high level - though technically the theta reward vectors are supposed to be |S|-dimensional, specifying the reward for each state, which is slightly different from being the weights of a CNN; without redoing the math, it's plausible that an analogous theorem would hold. Regardless, the non-shutdown result gives retargetability because it assumes there's a single terminal state and many recurrent states. The retargetability is really just the ratio (number of terminal states) / (number of recurrent states), which needn't be greater than one.
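The counting intuition can be checked with a toy model (my own construction, not the paper's formalism): draw i.i.d. reward vectors over states, assume an optimal agent that simply heads for the argmax-reward state, and count how often that state is recurrent vs. terminal. The fraction just tracks the set sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_terminal, n_recurrent = 1, 9          # single terminal state, many recurrent ones
n_states = n_terminal + n_recurrent

# Sample i.i.d. uniform reward vectors theta in R^{|S|}. States [0, n_terminal)
# are terminal; the rest are recurrent. The optimal agent picks the argmax state.
thetas = rng.random((100_000, n_states))
best = thetas.argmax(axis=1)

frac_recurrent = (best >= n_terminal).mean()
print(frac_recurrent)                   # ~ n_recurrent / n_states = 0.9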
Anyways, as the comments from TurnTrout discuss, as soon as there's a nontrivial inductive bias over these different reward functions (or any other path-dependence-y stuff that deviates from optimality), the theorem doesn't go through, since retargetability is all based on counting how many of the functions in that set are A-preferring vs. B-preferring. There may be an adaptation of the argument that uses some prior over generalizations and such - but then that prior is the inductive bias, which, as you noted with those TurnTrout remarks, is its own whole big problem :')
I'll try and add a concise caveat to your doc, thanks for the discussion :)
Ahah I have Claude's system prompt set to default to Chinese so I can practice. Since my speaking/writing abilities suck much worse than my reading, I also told it to nag me to write in Chinese for the sake of practice lol. This works... variably well.