Looking over my favourite posts, I notice that many of them are making specific versions of a more general claim, which is essentially: don’t mistake selective processes for predictive processes.
Here, I’m going to try to make that more general claim, rehash some examples in light of it, and end with a few ambient confusions I think this framework can help with, for the reader to ponder.
When you encounter an entity that is very good at achieving some outcome, there are two very different processes that could be going on under the hood: a selective process, where a population varies, gets culled against the outcome, and whatever survives happens to achieve it, or a predictive process, where something steers towards the outcome by modelling it, either by making predictions itself or by being crafted by a predicting entity.
It’s not a perfect binary, and often what you see is a mix of the two. In particular, all predictive optimisers have emerged from selective optimisation, and they often retain some fingerprint of it.
| Selective | Predictive | Weird Mix |
|---|---|---|
| Bacteria developing antibiotic resistance | Hacker finding a way to penetrate a secure system | Humans evolving to be good at lying |
| Gradient descent on Atari games | Tree searching Connect Four | AlphaZero training a policy on its own rollouts |
| Flowers co-evolving with their pollinators | Humans genetically modifying crops | Humans selectively breeding dogs |
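To make the two columns concrete, here is a minimal sketch in Python. This is my own toy illustration, not drawn from any of the examples above; every name and task in it is invented. The selective optimiser never models anything: it just varies a population and culls it against a fitness signal. The predictive optimiser holds a model of its tiny world, the rules of a take-1-to-3-stones game, and picks the move whose simulated future it predicts will win.

```python
import random
from functools import lru_cache

# Selective: variation plus culling against a fitness signal.
# Nothing in here models the world; good candidates just survive.
def evolve(target: str, pop_size: int = 50, generations: int = 300) -> str:
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    fitness = lambda s: sum(a == b for a, b in zip(s, target))
    def mutate(s: str) -> str:
        i = random.randrange(len(s))
        return s[:i] + random.choice(alphabet) + s[i + 1:]
    pop = ["".join(random.choices(alphabet, k=len(target))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]                   # the cull
        pop = survivors + [mutate(s) for s in survivors]   # the variation
    return max(pop, key=fitness)

# Predictive: no population, no culling. The optimiser simulates futures
# using a model of the rules and picks the move it predicts will win.
# Toy game: players alternate taking 1-3 stones; taking the last stone wins.
@lru_cache(maxsize=None)
def to_move_wins(stones: int) -> bool:
    return any(not to_move_wins(stones - take) for take in (1, 2, 3) if take <= stones)

def best_move(stones: int) -> int:
    for take in (1, 2, 3):
        if take <= stones and not to_move_wins(stones - take):
            return take
    return 1  # every move loses against perfect play; take anything

print(evolve("selected not designed"))  # converges towards the target string
print(best_move(17))                    # 1, leaving 16: a lost position for the opponent
```

The Weird Mix column chains the two: in AlphaZero, a predictive tree search produces the training targets, and a gradient step, selective on this post’s taxonomy, distils them into the policy.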
Human brains seem to be hardwired to reason about intent, in the same way that we see faces everywhere. The problem is that selective processes behave a bit differently from intentional ones.
So when you interpret a system as purely predictive when it’s at least partly selective, you might mistakenly assume that it generalises a lot more cleanly than it actually does, that its behaviour is in some meaningful sense intended, or that it can’t be that optimised just because you can’t see much computation lying around. These can sometimes be dangerous mistakes.
(The last one in particular gives you a slightly more precise variant of Chesterton's Fence: before scrapping a tradition where nobody can articulate why it's useful, at least ballpark how much optimisation has probably gone into it.)
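To see how the first of those mistakes, the generalisation one, can bite, here is another invented sketch: an action sequence selected to reach a single goal in a gridworld looks competent, but that competence was baked in rather than intended, and it doesn’t transfer to a moved goal; a planner with a model of the grid simply replans.

```python
import random
from collections import deque

MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
SIZE = 5  # a 5x5 grid, starting at (0, 0)

def run(plan, goal) -> int:
    """Execute a fixed action sequence; return Manhattan distance to the goal."""
    r = c = 0
    for m in plan:
        dr, dc = MOVES[m]
        r = min(max(r + dr, 0), SIZE - 1)
        c = min(max(c + dc, 0), SIZE - 1)
    return abs(r - goal[0]) + abs(c - goal[1])

def select_plan(goal, length: int = 8, iters: int = 3000) -> str:
    """Selective: hill-climb a fixed action sequence against a single goal."""
    plan = [random.choice("UDLR") for _ in range(length)]
    for _ in range(iters):
        cand = list(plan)
        cand[random.randrange(length)] = random.choice("UDLR")
        if run(cand, goal) <= run(plan, goal):  # survives the cull
            plan = cand
    return "".join(plan)

def plan_route(goal) -> str:
    """Predictive: breadth-first search over a model of the grid, for any goal."""
    frontier, seen = deque([((0, 0), "")]), {(0, 0)}
    while frontier:
        (r, c), path = frontier.popleft()
        if (r, c) == goal:
            return path
        for m, (dr, dc) in MOVES.items():
            nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + m))
    return ""  # unreachable on a connected grid

evolved = select_plan(goal=(4, 4))
print(run(evolved, (4, 4)))             # 0: looks competent
print(run(evolved, (0, 4)))             # ~4: the "competence" does not transfer
print(run(plan_route((0, 4)), (0, 4)))  # 0: the planner just replans
```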
That’s the whole point. The rest of this post will just be spelling out examples, but feel free to stop if you’ve already got the gist.
This picture is basically right, but not quite: can you spot why?
Classic Examples and Confusions
Now that I've given the general claim and a few instantiations, I’m going to close with some other cases where I think this distinction is relevant, for the reader to ponder.
The author of this post claims that this is specifically because people find these explanations less intuitive.
One of the most common counterarguments is roughly “nobody wants this, so it won’t happen”. I think it has some bite, but not much, and I am still not sure how to bridge that inferential gap; that’s part of what motivated me to write this post up.