Good point, though I think it's a non-fallacious enthymeme. Like, we're talking about a car that moves around under its own power, but somehow doesn't have parts that receive, store, transform, and release energy, and that could be removed? Could be. The mind could be an obscure mess where nothing is factored, so that a cancerous newcomer with read-write access can't get any work out of the mind other than through the top-level interface. I think that explicitness (https://www.lesswrong.com/posts/KuKaQEu7JjBNzcoj5/explicitness) is a very strong general tendency (cline) in minds, but if that's not true then my first reason for believing the enthymeme's hidden premise fails.
I feel like none of these historical precedents is a perfect match. It might be valuable to think about the ways in which they are similar and different.
To me a central difference, suggested by the word "strategic", is that the goal pursuit should be unboundedly ambitious and unboundedly general.
By unboundedly ambitious I mean "has an unbounded ambit" (ambit = "the area went about in; the realm of wandering" https://en.wiktionary.org/wiki/ambit#Etymology ), i.e. its goals induce it to pursue unboundedly much control over the world.
By unboundedly general I mean that it's universal for optimization channels. For any given channel through which one could optimize, it can learn or recruit understanding to optimize through that channel.
Humans are in a weird liminal state where we have high-ambition-appropriate things (namely, curiosity), but local changes in pre-theoretic "ambition" (e.g. EA, communism) are usually high-ambition-inappropriate (e.g. divesting from basic science in order to invest in military power or whatever).
I think it's a good comparison, though I do think they're importantly different. Evolution figured out how to make things that figure out how to figure stuff out. So you turn off evolution, and you still have an influx of new ability to figure stuff out, because you have a figure-stuff-out figure-outer. It's harder to get the human to just figure stuff out without also figuring out more about how to figure stuff out, which is my point.
Tsvi appears to take the fact that you can stop gradient-descent without stopping the main operation of the NN to be evidence that the whole setup isn't on a path to produce strong minds.
(I don't see why it appears that I'm thinking that.) Specialized to NNs, what I'm saying is more like: If/when NNs make strong minds, it will be because the training---the explicit-for-us, distal ex quo---found an NN that has its own internal figure-stuff-out figure-outer, and then the figure-stuff-out figure-outer did a lot of figuring out how to figure stuff out, so the NN ended up with a lot of ability to figure stuff out; but a big chunk of the leading edge of that ability to figure stuff out came from the NN's internal figure-stuff-out figure-outer, not "from the training"; so you can't turn off the NN's figure-stuff-out figure-outer just by pausing training. I'm not saying that the setup can't find an NN-internal figure-stuff-out figure-outer (though I would be surprised if that happens with the exact architectures I'm aware of currently existing).
Yes, I think there's stuff that humans do that's crucial for what makes us smart, that we have to do in order to perform some language tasks, and that the LLM doesn't do when you ask it to do those tasks, even when it performs well in the local-behavior sense.
Thanks! You've confirmed my fears about the butcher number.
Re/ other methods: I wonder if there are alternate write methods that can plausibly scale to hundreds of thousands of neurons or more. The enhancements that seem most promising to me involve both reading and writing at massive scale.
My guess is that most of the interesting stuff here is bottlenecked on the biotech that determines bandwidth. Most of the interesting stuff needs very many (>millions?) of precise connections, and that's hard to get safely with big clumsy electrodes. https://tsvibt.blogspot.com/2022/11/prosthetic-connectivity.html It would be very nice if someone could show that's wrong, or if someone could figure out how to get many connections faster than the default research.
Oh, I ended up (through "non-Newtonian") with the same word for a similar idea! (I can't find any substantial notes, just a message to myself saying "mind as oobleck"; I think I was thinking about something around how when you push against an idea, test it, examine it, the idea or [what the idea was supposed to be] is evoked more strongly and precisely.)
If a mind comes to understand a bunch of stuff, there are probably some compact reasons that it came to understand a bunch of stuff. What could such reasons be? The mind might copy a bunch of understanding from other minds. But if the mind becomes much more capable than surrounding minds, that's not the reason, assuming that much greater capabilities required much more understanding. So it's some other reason. I'm describing this situation as the mind being on a trajectory of creativity.
Are you echoing this point from the post?
It might be possible for us humans to prevent strategicness, though this seems difficult because even detecting strategicness is maybe very difficult. E.g. because a mind thinking about X may also sneakily be thinking about Y: https://tsvibt.blogspot.com/2023/03/the-fraught-voyage-of-aligned-novelty.html#inexplicitness
My mainline approach is to have controlled strategicness, ideally corrigible (in the sense of: the mind thinks that [the way it determines the future] is probably partially defective in an unknown way).