Though to be clear, it seems very, very difficult to me, like it might be at a vaguely comparable level of difficulty to "solving biology". Which is part of why I'm not working on that directly, but instead aiming at technological human intelligence amplification.
Like trying to transcribe the egregore's algorithm into something easily human-readable?
Yes, that's basically what I mean. There's a lot of coded movements. Examples of classes of examples:
So you'd first of all want to decode this stuff so that
Further, there's presumably healthy, humanity-aligned ways of participating in egregores (I mean the name is a bit scary, but like, some companies, governments, religious strains, traditions, norms, grand plans, etc., are good to participate in), or in other words effective, epistemic, Good shared-intentionality-weaving. This is an entire huge and fundamental missing sector of our philosophy. We might have to understand this better to make progress on hard things. Decoding obvious egregores would be a way in. As an example, I suspect there is some sequence of words, humanly producible, maybe with prerequisites (such as having a community backing you up, or similar), that would persuade most AGI researchers to just stop--but you might need more theory to produce those words.
I think I don't understand this argument. In creating AI we can draw on training data, which breaks the analogy to making a replicator actually from scratch (are you using a premise that this is a dead end, or something, because "Nearly all [thinkers] do not write much about the innards of their thinking processes..."?).
You're technically right that the analogy is broken in that way, yeah. Likewise, if someone gleans substantial chunks of the needed Architecture by looking at scans of brains. But yes, as you say, I think the actual data (in both cases) doesn't directly tell you what you need to know, by any stretch. (To riff on an analogy from Kabir Kumar: it's sort of like trying to infer the inner workings of a metal casting machine, purely by observing price fluctuations for various commodities. It's probably possible in theory, but staring at the price fluctuations--which are a highly mediated / garbled / fuzzed emanation from the "guts" of various manufacturing processes--is not a good way to discover the important ideas about how casting machines can work. Cf. https://www.lesswrong.com/posts/unCG3rhyMJpGJpoLd/koan-divining-alien-datastructures-from-ram-activations )
We've seen that supervised learning and RL (and evolution) can create structural richness (if I have the right idea of what you mean) out of proportion to the understanding that went into them.
Not sure I buy the claims about SL and RL. In the case of SL, it's only going "a little ways away from the data", in terms of the structure you get. Or so I claim uncertainly. (Hm... maybe the metaphor of "distance from the data" is quite bad.... really I mean "it's only exploring a pretty impoverished sector in structurespace, partly due to data and partly due to other Architecture".) In the case of RL, what are the successes in terms of gaining new learned structure? There's going to be some--we can point to AlphaZero, and maybe some robotics things--but I'm skeptical that this actually represents all that much structural richness. The actual NNs in AlphaZero would have some nontrivial structure, but hard to tell how much, and it's going to be pretty narrow / circumscribed, e.g. it wouldn't represent most interesting math concepts.
Anyway, the claim is of course true of evolution. The general point is true, that learning systems can be powerful, and specifically high-leverage in various ways (e.g. lots of learning from a small algorithmic complexity fingerprint as with evolution or Solomonoff induction, or from fairly small compute as in humans).
Of course this doesn't mean any particular learning process is able to create a strong mind, but, idk, I don't see a way to put a strong lower bound on how much more powerful a learning process is necessary,
Right, no one knows. Could be next month that everyone dies from AGI. The only claims I'd really argue strongly would be claims like
Besides my comments about the bitter lesson and about the richness of evolution's search, I'll also say that it just seems to me like there's lots of ideas--at the abstract / fundamental / meta level of learning and thinking--that have yet to be put into practice in AI. I wrote in the OP:
The self-play that evolution uses (and the self-play that human children use) is much richer, containing more structural ideas, than the idea of having an agent play a game against a copy of itself.
IME if you think about these sorts of things--that is, if you think about how the 2.5 known great and powerful optimization processes (evolution, humans, humanity/science) do their impressive thing that they do--if you think about that, you see lots of sorts of feedback arrangements and ways of exploring the space of structures / algorithms, many of which are different in some fundamental character from what's been tried so far in AI. And, these things don't add up, in my head, to a general intelligence--though of course that is only a deficiency in my imagination, one way or another.
(EDIT: Maybe (you'd say) I should be drawing such a strong lower bound from the point about sample efficiency...?)
I don't personally lean super heavily on the sample efficiency thing. I mean, if we see a system that's truly only trained on some human data that's of size less than 10x the amount that a well-read adult human has read (plus compute / thinking), and it performs like GPT-4 or similar, that would be really weird and surprising, and I would be confused, and I'd be somewhat more scared. But I don't think it would necessarily imply that you're about to get AGI.
Conversely, I definitely don't think that high sample complexity strongly implies that you're not about to get AGI. (Well, I guess if you're about to get AGI, there should probably be spikes in sample efficiency in specific areas--e.g. you'd be able to invent much more interesting math with little or no data, whereas previously you had to train on vast math corpora. But we don't necessarily have to observe these domain spikes before dying of nanopox.)
Yeah, in particular it seems like I'm updating more than you from induction on the conceptual-progress-to-capabilities ratio we've seen so far / on what seem like surprises to the 'we need lots of ideas' view. (Or maybe you disagree about observations there, or disagree with that frame.) (The "missing update" should weaken this induction, but doesn't invalidate it IMO.)
Yeah... To add a bit of color, I'd say I'm pretty wary of mushing. Like, we mush together all "capabilities" and then update on how much "capabilities" our current learning programs have. I don't feel like that sort of reasoning ought to work very well. But I haven't yet articulated how mushing is anything more specific than categorization, if it is more specific. Maybe what I mean by mushing is "sticking to a category and hanging lots of further cognition (inferences, arguments, plans) on the category, without putting in suitable efforts to refine the category into subcategories". I wrote:
We should have been trying hard to retrospectively construct new explanations that would have predicted the observations. Instead we went with the best PREEXISTING explanation that we already had.
Ok gotcha, thanks. In that case it doesn't seem super relevant to me. I would expect there to be lots of gains in any areas where there's algebraicness to chew through; and I don't think this indicates much about whether we're getting AGI. Being able to "unlock" domains, so that you can now chew through algebraicness there, does weakly indicate something, but it's a very fuzzy signal IMO.
(For contrast, a behavior such as originarily producing math concepts has a large non-algebraic component, and would IMO be a fairly strong indicator of general intelligence.)
Um, ok, were any of the examples impressive? For example, did any of the examples derive their improvement by some way other than chewing through bits of algebraicness? (The answer could easily be yes without being impressive, for example by applying some obvious known idea to some problem that simply hadn't happened to have that idea applied to it before, but that's a good search criterion.)
Please provide more detail about this example. What did the system invent? How did the system work? What makes you think it's novel? Would it have worked without the LLM?
(Every previous time someone has said something of the form "actually XYZ was evidence of generality / creativity / deep learning being awesome / etc.", and I've spent time looking into the details, it has turned out that they were giving a quite poor summary of the result, in favor of making the thing sound more scary / impressive. Or maybe they were using a much lower bar for lots of descriptor words. But anyway, please be specific.)
@Lucius Bushnaq It's not too combative; you're wrong. My previous comment laid out what's wrong with the reasoning. Then Noosphere89 wrote a big long comment that follows all the same lines of reasoning, still without giving any arguments. This is really bad epistemics, and people going around vibing hard about this have been poisoning (or rather, hijacking https://www.lesswrong.com/posts/dAz45ggdbeudKAXiF/a-regime-change-power-vacuum-conjecture-about-group-belief) the discourse for 5 years.
Alas, more totally unjustified "we just need X". See https://www.lesswrong.com/posts/5tqFT3bcTekvico4d/do-confident-short-timelines-make-sense?commentId=NpT59esc92Zupu7Yq
Rubber duck is what I want to try.
I'm not a neuroscientist either, but if you're not at all familiar with the field, then yes of course there's stuff you're missing.
As I wrote:
Current systems still have low numbers of active connections. Maybe this can be scaled up, but it seems quite hard to scale up by several orders of magnitude.
More zoomed out, I think the only method that would definitely work is germline engineering (well, and WBE, but that has its own problems). Everything else is speculation--what should make us think you can increase someone's deep problem solving ability that way?