I think this only holds if fine tunes are composable [...] you probably can't take a million independently-fine-tuned models and merge them [...]
The purpose of a fine-tune is to "internalize" some knowledge - either because implicit knowledge of it is important, or because you want to develop a skill.
Although you may have a million instances executing tasks, the knowledge you want to internalize is likely much more sparse. For example, if an instance is tasked with exploring a portion of a search space, and it doesn't find a solution in that portion, it can just summarize its finding in a few words. There might not even be a reason to internalize this summary - it might be merged with other summaries for a more global view of the search landscape.
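As a toy illustration of that pattern - instances explore disjoint slices of the space, each reports a few words, and the reports are merged rather than internalized - here is a minimal sketch (the names `explore_region` and `merge_summaries` are hypothetical):

```python
# Sketch of the "summarize and merge, don't fine-tune" pattern:
# each instance searches a slice of the space and returns a short
# summary; the summaries are folded into one global view.
from concurrent.futures import ThreadPoolExecutor

def explore_region(region: range) -> str:
    """Stand-in for an instance's task: search a slice, report briefly."""
    hits = [n for n in region if n % 9973 == 0]  # toy "solution" test
    return f"{region.start}-{region.stop}: {len(hits)} hits"

def merge_summaries(summaries: list[str]) -> str:
    """Fold per-instance summaries into a global view of the landscape."""
    return "; ".join(summaries)

if __name__ == "__main__":
    regions = [range(i, i + 10_000) for i in range(0, 100_000, 10_000)]
    with ThreadPoolExecutor(max_workers=10) as pool:
        summaries = list(pool.map(explore_region, regions))
    print(merge_summaries(summaries))
```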
So I don't see the need for millions of fine-tunes. It seems more likely that you'd have periodic fine-tunes to internalize recent progress - maybe once an hour.
The key point is that this single periodic fine-tune can then be copied to all instances; that ability to copy it is the main advantage of keeping the instances identical clones.
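For concreteness, a minimal sketch of that broadcast step, assuming the hourly fine-tune is packaged as a small adapter (all names here are hypothetical, and the "weights" are stand-ins):

```python
# Sketch of "one periodic fine-tune, copied to every clone".
import copy

class Instance:
    def __init__(self, base_weights: dict):
        self.weights = copy.deepcopy(base_weights)

    def load_adapter(self, adapter: dict) -> None:
        # Identical clones make this a byte-for-byte copy, not a merge.
        self.weights.update(copy.deepcopy(adapter))

def periodic_finetune(recent_progress: list[str]) -> dict:
    """Stand-in for the hourly fine-tune internalizing recent progress."""
    return {"adapter.bias": float(len(recent_progress))}  # toy update

if __name__ == "__main__":
    base = {"layer.weight": 1.0, "adapter.bias": 0.0}
    fleet = [Instance(base) for _ in range(5)]
    adapter = periodic_finetune(["summary A", "summary B"])
    for inst in fleet:  # the expensive step happened once; this is cheap
        inst.load_adapter(adapter)
    print(fleet[0].weights == fleet[4].weights)  # True: fleet stays in sync
```

The expensive step (the fine-tune itself) happens once per period; distributing it is just a copy, which is exactly what identical clones make possible.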
On the other hand, the world already contains over 8 billion human intelligences. So I think you are assuming that a few million AGIs, possibly running at several times human speed (and able to work 24/7, exchange information electronically, etc.), will be able to significantly "outcompete" (in some fashion) 8 billion humans? This seems worth further exploration / justification.
Good point, but a couple of thoughts:
Thank you, I missed it while looking for prior art.
If we haven't seen such an extinction in the archaeological record, it can mean one of several things:
We don't know which. I think it's a combination of 2 and 3.
The app is not currently working - it complains about the token.
and thus AGI arrives - quite predictably[17] - around the end of Moore's Law
Given that the brain consumes only 20 W because of biological competitiveness constraints, and that 200 kW costs only around $20/hour in data centers, we can afford to be four OOMs less efficient than the brain while maintaining parity of capabilities. This puts AGI's potential arrival at least a couple of decades before the end of Moore's Law.
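Spelling out the arithmetic (the ~$0.10/kWh electricity rate is inferred from the $20/hour figure; it is a typical data-center price):

$$
20\ \mathrm{W} \times 10^{4} = 200\ \mathrm{kW},
\qquad
200\ \mathrm{kW} \times \$0.10/\mathrm{kWh} = \$20/\mathrm{hour}.
$$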
The main advantage is that you can immediately distribute fine-tunes to all of the copies. This is far higher bandwidth than our own slow, high-effort methods of knowledge dissemination.
The monolithic aspect may be a disadvantage, but there are a couple of mitigations: