While this paradigm of 'training a model that's an AGI, and then running it at inference' is one way we get to transformative AGI, I find myself thinking that it probably WON'T be the first transformative AI, because my guess is that there are lots of tricks that use lots of compute at inference to get not-quite-transformative AI to transformative AI.
Agreed that this is far from the only possibility, and we have some discussion of increasing inference time to make the final push up to generality in the bit beginning "If general intelligence is achievable by p...
Our pleasure!
I'm not convinced a first-generation AGI would be "super expert level in most subjects". I think it's more likely they'd be extremely capable in some areas but below human level in others. (This does mean the 'drop-in worker' comparison isn't perfect, as presumably people would use them for the stuff they're really good at rather than for any task.) See the section which begins "As of 2024, AI systems have demonstrated extremely uneven capabilities" for more discussion of this and some relevant links. I agree on the knowledge access and commun...
There are a lot of situations where that’s not an appropriate assumption; rather, the relevant question is “what’s the AGI population if most of the world’s compute is running AGIs?”
Agreed. It would be interesting to extend this to answer that question and in-between scenarios (like having access to a large chunk of the compute in China or the US + allies).
...FWIW Holden Karnofsky wrote a 2022 blog post “AI Could Defeat All Of Us Combined” that mentions the following: “once the first human-level AI system is created, whoever created it could use the same c
“The resources used to train the model can be repurposed to run millions of instances of it (this matches projected cluster sizes by ~2027), and the model can absorb information and generate actions at roughly 10x-100x human speed. … We could summarize this as a ‘country of geniuses in a datacenter’.”
Dario Amodei, CEO of Anthropic, Machines of Loving Grace
“Let’s say each copy of GPT-4 is producing 10 words per second. It turns out they would be able to run something like 300,000 copies of GPT-4 in parallel. And by the time they are training GPT-5 it will be a more extreme situation where just using the computer chips they used to train GPT-5, using them to kind of run copies of GPT-5 in parallel, you know, again,...
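A rough back-of-envelope version of the arithmetic these quotes gesture at, sketched in Python. Every number below is an illustrative assumption (assumed training compute, training duration, parameter count, and per-copy speed), not an actual figure for GPT-4 or any other real model; the point is just the shape of the calculation, which repurposes the training cluster's sustained throughput for inference.

```python
# Back-of-envelope: how many copies of a model could run in parallel
# on the cluster that trained it? All inputs are illustrative assumptions.

ASSUMED_TRAINING_FLOP = 2e25   # assumed total training compute (FLOP)
ASSUMED_TRAINING_DAYS = 90     # assumed wall-clock training time
ASSUMED_PARAMS = 1.8e12        # assumed parameter count
ASSUMED_TOKENS_PER_SEC = 13    # ~10 words/s per copy, as in the quote

# Sustained throughput of the training cluster (FLOP/s).
cluster_flop_per_s = ASSUMED_TRAINING_FLOP / (ASSUMED_TRAINING_DAYS * 86_400)

# A transformer forward pass costs roughly 2 FLOP per parameter per token.
flop_per_copy_per_s = 2 * ASSUMED_PARAMS * ASSUMED_TOKENS_PER_SEC

parallel_copies = cluster_flop_per_s / flop_per_copy_per_s
print(f"~{parallel_copies:,.0f} parallel copies")
```

With these particular assumed inputs the estimate lands in the tens of thousands of copies; different (equally defensible) assumptions about utilization, batching, and model size shift it by factors of a few in either direction, which is consistent with the order-of-magnitude claims ("300,000 copies", "millions of instances") in the quotes.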