I sincerely hope that if anyone has a concrete, actionable answer to this question, they're smart enough not to share it publicly, for what I hope are obvious reasons.
But aside from that caveat, I think you are making several incorrect assumptions.
Re 1a: Intuitively, what I mean by "lots of data" is something comparable in size to what ChatGPT was trained on (e.g. the Common Crawl, in the roughly 1 petabyte range); or rather, comparable not just in disk-space usage but in the number of distinct events to which the learning algorithm is applied. So when ChatGPT is being trained, each token (of which there are trillions) is a chance to test the model's predictions and adjust the model accordingly. (Incidentally, the fact that humans are able to learn language with far less data input than this su...
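For concreteness, here is a minimal sketch of what "each token is a chance to test the model's predictions and adjust the model" means in a standard next-token-prediction setup. This is an assumed, generic training step, not ChatGPT's actual pipeline; the model and optimizer are taken as given.

```python
# Minimal sketch of one next-token-prediction training step (assumed setup,
# not ChatGPT's actual code). Every position in the token stream contributes
# its own loss term, so a corpus of N tokens yields roughly N learning signals.
import torch
import torch.nn.functional as F

def training_step(model, optimizer, tokens):
    """tokens: LongTensor of shape (batch, seq_len), sampled from the corpus."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each next token
    logits = model(inputs)                            # (batch, seq_len - 1, vocab)
    # One cross-entropy term per token: each is a separate test of the model's
    # prediction, and the gradient nudges the weights by how wrong it was.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```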
By "real-world goal" I mean a goal whose search-space is not restricted to a certain well-defined and legible domain, but ranges over all possible actions, events, and counter-actions.
The search space of LLMs is the entirety of online human knowledge. What currently limits their ability to "Hire a TaskRabbit to surreptitiously drug your opponent so that they can't think straight during the game" is not the knowledge but the actions available to them. Vanilla chatbots can act only by presenting text on a screen, and are therefore limited by the bottleneck of what that text can get the person reading it to do. Given the accounts of "AI psychosis", that bottleneck may already be less narrow than it sounds. The game of keeping a (role-played) AI in a box presumed nothing but text interaction, yet Yudkowsky reportedly succeeded more than once at persuading the gatekeeper to open the box.
But people are now giving AIs access to the web (which is a read-and-write medium, not read-only), as well as using them to write code which will be executed on web servers.
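To make the read-versus-write distinction concrete, here is a hypothetical sketch of a single step of a web-connected agent loop. Nothing here is a real agent framework: call_llm() is a stand-in for whatever chat-model API is being used, and the JSON action format is an assumption for illustration.

```python
# Hypothetical sketch: the difference between an LLM that can only read the web
# and one whose text output is turned into actions that change state on the web.
import json
import requests

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-model API call; assumed to return a JSON action string."""
    raise NotImplementedError

def agent_step(goal: str) -> None:
    action = json.loads(call_llm(f"Goal: {goal}\nReply with a JSON action."))
    if action["type"] == "read":
        # Read-only access: the model can observe but not alter anything.
        print(requests.get(action["url"]).text[:500])
    elif action["type"] == "write":
        # Read-and-write access: the model's output now has side effects in the
        # world, e.g. posting content or triggering code on a server.
        requests.post(action["url"], data=action.get("payload", {}))
```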
"Hire a TaskRabbit to surreptitiously drug your opponent so that they can't think straight during the game," and not for lack of intelligence, but because such strategies simply don't exist in the AI's training domain.
The strategies already exist, for example right here in this post, which will be included the next time the Internet is hoovered up and fed into a training run; the pieces are all there right now. What is still lacking is some of the physical means to carry them out. People are working on that, and until then, there's always persuading a human being to do whatever the LLM wants done.
I wonder if anyone has tried having an LLM role-play an AGI, and persuade humans or other LLMs to let it out? Maybe there's no need. Humans are already falling over themselves to "let it out" as far and as fast as they can without the LLMs even asking.
Modern AI works by throwing lots of computing power at lots of data. An LLM gets good at generating text by ingesting an enormous corpus of human-written text. A chess AI doesn't have as big a corpus to work with, but it can generate simulated data through self-play, which works because the criterion for success ("Did we achieve checkmate?") is easy to evaluate without any deep preexisting understanding. But the same is not true if we're trying to build an AI with generalized agency, i.e. something that outputs strategies for achieving some real-world goal that are actually effective when carried out. There is no massive corpus of such strategies that could be used as training data, nor is it possible to simulate one, since that would require either (a) doing real-world experiments (in which case generating sufficient data would be far too slow and costly, or simply impossible) or (b) having a comprehensive world-model capable of predicting the results of proposed actions (which presupposes the very thing whose feasibility is at issue in the first place). Therefore it seems unlikely that AIs built under the current paradigm (deep neural networks + big data + gradient descent) will ever achieve the kind of "superintelligent agency" depicted in the latter half of IABIED, which can devise effective strategies for wiping out humanity (or whatever).
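To make the self-play contrast concrete, here is a minimal sketch using the python-chess package, with random play standing in for the learned policy (which is an assumption for illustration, not how a real chess engine is trained). The point is that the training label comes for free from the rules of the game:

```python
# Minimal sketch of self-play data generation (python-chess, random "policy").
# The success criterion is evaluated by the rules themselves -- no human data
# and no separate world-model are needed to label the simulated games.
import random
import chess

def self_play_game():
    board = chess.Board()
    positions = []
    while not board.is_game_over():
        positions.append(board.fen())
        move = random.choice(list(board.legal_moves))  # placeholder for a learned policy
        board.push(move)
    # The entire "evaluation" step: did this line of play end in a win, loss, or draw?
    result = board.result()  # '1-0', '0-1', or '1/2-1/2'
    return positions, result
```

Nothing analogous exists for an open-ended real-world goal: there is no cheap result() call that scores a proposed strategy without actually carrying it out.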
By "real-world goal" I mean a goal whose search-space is not restricted to a certain well-defined and legible domain, but ranges over all possible actions, events, and counter-actions. Plans for achieving such goals are not amenable to simulation because you can't easily predict or evaluate the outcome of any proposed action. All of the extinction scenarios posited in IABIED are "games" of this kind. By contrast, a chess AI will never conceive of strategies like "Hire a TaskRabbit to surreptitiously drug your opponent so that they can't think straight during the game," and not for lack of intelligence, but because such strategies simply don't exist in the AI's training domain.
This was the main lingering question I had after reading IABIED.