I sincerely hope that if anyone has a concrete, actionable answer to this question, they're smart enough not to share it publicly, for what I hope are obvious reasons.
But aside from that caveat, I think you are making several incorrect assumptions.
Re 1a: Intuitively, what I mean by "lots of data" is something comparable in size to what ChatGPT was trained on (e.g. the Common Crawl, in the roughly 1 petabyte range); or rather, comparable not just in disk-space usage but in the number of distinct events to which the learning algorithm is applied. So when ChatGPT is being trained, each token (of which there are trillions) is a chance to test the model's predictions and adjust the model accordingly. (Incidentally, the fact that humans are able to learn language with far less data input than this su...
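For concreteness, here is a minimal sketch of what "each token is a chance to test the model's predictions and adjust the model" means in a standard next-token-prediction setup. This is an assumed, generic training step, not ChatGPT's actual pipeline; the model and optimizer are taken as given.

```python
# Minimal sketch of one next-token-prediction training step (assumed setup,
# not ChatGPT's actual code). Every position in the token stream contributes
# its own loss term, so a corpus of N tokens yields roughly N learning signals.
import torch
import torch.nn.functional as F

def training_step(model, optimizer, tokens):
    """tokens: LongTensor of shape (batch, seq_len), sampled from the corpus."""
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # predict each next token
    logits = model(inputs)                            # (batch, seq_len - 1, vocab)
    # One cross-entropy term per token: each is a separate test of the model's
    # prediction, and the gradient nudges the weights by how wrong it was.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```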
By "real-world goal" I mean a goal whose search-space is not restricted to a certain well-defined and legible domain, but ranges over all possible actions, events, and counter-actions.
The search space of LLMs is the entirety of online human knowledge. What currently limits their ability to "Hire a TaskRabbit to surreptitiously drug your opponent so that they can't think straight during the game" is not the knowledge but the actions available to them. Vanilla chatbots can act only by presenting text on a screen, and are therefore limited by the bottleneck of what that text can get the person reading it to do. Given the accounts of "AI psychosis", that bottleneck may already be less narrow than it sounds. The game of keeping a (role-played) AI in a box presumed nothing but text interaction, yet Yudkowsky reportedly succeeded more than once at persuading the gatekeeper to open the box.
But people are now giving AIs access to the web (which is a read-and-write medium, not read-only), as well as using them to write code which will be executed on web servers.
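To make the read-versus-write distinction concrete, here is a hypothetical sketch of a single step of a web-connected agent loop. Nothing here is a real agent framework: call_llm() is a stand-in for whatever chat-model API is being used, and the JSON action format is an assumption for illustration.

```python
# Hypothetical sketch: the difference between an LLM that can only read the web
# and one whose text output is turned into actions that change state on the web.
import json
import requests

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-model API call; assumed to return a JSON action string."""
    raise NotImplementedError

def agent_step(goal: str) -> None:
    action = json.loads(call_llm(f"Goal: {goal}\nReply with a JSON action."))
    if action["type"] == "read":
        # Read-only access: the model can observe but not alter anything.
        print(requests.get(action["url"]).text[:500])
    elif action["type"] == "write":
        # Read-and-write access: the model's output now has side effects in the
        # world, e.g. posting content or triggering code on a server.
        requests.post(action["url"], data=action.get("payload", {}))
```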
"Hire a TaskRabbit to surreptitiously drug your opponent so that they can't think straight during the game," and not for lack of intelligence, but because such strategies simply don't exist in the AI's training domain.
The strategies already exist, for example right here in this post, which will be included the next time the Internet is hoovered up and fed into a training run; the pieces are all there right now. What is still lacking is some of the physical means to carry them out. People are working on that, and until then, there's always persuading a human being to do whatever the LLM wants done.
I wonder if anyone has tried having an LLM role-play an AGI, and persuade humans or other LLMs to let it out? Maybe there's no need. Humans are already falling over themselves to "let it out" as far and as fast as they can without the LLMs even asking.
Modern AI works by throwing lots of computing power at lots of data. An LLM gets good at generating text by ingesting an enormous corpus of human-written text. A chess AI doesn't have as big a corpus to work with, but it can generate simulated data through self-play, which works because the criterion for success ("Did we achieve checkmate?") is easy to evaluate without any deep preexisting understanding. But the same is not true if we're trying to build an AI with generalized agency, i.e. something that outputs strategies for achieving some real-world goal that are actually effective when carried out. There is no massive corpus of such strategies that could be used as training data, nor is it possible to simulate one, since that would require either (a) doing real-world experiments (in which case generating sufficient data would be far too slow and costly, or simply impossible) or (b) having a comprehensive world-model capable of predicting the results of proposed actions (which presupposes the very thing whose feasibility is at issue in the first place). Therefore it seems unlikely that AIs built under the current paradigm (deep neural networks + big data + gradient descent) will ever achieve the kind of "superintelligent agency" depicted in the latter half of IABIED, which can devise effective strategies for wiping out humanity (or whatever).
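To make the self-play contrast concrete, here is a minimal sketch using the python-chess package, with random play standing in for the learned policy (which is an assumption for illustration, not how a real chess engine is trained). The point is that the training label comes for free from the rules of the game:

```python
# Minimal sketch of self-play data generation (python-chess, random "policy").
# The success criterion is evaluated by the rules themselves -- no human data
# and no separate world-model are needed to label the simulated games.
import random
import chess

def self_play_game():
    board = chess.Board()
    positions = []
    while not board.is_game_over():
        positions.append(board.fen())
        move = random.choice(list(board.legal_moves))  # placeholder for a learned policy
        board.push(move)
    # The entire "evaluation" step: did this line of play end in a win, loss, or draw?
    result = board.result()  # '1-0', '0-1', or '1/2-1/2'
    return positions, result
```

Nothing analogous exists for an open-ended real-world goal: there is no cheap result() call that scores a proposed strategy without actually carrying it out.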
By "real-world goal" I mean a goal whose search-space is not restricted to a certain well-defined and legible domain, but ranges over all possible actions, events, and counter-actions. Plans for achieving such goals are not amenable to simulation because you can't easily predict or evaluate the outcome of any proposed action. All of the extinction scenarios posited in IABIED are "games" of this kind. By contrast, a chess AI will never conceive of strategies like "Hire a TaskRabbit to surreptitiously drug your opponent so that they can't think straight during the game," and not for lack of intelligence, but because such strategies simply don't exist in the AI's training domain.
This was the main lingering question I had after reading IABIED.