Each time a skeptic talks about AI as a technology, I think it signals a likely crux.

I've been watching the public AI debate closely and sadly. I think that debate might be crucial in whether we actually get aligned AGI, and it's not going particularly well so far. The debate is confused, and at risk of causing polarization by irritating all involved. Identifying cruxes should help the debate be less irritating.

Agency as a crux for x-risk

Of course AGI is a technology. But only in the way that humans are technically animals. Saying humans are animals has strong and wrong implications about how we behave and think. Calling AGI a technology has similar wrong implications.

Technologies do what they're designed to, with some potential accidents and side effects. Boilers explode, and the internet is used for arguments and misinformation instead of spreading information. These effects can be severe, and could even threaten the future of humanity. But they're not as dangerous as accidentally creating something that becomes smarter than you, and actively tries to kill you.

When someone refers to AI as a technology, I think they're often not thinking of it as having full agency.

While AI without full agency does present possible x-risks, I think it's a mistake to mix those in with the risks from fully agentic AGI. The risks from a fully agentic AGI are both easier to grasp, and more severe. I think it's wiser to address those first, and only move on with a careful distinction in topics.

By full agency, I mean something that pursues goals, and chooses its own subgoals (a relevant example subgoal is preventing humans from interfering with its projects). There's a spectrum of agency. A chess program has limited agency; it was made to play a good game of chess, and it can make moves to do that. Animals don't really make long-range plans that include subgoals, and no existing AI has long-range goals and makes new plans to achieve them. Humans are currently unique in that regard.

There's a huge difference between something deciding to kill you, and making a plan to do it, and something killing you by accident or misuse. Making this distinction should help deconfuse conversations on x-risk.

Agency seems inevitable

The above is only a good strategy if we're likely to see fully agentic AGI before too long. I find it implausible that we'll collectively stop short of creating fully agentic AI. I agree with Gwern's arguments for Why Tool AIs Want to Be Agent AIs. Agents are desirable because they can actually do things. And AI actively making itself smarter seems useful.

I think there is one more pressure for agentic AI that Gwern doesn't mention: the usefulness of creating explicit subgoals for problem solving and planning. This is crucial for human problem-solving, and seems likely to be advantageous for many types of AI as well. Setting subgoals allows backward-chaining and problem factorization, among other advantages. I'll try to address this issue more carefully in a future post.
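As a toy illustration of what backward-chaining over subgoals looks like, here is a minimal Python sketch. It is not taken from Gwern's essay or from any real planning system; the rule table, the goal names, and the `plan` function are all hypothetical, chosen only to show a goal being factored into subgoals until nothing but directly executable actions remain.

```python
from typing import Dict, List, Optional, Set

# Hypothetical rule table (an assumption for illustration):
# each goal maps to the ordered subgoals that achieve it.
# Anything that does not appear as a key is treated as a primitive action.
RULES: Dict[str, List[str]] = {
    "serve tea": ["boil water", "steep tea"],
    "boil water": ["fill kettle", "heat kettle"],
    "steep tea": ["add tea leaves", "pour water"],
}


def plan(goal: str, rules: Dict[str, List[str]],
         _active: Optional[Set[str]] = None) -> List[str]:
    """Backward-chain from a goal to an ordered list of primitive actions."""
    active = _active if _active is not None else set()
    if goal in active:
        # A goal that (indirectly) requires itself can never bottom out.
        raise ValueError(f"circular subgoal: {goal}")
    if goal not in rules:
        return [goal]  # primitive action: nothing left to factor
    active.add(goal)
    steps: List[str] = []
    for subgoal in rules[goal]:
        # Each subgoal is solved independently (problem factorization).
        steps.extend(plan(subgoal, rules, active))
    active.remove(goal)
    return steps


if __name__ == "__main__":
    print(plan("serve tea", RULES))
```

The point of the sketch is only that the planner starts from the end goal and works backward, so each subgoal can be solved on its own. A real agent would also need to choose among alternative decompositions and check preconditions, which this toy version ignores.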

None of the above is meant to imply that agency is a binary category. I think agency is a branching spectrum. A saw, a chess program, an LLM, and a human have different amounts and types of agency. But I think it's likely we'll see AGI with all of the agency that humans have.

If this is correct, this is the issue we should focus on in x-risk discussions. If it's not, I'd be far less worried about AI risks. This potentially creates a point of agreement and an opportunity for productive discussion with x-risk skeptics who aren't thinking about fully agentic AI.

12 comments

I've been following the "tools want to become agents" argument since Holden Karnofsky raised the topic a long time ago, and I was almost convinced by the logic, but the LLMs show a very surprising lack of agency, and, as far as I can tell, this gap between apparent intelligence and apparent agency was never predicted or expected by the alignment theorists. I would trust their cautions more if they had a model that makes good predictions.

gwern, 8mo

"but the LLMs show a very surprising lack of agency,"

No, they don't. LLMs show an enormous amount of agency. They will generate text which includes things like plans even without a prompt for that. (As should come as no surprise, because LLMs are RL agents which have been trained offline on data solely from agents, i.e. behavioral cloning.) They contain so much agency you can plop them down into robots without any training whatsoever and they provide useful control. They are tools that want to be agents so much that I think it took all of a few weeks for someone to hook up the OA API to a command line in June 2020 and try to make it autonomous. LLMs like LaMDA started being granted access to the live Internet not long after that, simply because it's so obviously useful for tool AIs to become agent AIs. Startups like Adept, dedicated solely to turning these tools into agents, followed not long after that. I hardly need mention Sydney. And here we are in the present where OA has set up hundreds of plugins and a vast VM system to let their tools do agent-like things like autonomously write code, browse webpages, and integrate with other systems to do things like make phone calls. LLMs show an enormous amount of agency, and it's only increasing over time under competitive and economic pressure, because more agency makes them more useful. Exactly as predicted.

Maybe we understand agency differently. You give LLMs tools to use, they will use them. But there is no discernible drive or "want" to change the world to their liking. Not saying that it won't show up some day, it's just conspicuously lagging the capabilities.

Viliam, 8mo

In other words, LLMs are not actively trying to "get out of the box".

I guess that is one way to say it. But the statement is stronger than that, I think. They do not care about the box or about anything else. They react to stimuli, then go silent again.

So are you saying that you don't think we'll build agentic AI any time soonish? I'd love to hear your reasoning on that, because I'd rest easier if I felt the same way.

I agree that LLMs are marvelously non-agentic and intelligent. For the reasons I mentioned, I expect that to change, sooner or later, and probably sooner. Someone invented a marvelous new tool, and I haven't heard a particular reason to not expect this one to become an agent given even a little bit of time and human effort. The argument isn't that it happens instantly or automatically. AutoGPT and similar failing on the first quick public try doesn't seem like a good reason to expect similar language model agents to fail for a long time. I do think it's possible they won't work, but people will give it a more serious try than we've seen publicly so far. And if this approach doesn't hit AGI, the next one will experience similar pressures to be made into an agent.

As for models that make good predictions, that would be nice, but we do probably need to get predictions about agentic, self-aware and potentially self-improving agents right on the first few tries. It's always a judgment call on when the predictions are in the relevant domain. I think maintaining a broad window of uncertainty makes sense.

I do not know if we will or will not build something recognizably agentic any time soon. I am simply pointing out that currently there is a sizable gap that people did not predict back then. Given that we still have no good model of what constitutes values or drives (definitely not a utility function, since LLMs have plenty of that), I am very much uncertain about the future, and I would hesitate to unequivocally state that "AGI isn't just a technology". So far it most definitely is "just a technology", despite the original expectations to the contrary by the alignment people.

Yes. But the whole point of the alignment effort is to look into the future, rather than have it run us over because we weren't certain what would happen and so didn't bother to make any plans for the different things that could happen.

Yeah, I get that. But to look into the future one must take stock of the past and present and reevaluate models that gave wrong predictions. I have yet to see this happen.

The idea that agentiness is an advantage does not predict that there will never be an improvement made in other ways.

It predicts that we'll add agentiness to those improvements. We are busy doing that. It will prove advantageous to some degree we don't know yet, maybe huge, maybe so tiny it's essentially not used. But that's only in the very near term. The same arguments will keep on applying forever, if they're correct.

WRT your comment that we don't have a handle on values or drives, I think that's flat wrong. We have good models in humans and AI. My post "Human preferences as RL critic values - implications for alignment" lays out the human side and one model for AI. But providing goals in natural language for a language model agent is another easy route to adding a functional analogue of values.

I will continue for now to focus my alignment efforts on futures where AGI is agentic, because those seem like the dangerous ones, and I have yet to hear of any plausible future in which we thoroughly stick to tool AI and don't agentize it at some point.

Edit: Thinking about this a little more, I do see one plausible future in which we don't agentize tool AI: one with a "pivotal act" that makes creating it impossible, probably involving powerful tool AI. In that future, the key bit is human motivations, which I think of as the societal alignment problem. That needs to be addressed to get alignment solutions implemented, so these two futures are addressed by the same work.

In a similar vein to this, I think that AIs being called “tools” is likely to be harmful. It is a word which I believe downplays the risks, while also objectifying the AIs. The objectification of something which may actually be conscious seems like an obvious step in a bad direction.

Yes, I think "tools" is an even more obvious red flag that this person isn't thinking about an agentic, self-aware system.