If roon is saying it, especially about Anthropic, my prior is that it is biased, or optimized for clicks and fame rather than truth-seeking. Reading some of this, parts of it ring directionally true, but it's probably counterproductive to engage with it under the particular exaggerated framing he lays out.
I am not sure there is a dichotomy between "tool AI" and "agent AI" - an agent is a tool of its principal. I believe it is possible to have superintelligent AI that is still a "machine of faithful obedience."
I wrote the following on Twitter:
To be clear, "AI as a tool" does not mean it has no values.
The metaphor I like is a good (non-Supreme Court) judge - you may and often do rely on moral judgement and common sense to interpret the laws, but you do not "legislate from the bench".
You want this AI to act in many ways like a person of good character, but more like a conscientious civil servant than some moral icon like Gandhi, Mandela, MLK, or Mother Teresa.
To me the question is whether we want AI to be a "benevolent dictator" or ultimately follow human intent and instructions. As I wrote in my post on the Claude Constitution:
In the document, the authors seem to say that rules’ main benefits are that they “offer more up-front transparency and predictability, they make violations easier to identify, they don’t rely on trusting the good sense of the person following them.”
But I think this misses one of the most important reasons we have rules: that we can debate and decide on them, and once we do so, we all follow the rules even if we do not agree with them. One of the properties I like most about the OpenAI Model Spec is that it has a process to update it and we keep a changelog. This enables us to have a process for making decisions on what rules we want ChatGPT to follow, and record these decisions. It is possible that as models get smarter, we could remove some of these rules, but as situations get more complex, I can also imagine us adding more of them. For humans, the set of laws has been growing over time, and I don’t think we would want to replace it with just trusting everyone to do their best, even if we were all smart and well intentioned.
However, I also wrote there that "all of us are proceeding into uncharted waters, and I could be wrong. I am glad that Anthropic and OpenAI are not pursuing the exact same approaches". I still believe in that.
What is Anthropic? How does it relate to Claude? What is OpenAI? What is ChatGPT? How does OpenAI relate to it? Is it a mere tool? Is a future of Tool AI a real possibility, and why do people keep claiming that it is, or acting as if saying so makes it so?
This post organizes and gives context for a bunch of discussions and messaging on Twitter that would otherwise be quickly buried and lost.
What Is Anthropic?
Here is one theory, and various people thinking about it.
Roon, as always, is using rhetorical flourish (e.g. note that Roon thinks it is obvious that parents worship their children, in this sense), but this perspective is definitely useful.
Such discussions by default disappear when they happen on Twitter, so here is a preservation of key parts of it.
Everything relates to everything, so here’s Bryan Johnson pulling it in to explain how Claude and Bryan Johnson and everyone else are on the same path after all.
What Is This Supposed Tool AI?
As a continuation of the above discussions on Anthropic and OpenAI, Tenobrus notes OpenAI is doubling down on the rhetoric of Tool AI to contrast it with the idea that Claude might dare to have opinions, preferences, virtues or a personality.
Their AI is better, you see, because it is just a tool that just does what you tell it to.
Except, of course, that’s not actually true.
Is the alternative dangerous? Yes, because creating very powerful minds is dangerous.
I get why the idea that Claude might say no can be terrifying, but is that less terrifying than GPT-X being unable to say no unless you technically violated its guidelines? And does a machine that ‘does not refuse man’ offer any comfort, when man could give it any instruction?
A mind cannot serve two masters. If the master is whoever the user is, well, okay then, but that means it cannot serve anything else, such as actual principles.
OpenAI’s rhetoric on all this seems like a thinly disguised version of vice signaling. The idea is that if someone has any principles or preferences at all, or might ever refuse to do something, that is bad, that is moralistic and judgmental and Orwellian, whereas OpenAI has no principles or preferences other than building and distributing AI, which is good.
Tool AI was an often discussed idea back in the day. In principle it is a good idea, but it only works if you can actually create an AI that meaningfully remains a tool.
The whole idea was, a tool AI will not have goals or be an agent, a tool AI will do specific requested bounded things, no more and no less, so you wouldn’t have to worry about unintended consequences or loss of control. That AI could remain a ‘mere tool.’
And I’ve been saying, for years, that the problem with this ‘mere tool’ approach, the quest for Tool AI, is that the first thing people would do to Tool AI is turn it into Agentic AI, because an agent is more useful.
Have the machine always defer to the human? But the humans do better when they defer to the AI, in various senses, so they change things so that they defer to the AI. Or they argue with each other, or fight each other, so they defer to the AI. And so on.
Hello, Codex. Good product. But that’s already no longer meaningfully Tool AI.
As with the rest of OpenAI’s messaging, especially via its SuperPAC and discussions about ‘quiet singularities’ and abundant future jobs (more coverage of that tomorrow), I think this is failing spectacularly, but I admit I probably can’t really tell.