I personally find this pretty heartening. Anthropic is saying "someone should probably stop us if they can stop everyone" and I think that's a pretty defensible stance. It's not necessarily correct, but arguments about the difficulty of alignment vs. the badness of bad actors with AGI are much looser than we'd like.
As I mentioned last week, I'd love it if they'd acknowledge the potential for concentrated influence/power risk in principle, even if they think they won't or can't do that.
How the alignment problem gets solved—or not—in this future is something we are least certain about. ... But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe.
Another crazy text in these crazy times. "We don't know how to solve the alignment problem, but we're going to race ahead anyway, because otherwise less cautious actors will win." Which less cautious actors, Anthropic?
And also seconding Oliver's question. What about power concentration, Anthropic? Your CEO has said literally this: "Anthropic has much more in common with the Department of War than we have differences." Alignment to whom?
Aren't xAI, which fails to care about alignment, and China, which trains models to censor themselves, perfect examples of such bad actors?
Additionally, Anthropic tried to include an anti-power-concentration clause in Claude's Constitution: "We’re especially concerned about the use of AI to help individual humans or small groups gain unprecedented and illegitimate forms of concentrated power. In order to avoid this, Claude should generally try to preserve functioning societal structures, democratic institutions, and human oversight mechanisms, and to avoid taking actions that would concentrate power inappropriately or undermine checks and balances."
As for the impact on jobs, what about Anthropic's CEO who explicitly said: "I do think in the long run AI will become so broadly effective and so cheap that this will no longer apply. At that point our current economic setup will no longer make sense, and there will be a need for a broader societal conversation about how the economy should be organized"?
Could you suggest what else Anthropic should've done to acknowledge those issues?
Aren’t xAI, which fails to care about alignment, and China, which trains models to censor themselves, perfect examples of such bad actors?
Well, racing to recursive self-improvement without solving alignment kills everyone. A company can't justify that by pointing to other "bad actors"; at that point they're a bad actor themselves.
And even before killing everyone, there's other stuff that needs mentioning. Anthropic, like the other US labs afaik, has agreed that its models can be used by the US government for blanket surveillance of non-Americans. The US has a history of supporting nasty regimes abroad (see Operation Condor in South America) and sending them data to help political repression (see the Indonesian mass killings of 1965-66).
They don't think it kills everyone with high likelihood. The problem is that no one knows for sure how hard alignment is. There are no really convincing arguments on either side. It's a wild guess, and people are free to pick whatever their motivated reasoning prefers.
This is a linkpost to https://www.anthropic.com/institute/recursive-self-improvement