As I take the time to reflect on the state of AI Safety in early 2026, one question feels unavoidable: have we, as the AI Safety community, already lost? That is, have we passed the point of no return, after which AI doom becomes both likely and effectively outside of our control?
Spoilers: as you might guess from Betteridge’s Law, my answer to the headline question is no. But the fact that the question feels this salient at all is noteworthy, and reflects a more negative outlook on the future.
Yesterday I laid out “the plan” as I understood it in 2024.
Today, I’ll explain the reasons I’ve become more pessimistic on the 2024 plan. (And tomorrow, I’ll talk about why I think the answer is still no.)
Reasons for more doom
(Unilateral) voluntary commitments from companies seem unlikely to hold
In our original RSP blog post, we outlined a vision for RSPs as companies “committing to gate scaling on concrete evaluations and empirical observations”, where “we should expect to halt AI development in cases where we do see dangerous capabilities, and continue it in cases where worries about dangerous capabilities were overblown.” Empirically, I think this vision of RSPs seems unlikely to work out.
First, many of the frontier safety policies that followed Anthropic’s were substantially less strict or less well specified. DeepMind’s and OpenAI’s policies were probably the next best, but they were substantially less strict than Anthropic’s in terms of the conditions under which deployment may continue. (Also, the OpenAI Preparedness team seems to have suffered a fair number of the usual OpenAI safety departures.) Many company RSPs make no reference to actually terminating deployment, and xAI’s policy has plausibly already been violated as of early 2026.
Even Anthropic, which pioneered RSPs in collaboration with METR back in 2023, has since updated its RSP to be substantially less strict than the original vision outlined.
AI progress seems to be consistent with faster timelines
In 2024, there was a sense that AI progress was fast, but not a good sense of exactly how fast. In 2026, it seems fairly clear to me that AI progress is fast enough that we can’t rule out ~full coding automation by 2028, let alone by 2030 or 2035.
For one thing, METR’s time-horizon analysis, originally released in March 2025, found a consistent exponential trend in the length of tasks that AIs can complete, and that trend has held up (or even accelerated) over the following year.
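To make the compounding concrete, here is a minimal back-of-the-envelope sketch in Python. The ~7-month doubling time and the ~1-hour starting horizon are rough stand-ins for the figures in METR’s original analysis, not exact published values; the point is only how quickly a fixed doubling time compounds.

```python
# Toy extrapolation of an exponential "task horizon" trend.
# The doubling time and starting horizon below are assumptions for
# illustration, not METR's exact published figures.
from datetime import date

DOUBLING_MONTHS = 7                 # assumed doubling time of the 50%-success horizon
REFERENCE_DATE = date(2025, 3, 1)   # roughly when the original analysis was released
REFERENCE_HORIZON_MINUTES = 60      # assumed ~1-hour horizon at the reference date

def horizon_minutes(on: date) -> float:
    """Extrapolated task length (in minutes) at 50% success on a given date."""
    months_elapsed = (on.year - REFERENCE_DATE.year) * 12 + (on.month - REFERENCE_DATE.month)
    return REFERENCE_HORIZON_MINUTES * 2 ** (months_elapsed / DOUBLING_MONTHS)

for year in (2026, 2027, 2028):
    hours = horizon_minutes(date(year, 1, 1)) / 60
    print(f"Jan {year}: ~{hours:.0f} hour(s)")
```

Under these toy assumptions the horizon reaches roughly a full workday by early 2027 and multiple workdays by early 2028; if the doubling time has shortened, as the more recent data suggests, those dates move earlier. That is the basic reason short timelines can’t be ruled out.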
We’ve also seen the widespread adoption of AI coding assistants and massive scale-ups in investment by the leading labs, to name just two other signs of fast progress.
Insofar as our plans required more time, the faster timeline is definitely bad news.
Ambitious technical research has (largely) not paid off
In 2024, (ambitious) mechanistic interpretability was contested but still a major area of investment. There were also a number of other ambitious AI safety research agendas that aimed to rigorously tackle substantial parts of the AI alignment problem, such as ARC’s ELK/heuristic arguments agenda, Singular Learning Theory, or non-agentic AI scientists. While work continues on each of these agendas, none has really paid off in a massive way.
The community has largely concentrated its investment in Anthropic
In 2024, Anthropic was probably the largest consumer of skill-weighted AI Safety talent of any organization in the world. However, its share was still only a plurality, not a majority.
I think in 2026, Anthropic clearly consumes a majority of the world’s skill-weighted AI Safety talent. This isn’t clearly a bad thing in itself: insofar as the “make Anthropic win” plan is the main way to have the margin to invest in less-competitive but safer approaches, this is pretty much what would have to happen. But insofar as Anthropic as an organization has problems, for example being wrong about key questions in AI safety, being driven to motivated thinking for profit or status reasons, or being unable to efficiently use marginal talent, this is clearly a bad thing from the perspective of the community.
Regardless of whether or not it was net good, I think it’s still sad that due to Anthropic’s commercial success (and several outflows of talent from OpenAI) the AI Safety community does not really have an independent existence outside of Anthropic.
The current US administration has many bad qualities from an AI Safety standpoint, and explicitly opposes "AI Safety"
On November 5th, 2024, after I saw the early election results come in, I thought to myself that we were now playing on hard mode in terms of AI governance. I think that assessment has held up pretty well given what has happened since.
Many of the developments that I considered positive signs for domestic AI governance were explicitly opposed and/or revoked by the current US administration. The administration has also taken many actions that have made international cooperation on AI safety much less likely, including antagonizing many of America’s existing allies. And key members of the administration have explicitly opposed AI Safety (some have even espoused deranged conspiracy theories that all of AI Safety is an attempt at regulatory capture by Anthropic).
The administration has also just been exceedingly stupid, corrupt, and chaotic in general. (A recent example: designating Anthropic a supply-chain risk as a failed negotiating tactic.)
Written very quickly for the Inkhaven Residency.