In Anthropic's Alignment Risk Report update for Mythos, they claim:
I don't think Anthropic (or anyone) has an achievable path to keeping risk low if AI development proceeds as fast as Anthropic expects (or as fast as I expect). Anthropic could (and hopefully will) take actions that significantly reduce the risk, but those actions won't keep the risk low.
More precisely: I do not think Anthropic has an achievable (>10% likely) path that keeps the aggregate existential risk they impose below 2%, as assessed in advance by a reasonable evaluator (the notion of risk the risk report is supposed to estimate). Anthropic might avoid causing an existential catastrophe (in fact, I happen to think this is likely), but they will impose quite a bit of risk along the way (supposing they succeed at their stated intention of being a leading company building powerful AI within the next 5 years).
My understanding is that Anthropic employees (especially the Anthropic employees writing this report) often don't themselves believe there is an achievable path to keeping risk low if Anthropic builds powerful AI / ASI in the next 5 years, so the text seems incorrect or misleading. See, e.g., Holden's most recent 80k episode or his post when RSP v3 came out.
I wish Anthropic communicated more accurately about future risk in their risk report (which, of all places, is supposed to avoid spin). Or, if they're unwilling to do this, it would be better not to comment at all.
(Cross-posted from X/Twitter.)