To clarify, the primary complaint from my perspective is not that they published the report a month after external deployment per se, but that the timing of the report indicates that they did not perform thorough pre-deployment testing (and did no external testing at all).
And the focus on pre-deployment testing is not really due to any opinion about the relative benefits of pre- vs. post-deployment testing, but because they committed to doing pre-deployment testing, so it's important that they in fact do pre-deployment testing.
3. Other companies are doing far worse in this dimension. At worst, Google is third-best at publishing eval results. Meta and xAI are far worse.
Some reasons for focusing on Google DeepMind in particular:
- Sharing information on capabilities is good, but public deployment is a bad time for that, in part because most risk comes from internal deployment.
I'm not sure why this would make you not feel good about the critique or implicit ask of the letter. Sure, maybe internal deployment transparency would be better, but public deployment transparency is better than nothing.
And that's where the leverage is right now. Google made a commitment to transparency about external deployments, not internal deployments. And they should be held to that commitment, or else we establish the precedent that AI safety commitments don't matter and can be ignored.
2. Google didn't necessarily even break a commitment? The commitment mentioned in the article is to "publicly report model or system capabilities." That doesn't say it has to be done at the time of public deployment.
This document linked on the open letter page gives a precise breakdown of exactly what the commitments were and how Google broke them (both in spirit and in letter).[1] The summary is this:
- Google violated the spirit of commitment I by publishing its first safety report almost a month after public availability and by not mentioning external testing in its initial report.
- Google explicitly violated commitment VIII by not stating whether governments are involved in safety testing, even after being asked directly by reporters.
But the letter actually understates the degree to which Google DeepMind violated the commitments. The real story from this article is that GDM confirmed to Time that they didn't provide any pre-deployment access to UK AISI:
However, Google says it only shared the model with the U.K. AI Security Institute after Gemini 2.5 Pro was released on March 25.
If UK AISI doesn't have pre-deployment access, a large part of its raison d'être is nullified.
Google withholding access quite strongly violates the spirit of commitment I of the Frontier AI Safety Commitments:
Assess the risks posed by their frontier models or systems across the AI lifecycle, including before deploying that model or system... They should also consider results from internal and external evaluations as appropriate, such as by independent third-party evaluators, their home governments, and other bodies their governments deem appropriate.
And if they didn't give pre-deployment access to UK AISI, it's a fairly safe bet that they didn't give it to any other external evaluator either.
The violation is also explained, although less clearly, in the Time article:
The update also stated the use of “third-party external testers,” but did not disclose which ones or whether the U.K. AI Security Institute had been among them—which the letter also cites as a violation of Google’s pledge.
After previously failing to address a media request for comment on whether it had shared Gemini 2.5 Pro with governments for safety testing...
Startups often pivot away from their initial idea when they realize that it won’t make money.
AI safety startups need not only to come up with an idea that makes money AND helps AI safety, but also to ensure that the safety focus survives all future pivots.
[Crossposted from Twitter]
In other words:
My fuzzy intuition would be to reject step 2 of your argument if we accept determinism. And my actual philosophical position is that these types of questions are not very useful and are generally downstream of more fundamental confusions.
I think it's pretty bizarre that LessWrongers, who are usually acutely aware of the epistemic downsides of being an activist, seem to have paid relatively little attention to this in their recent transition to activism.
FWIW I'm the primary organizer of PauseAI UK and I've thought about this a lot.
Very little liquidity, though.
Wow, that feels almost cruel! Seems to change the Claude personality substantially?
This is a reference to Eliezer, right? I really don't understand why he's on Twitter so much. I find it quite sad to see one of my heroes slipping into the ragebait Twitter attractor.