Update: xAI says that the load-bearing thing for avoiding bio/chem misuse from Grok 4 is not inability but safeguards, and that Grok 4 robustly refuses "harmful queries." So I think Igor is correct. If the Grok 4 misuse safeguards are ineffective, that shows that xAI failed at a basic safety thing it tried (and either doesn't understand that or is lying about it).
I agree it would be a better indication of future-safety-at-xAI if xAI said "misuse mitigations for current models are safety theater." That's just not its position.
I think I'd prefer "within a month after external deployment" over "by the time of external deployment" because I expect the latter will lead to (1) evals being rushed and (2) safety people being forced to prioritize poorly.
Thanks. Sorry for criticizing without reading everything. I agree that, like, on balance, GDM didn't fully comply with the Seoul commitments re Gemini 2.5. Maybe I just don't care much about these particular commitments.
I agree. If Google wanted to join the commitments but not necessarily publish eval results by the time of external deployment, it should have clarified "we'll publish within 2 months after external deployment" or "we'll do evals on our most powerful model at least every 4 months rather than doing one round of evals per model" or something.
Some of my friends are signal-boosting this new article: 60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge. See also the open letter. I don't feel good about this critique or the implicit ask.
Many people would be much less inclined to vote if voting were fully public, so you would lose a lot of signal.
Disagree: the problem with rationalist cults is that they're wrong about [the effects of their behavior / whether what they're doing is optimific], not that what they're doing is Pascalian. (And empirically you should expect that subcommunities that really hurt their members do so for bad reasons.)
Yeah that's progress. But then you realize that ~all commitments are too weak/vague (not to mention non-standardized) and you notice that words-about-the-future are cheap and you realize you should focus on stuff other than commitments.
Certainly not clear to me that acausal trade works, but I don't think these objections are correct.