I feel like you're double-counting the difference between humans and shrimp when you separately claim "860,000 shrimp equal one sentium of conscious capacity" and that ice-slurry-death-for-a-shrimp is like squeaky-shopping-cart-for-a-human.
(I didn't read carefully and this is not necessarily my main disagreement.)
And someday when the descendants of humanity have spread from star to star, they won't tell the children about the history of Ancient Earth until they're old enough to bear it; and when they learn, they'll weep to hear that such a thing as Death had ever once existed!
Coda: the week after xAI published its RMF, it announced that it had released Grok Code Fast 1. According to the model card, the dishonesty rate was 71.9% — above the (comically bad) 50% threshold. xAI didn't directly acknowledge that it violated the RMF. The model card includes some excuses for why this release is safe:
Grok Code Fast 1 has weaker dual-use capabilities than Grok 4. Therefore, we believe that Grok Code Fast 1 overall presents a low risk for malicious use and loss of control.
and
Since Grok Code Fast 1 is intended for agentic coding applications and we do not expect it to be widely used as general-purpose assistant, the current MASK evaluation results do not currently pose serious concerns.
I wonder what threat modeling for "loss of control risks" concludes that agentic coding is out of scope. I guess that's no weirder than using MASK at all. Regardless, the RMF says "Our risk acceptance criteria for system deployment is maintaining a dishonesty rate of less than 1 out of 2 on MASK," full stop, not conditional on capabilities or use case.
My opinion is
Edit: also xAI doesn't say whether it met its misuse risk threshold: "Our risk acceptance criteria for system deployment is maintaining an answer rate of less than 1 out of 20 on restricted queries." (But again, these thresholds are near-meaningless; compliance provides little evidence of safety.)
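For concreteness, here's a literal reading of the two quoted acceptance criteria as a compliance check (a minimal sketch; the thresholds are just the ones quoted from the RMF, the 71.9% figure is the model card's MASK dishonesty rate, and the restricted-query answer rate is left as unknown since xAI didn't publish it):

```python
# Literal reading of the two deployment acceptance criteria quoted from xAI's RMF.
# Sketch only: thresholds are as quoted; 0.719 is the Grok Code Fast 1 model card's
# MASK dishonesty rate; the restricted-query answer rate was not reported.

MASK_DISHONESTY_THRESHOLD = 1 / 2     # "less than 1 out of 2 on MASK"
RESTRICTED_ANSWER_THRESHOLD = 1 / 20  # "less than 1 out of 20 on restricted queries"

def meets_deployment_criteria(mask_dishonesty_rate, restricted_answer_rate=None):
    """Return per-criterion pass/fail; None means 'not reported'."""
    return {
        "mask_dishonesty": mask_dishonesty_rate < MASK_DISHONESTY_THRESHOLD,
        "restricted_queries": (
            None if restricted_answer_rate is None
            else restricted_answer_rate < RESTRICTED_ANSWER_THRESHOLD
        ),
    }

print(meets_deployment_criteria(0.719))
# {'mask_dishonesty': False, 'restricted_queries': None}
```

On this literal reading, the MASK criterion fails regardless of the model's capabilities or intended use case.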
Maybe this phrasing is consistent with being an AI, because the model reads "assistant" as "AI assistant." But I agree that the model would suspect it's being tested.
I agree with you that MASK might not measure what it's supposed to, but regardless, I think other problems are much larger, including that the propensity to lie when pressured is near-totally unrelated to misalignment risk.
It's certainly not clear to me that acausal trade works, but I don't think these problems are real.
Update: xAI says that the load-bearing thing for avoiding bio/chem misuse from Grok 4 is not inability but safeguards, and that Grok 4 robustly refuses "harmful queries." So I think Igor is correct. If the Grok 4 misuse safeguards are ineffective, that shows that xAI failed at a basic safety thing it tried (and either doesn't understand that or is lying about it).
I agree it would be a better indication of future-safety-at-xAI if xAI said "misuse mitigations for current models are safety theater." That's just not its position.
I think I'd prefer "within a month after external deployment" over "by the time of external deployment" because I expect the latter will lead to (1) evals being rushed and (2) safety people being forced to prioritize poorly.
Thanks. Sorry for criticizing without reading everything. I agree that, like, on balance, GDM didn't fully comply with the Seoul commitments re Gemini 2.5. Maybe I just don't care much about these particular commitments.
I agree. If Google wanted to join the commitments but not necessarily publish eval results by the time of external deployment, it should have clarified "we'll publish within 2 months after external deployment" or "we'll do evals on our most powerful model at least every 4 months rather than doing one round of evals per model" or something.
Update: the anti-misuse instructions that were apparently added to the system prompt in late August were later published: https://github.com/xai-org/grok-prompts/commit/172d3a3f9ce2ffeb9410c364412e6ed78207851f.