Anthropic's Sabotage Report Has a Structural Blind Spot — Experimental Evidence from 810 Measurements
Anthropic's Pilot Sabotage Risk Report (released yesterday) identifies 9 risk pathways and evaluates 3 general claims against sabotage. As GradientDissenter's summary notes, this is the first time a model developer has made something like a safety case for a model. It is thorough and valuable. However, I believe it contains...
Feb 12