Yeah basically Davidad has not only a safety plan but a governance plan which actively aims at making this shift happen!
Thanks for writing that. I've been trying to taboo "goals" because it creates so much confusion, which this post tries to decrease. In line with this post, I think what matters is how difficult a task is to achieve, and what it takes to achieve it in terms of ability to overcome obstacles.
Because it's meaningless to talk about a "compromise" dismissing one entire side of the people who disagree with you (but only one side!).
Like I could say "global compute thresholds is a robustly good compromise with everyone who disagrees with me"
*Footnote: only those who're more pessimistic than me.
That may be right but then the claim is wrong. The true claim would be "RSPs seem like a robustly good compromise with people who are more optimistic than me".
And then the claim becomes not really relevant?
Holden, thanks for this public post.
Responsible scaling policies (RSPs) seem like a robustly good compromise with people who have different views from mine
2. It seems like it's empirically wrong based on the strong pushback RSPs received so that at least you shouldn't call it "robustly", unless you mean a kind of modified version that would accommodate the most important parts of the pushback.
3. I feel like overall the way you discuss RSPs here is one of the many instances of people chatting about idealized RSPs that are not specified, and pointed to against disagreement. See below, from my post:
And second, the coexistence of ARC's RSP framework with the specific RSPs labs implementations allows slack for commitments that are weak within a framework that would in theory allow ambitious commitments. It leads to many arguments of the form:“That’s the V1. We’ll raise ambition over time”. I’d like to see evidence of that happening over a 5 year timeframe, in any field or industry. I can think of fields, like aviation where it happened over the course of decades, crashes after crashes. But if it’s relying on expectations that there will be large scale accidents, then it should be clear. If it’s relying on the assumption that timelines are long, it should be explicit. “It’s voluntary, we can’t expect too much and it’s way better than what’s existing”. Sure, but if the level of catastrophic risks is 1% (which several AI risk experts I’ve talked to believe to be the case for ASL-3 systems) and that it gives the impression that risks are covered, then the name “responsible scaling” is heavily misleading policymakers. The adequate name for 1% catastrophic risks would be catastrophic scaling, which is less rosy.
And second, the coexistence of ARC's RSP framework with the specific RSPs labs implementations allows slack for commitments that are weak within a framework that would in theory allow ambitious commitments. It leads to many arguments of the form:
Thanks for the post.
Would your concerns be mostly addressed if ARC had published a suggestion for a much more comprehensive risk management framework, and explicitly said "these are the principles that we want labs' risk-management proposals to conform to within a few years, but we encourage less-thorough risk management proposals before then, so that we can get some commitments on the table ASAP, and so that labs can iterate in public. And such less-thorough risk management proposals should prioritize covering x, y, z."
Great question! A few points:
If people took your proposal as a minimum bar for how thorough a risk management proposal would be, before publishing, it seems like that would interfere with labs being able to "post the work they are doing as they do it, so people can give feedback and input".
I don't think it's necessarily right, e.g. "the ISO standard asks the organization to define risk thresholds" could be a very simple task, much simpler than developing a full eval. The tricky thing is just to ensure we comply with such levels (and the inability to do that obviously reveals a lack of safety).
"ISO proposes a much more comprehensive procedure than RSPs", it's not right either that it would take longer, it's just that there exists risk management tools, that you can run in like a few days, that helps having a very broad coverage of the scenario set.
"imply significant chances to be stolen by Russia or China (...). What are the risks downstream of that?" once again you can cover the most obvious things in like a couple pages. Writing "Maybe they would give the weights to their team of hackers, which increases substantially the chances of leak and global cyberoffence increase". And I would be totally fine with half-baked things if they were communicated as such and not as RSPs are.
Two questions related to it:
Thanks Eli for the comment.
One reason why I haven't provided much evidence is that I think it's substantially harder to give evidence of a "for all" claim (my side of the claim) than a "there exists" (what I ask Evan). I claim that it doesn't happen that a framework on a niche area evolves so fast without accidents based on what I've seen, even in domains with substantial updates, like aviation and nuclear.
I could potentially see it happening with large accidents, but I personally don't want to bet on that and I would want it to be transparent if that's the assumption. I also don't buy the "small coordinations allow larger coordinations" for domain-specific policy. Beyond what you said above, my sense is that policymakers satisfice and hence tend to not come back on a policy that sucks if that's sufficiently good-looking to stakeholders to not have substantial incentives to change.
GDPR cookies banner sucks for everyone and haven't been updated yet, 7 years after GDPR. Standards in the EU are not even updated more rapidly than 5y by default (I'm talking about standards, not regulation), and we'll have to bargain to try to bring it down to reasonable timeframes AI-specific.
IAEA & safety in nuclear upgraded substantially after each accident, likewise for aviation but we're talking about decades, not 5 years.