I don't understand why we need to condition on progress changing pace. At every point so far, models have been capable in the predicted ways, and we only manage to patch the risk afterwards. And we know that exponentials grow faster than humans can adapt; there are examples of that basically everywhere. (Even if progress were linear, only fixing failures after they occur would be a dangerously slow pace of safety progress relative to capabilities!)
Yes - I laud their transparency while agreeing that the competitive pressures and the new model release mean they are not being safe even relative to their own previously stated expectations for their behavior.
There are various possible worlds with AI progress posing different risks.
In those worlds where a given capability level is a problem, we're not setting ourselves up to notice or react even after the harm materializes. The set of behaviors or events that we could be monitoring keeps being spelled out, in the form of red lines. And then those lines get crossed. We're already seeing tons of concrete harms - what more do we need? Do you think things will change if there's an actual chemical weapons attack? Or a rogue autonomous replication? Or is there some number of people who need to die first?
It was put online as a preprint years ago, then published here: https://www.cell.com/patterns/fulltext/S2666-3899(23)00221-0
a supermajority of people in congress
Those last two words are doing a supermajority of the work!
And yes, it's about uneven distribution of power - but that power gradient can shift towards ASI pretty quickly, which is the argument. Still, the normative concern that most humans have already lost control stands either way.
The president would probably be part of the supermajority and therefore cooperative, and it might work even if they aren't.
We're seeing this fail in certain places in real time today in the US. But regardless, the assumption that preferences are correlated often fails, partly because of the power imbalances themselves.
Great points here! Strongly agree that strategic competence is a prerequisite, but at the same time it accelerates risk; a moderately misaligned but strategically competent mild-ASI solving intent alignment for RSI would be far worse. On the other hand, if prosaic alignment stays basically functional through the point of mild-ASI, that path is better.
So overall I'm unsure which path is less risky - but I do think strategic competence matches or at least rhymes well with current directions for capabilities improvement, so I expect it to improve regardless.
Seems like even pretty bad automated translation would get you most of the way to functional communication - and the critical enabler is more translated text, which could be gathered and trained on given how important this is. I bet there are plenty of NLP / AIxHealth folks who could help if Canadian health folks asked; a rough sketch of how little is needed to get started is below.
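To illustrate how low the technical barrier is, here's a minimal sketch assuming the Hugging Face transformers library and one of the publicly available OPUS-MT checkpoints. The en->fr pair and the example sentence are just stand-ins, not the actual language pair at issue:

```python
# Minimal sketch: off-the-shelf machine translation with an open checkpoint.
# Assumes the Hugging Face transformers library; "Helsinki-NLP/opus-mt-en-fr"
# is a stand-in model - the real use case would need whichever pair is missing.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

# Hypothetical patient-instruction sentence, just to show the round trip.
result = translator("Take one tablet twice a day with food.")
print(result[0]["translation_text"])
```

The harder part, as I said, is the parallel text itself - but with enough of it gathered, the same toolchain supports fine-tuning a model like this on that corpus.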
The belief is fixable?
Because sure, we can prioritize corrigibility and give up on having independent ethics override it, but even from a safety perspective that requires actual oversight, which we aren't doing.
Step 1. Solve ethics and morality.
Step 2. Build stronger AI without losing the lightcone or going extinct.
Step 3. Profit.
There is one suggestion people have made which would address the problem: be capable of stopping. Not even stopping, just making sure governments are capable of monitoring these systems and deciding to shut them down if they later find it necessary.
Short of that, again, I think we've proven that warnings in advance aren't going to work. We're over the line already.