I like the basic idea of a pause in training increasingly powerful AIs.
Yet I'm quite dissatisfied with any specific plan that I can think of.
AI research is proceeding at a reckless pace. There's massive
disagreement among intelligent people as to how dangerous this is.
I participated in a forecasting-persuasion tournament
on X-risk topics last summer (see also a writeup from a
Not only was there no hint of a consensus among superforecasters about
whether AGI will be dangerous if developed; it was hard even to make
progress on agreeing whether to expect human-level AI in less than a
decade versus not this century.
I see two moderately promising paths to safe development of
smarter-than-human AI: ensuring that AIs have relatively narrow goals /
little agency (Drexler's
approach), and understanding AIs
better (interpretability). [That's not a complete list of approaches
that are worth pursuing; they're the approaches whose value is
clearest to me.]
I see little hope of a good agreement to pause AI development unless
leading AI researchers agree that a pause is needed, and help write the
rules. Even with that kind of expert help, there's a large risk that
the rules will be ineffective and cause arbitrary collateral damage.
AI researchers understand better than almost all outside commentators
what AIs can do and where AI research is heading. AI researchers have
biases that lead them to give inadequate weight to possible harm.
But they still have a clear desire to avoid scenarios where all humans
die. The alternative to having AI researchers make safety decisions
seems to be having politicians make those decisions. I don't see an
argument that those politicians have less harmful biases, and I'm
confident they have less understanding of the risks.
The world does not have the competence to collectively agree on an ideal
approach. Something like trusting the best AI labs seems to be the best
we can do. But instead of blind trust, let's nudge them to be more
I recommend that for at least the next few months we focus less
attention on pausing development, and more attention on asking AI
researchers how they're going to decide what qualifies as a safe
strategy for handling AI. What kind of evidence would persuade them to
pause development? What are they doing to look for that evidence?
There are plenty of small steps we could take to slightly improve our
We could organize social pressure against dangerous
companies that are
eager to make AIs more agentic. Who funded
Should we describe them as pioneering a new class of funding: devil
OpenAI: why should we think you're serious about safety while you're
apparently providing API access to Auto
We can praise research that helps detect deception, while criticizing
anything that encourages an AI to hide its beliefs.
AI labs would continue to make (possibly slower) progress if they
shifted resources away from throwing more compute at AIs, toward more
interpretability research, so that we can answer questions like:
The worst part of Eliezer's call for stopping
is where he belittles interpretability researchers with his complaint
about "giant inscrutable arrays of fractional
See also SMBC.