I like the basic idea of a pause in training increasingly powerful AIs. Yet I'm quite dissatisfied with any specific plan that I can think of.

AI research is proceeding at a reckless pace. There's massive disagreement among intelligent people as to how dangerous this is.

I participated in a forecasting-persuasion tournament on X-risk topics last summer (see also a writeup from a participant). Not only was there no hint of a consensus among superforecasters about whether AGI will be dangerous if developed; it was hard even to make progress on agreeing whether to expect human-level AI in less than a decade versus not this century.

I see two moderately promising paths to safe development of smarter-than-human AI: ensuring that AIs have relatively narrow goals / little agency (Drexler's approach), and understanding AIs better (interpretability). [That's not a complete list of approaches that are worth pursuing; they're the approaches whose value is clearest to me.]

I see little hope of a good agreement to pause AI development unless leading AI researchers agree that a pause is needed, and help write the rules. Even with that kind of expert help, there's a large risk that the rules will be ineffective and cause arbitrary collateral damage.

AI researchers understand better than almost all outside commentators what AIs can do and where AI research is heading. AI researchers have biases which mean they're giving inadequate weight to possible harm. But they still have a clear desire to avoid scenarios where all humans die. The alternative to having AI researchers make safety decisions seems to be having politicians make those decisions. I don't see an argument that those politicians have less harmful biases, and I'm confident they have less understanding of the risks.

The world does not have the competence to collectively agree on an ideal approach. Something like trusting the best AI labs seems to be the best we can do. But instead of blind trust, let's nudge them to be more cautious.

I recommend that for at least the next few months we focus less attention on pausing development, and more attention on asking AI researchers how they're going to decide what qualifies as a safe strategy for handling AI. What kind of evidence would persuade them to pause development? What are they doing to look for that evidence?

There are plenty of small steps we could take to slightly improve our odds.

We could organize social pressure against dangerous companies that are eager to make AIs more agentic. Who funded LangChain? Apparently Benchmark. Should we describe them as pioneering a new class of funding: devil investing?

OpenAI: why should we think you're serious about safety while you're apparently providing API access to Auto GPT?

We can praise research that helps detect deception, while criticizing anything that encourages an AI to hide its beliefs.

AI labs would continue to make (possibly slower) progress if they shifted resources away from throwing more compute at AIs, toward more interpretability research, so that we can answer questions like:

  • What goals does it have?
  • Is it deceiving us?

The worst part of Eliezer's call for stopping AI is where he belittles interpretability researchers with his complaint about "giant inscrutable arrays of fractional numbers".

See also SMBC.

