Samuel x Bhishma - Superintelligence by 2030?

by samuelshadrach
21st Oct 2025
Linkpost from youtu.be
3 min read

Did an AI timelines debate with a friend who works at Google.

Link to debate

Preview

"In general, the trend has been once we see something happening, we as human researchers think - how do we put a human expert into this problem but the way it actually ends up being some sort of blackbox brute force kind of approach where you again throw the human researcher out of loop"

"That's the whole point of verification, right, where your models can run but then you have like no way of saying - are they running in the right direction or are they going in the place that they want to go. And even where they want to go, we don't have like a good way of setting a direction. Right now humans are giving those directions via problems or whatever."

"I'm saying the planning part is hard and the internal reward system is easy, I feel, relatively."

"You can create better priors only if you have access to reality or some grounding signal directly. It's just that even now if you can scale groundedness, I think the problem will be correct and I think we just have disagreements on what the scaleup of grounding is."

Video Sections

00:00 Preview
00:50 Intro
03:24 Samuel: P(ASI by 2030) = 0.25
14:05 Bhishma: P(ASI by 2030) = 0.05
26:28 Can AI pursue Goals Longterm?
34:55 RL Scaling Compute
44:03 Brute Force versus Human Ground Truth
53:26 Humans have Longterm Goals
1:05:41 Inner Reward, Outer Reward, Planning
1:13:51 Can Gradient Descent learn Complexity without Human-Curated Datasets or Rewards?
1:29:00 Can RL invent Multiple New World Models?

Summary

Timelines

  • Samuel: P(ASI by 2030) = 0.25. Outside view: other experts' forecasts. Inside view: model scaling, RL scaling, serial speedup.
  • Bhishma: P(ASI by 2030) = 0.05. Non-LLM ASI is possible; no longterm goal pursuit; bad grounding, and bad grounding means bad generalisation.

Can AI pursue Goals Longterm?

  • Samuel: Humans want AI to pursue longterm goals, and RL is starting to pursue longterm goals. Priors based on past data: problems that start to get solved tend to get fully solved soon, and brute force has outperformed human experts.
  • Bhishma: Agrees in theory; in practice the RL/inference compute required may be too high.

RL Scaling Compute

  • Bhishma: $1M spent so far; the risk appetite may not exist.
  • Samuel: $1B of risk appetite exists but has not been spent yet; maybe there is a technical bottleneck.

Brute Force versus Human Ground Truth

  • Samuel: Humans learn from bad grounding, pretraining datasets are bad grounding, good world models can make use of bad grounding. Better architectures may reduce compute requirements.
  • Bhishma: It is hard to bootstrap initial good world models; human grounding is the bottleneck for this, and generalisation requires grounding.

Humans have Longterm Goals. Inner Reward, Outer Reward, Planning

  • Bhishma: Speed is not the bottleneck; grounding is. In humans, the amygdala drives the neocortex to do longterm goal pursuit. In AI, we have built the neocortex but not the amygdala.
  • Samuel: In humans, the internal reward system is handled by the limbic system, while planning is done by the neocortex. In AI, the internal reward system is easy to build because it involves little information processing; planning is hard to build.
  • Bhishma: Building a good reward system is hard and may need human experts.
  • Samuel: The outer reward system is simple; the inner reward system is emergent, not built (see the sketch after this list).
  • Bhishma: Pretraining datasets and RL reward signals are both grounding. Whether you use pretraining or RL, you may require human experts to do the grounding.
  • Bhishma: RL is in its early days, so it is hard to forecast. Evolution used a simple outer reward and an emergent inner reward, but required a lot of compute.
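
To make the "simple outer reward, emergent inner reward" distinction concrete, here is a minimal sketch of my own (not something worked through in the debate), assuming a toy tabular TD(0) setup on a small random-walk chain. The only hand-specified signal is a sparse outer reward of +1 at one terminal state; the learned value table ends up assigning graded values to every intermediate state, a dense signal nobody wrote down.

    # Minimal sketch (illustrative assumption, not from the debate): tabular TD(0)
    # on a 7-state random walk. The hand-written "outer" reward is +1 only on
    # reaching the right end, 0 everywhere else. The learned value table V acts
    # like an emergent "inner" signal for the intermediate states.
    import random

    N_STATES = 7          # states 0..6; 0 and 6 are terminal
    GAMMA = 1.0           # undiscounted episodic task
    ALPHA = 0.1           # learning rate
    EPISODES = 5000

    V = [0.0] * N_STATES  # value table, initialised to zero

    for _ in range(EPISODES):
        s = 3                                 # start in the middle
        while s not in (0, 6):
            s_next = s + random.choice((-1, 1))
            r = 1.0 if s_next == 6 else 0.0   # sparse outer reward
            target = r if s_next in (0, 6) else r + GAMMA * V[s_next]
            V[s] += ALPHA * (target - V[s])   # TD(0) update
            s = s_next

    # Intermediate states now carry graded values (roughly 1/6 .. 5/6), a dense
    # learning signal that was never hand-specified.
    print([round(v, 2) for v in V[1:6]])

This is only a toy; the disagreement in the debate is about whether the same kind of emergence scales to open-ended domains.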

Can Gradient Descent learn Complexity without Human-Curated Datasets or Rewards?

  • Bhishma: To get ASI, either good inner rewards are emergent despite simple grounding, or we brute force search despite bad inner rewards and bad grounding, or some unknown unknown path.
  • Samuel: Chess/StarCraft agents have emergent inner rewards.
  • Bhishma: The chess state space is small. Collapsing the large state space of reality into a manageable size requires good world models, which themselves require grounding to build (see the enumeration sketch after this list).
  • Bhishma: RLHF was surrogate grounding provided by humans. Can we build GPT-3.5 without it?
  • Samuel: Probably yes; I predict good inner rewards can be emergent rather than human-curated.
  • Bhishma: Both of us could make more concrete predictions like this
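
As a rough illustration of the state-space point (my own sketch, using tic-tac-toe as a stand-in for "a game far smaller than chess"): when the state space is tiny, brute force can literally enumerate it, so a simple win/loss outer reward is enough grounding. Reality offers no such enumeration.

    # Illustrative sketch (not from the debate): enumerate every board position
    # reachable in legal tic-tac-toe play, stopping a line of play once someone
    # has won or the board is full.

    def winner(board):
        lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]
        for a, b, c in lines:
            if board[a] != ' ' and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def reachable(board=tuple(' ' * 9), player='X', seen=None):
        if seen is None:
            seen = set()
        if board in seen:
            return seen
        seen.add(board)
        if winner(board) or ' ' not in board:
            return seen                      # game over: stop this line of play
        for i, cell in enumerate(board):
            if cell == ' ':
                nxt = board[:i] + (player,) + board[i + 1:]
                reachable(nxt, 'O' if player == 'X' else 'X', seen)
        return seen

    # Prints a few thousand states (5,478 by the usual counting) -- small enough
    # to search exhaustively, unlike chess, let alone physical reality.
    print(len(reachable()))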

Can RL invent Multiple New World Models?

  • Bhishma: AI today imitates human world models the way a child imitates a parent. AI is not like bacteria, which can be thrown into a new environment and learn to survive on their own.
  • Samuel: Let's discuss this in a specific domain.
  • Bhishma: An AI physicist deciding which lab experiments to do.
  • Samuel: Once the AI physicist has absorbed the known world models for known data, it has to hypothesise multiple new world models to explain the unexplained data, propose experiments, and learn from the experiment runs. He can explain how a transformer + RL setup might crack this.
  • Debate paused