This seems like a good opportunity to do some of my classic detailed podcast coverage.
The conventions are:
- This is not complete, points I did not find of note are skipped.
- The main part of each point is descriptive of what is said, by default paraphrased.
- For direct quotes I will use quote marks, by default this is Sutton.
- Nested statements are my own commentary.
- Timestamps are approximate and from his hosted copy, not the YouTube version. In this case I didn’t bother with them, because the section divisions in the transcript should make this very easy to follow without them.
Full transcript of the episode is here if you want to verify exactly what was said.
Well, that was the plan. This turned largely into me quoting Sutton and then expressing my mind boggling. A lot of what was interesting about this talk was in the back and forth or the ways Sutton lays things out in ways that I found impossible to excerpt, so one could consider following along with the transcript or while listening.
Sutton Says LLMs Are Not Intelligent And Don’t Do Anything
- (0:33) RL and LLMs are very different. RL is ‘basic’ AI. Intelligence and RL are about understanding your world. LLMs mimic people, they don’t figure out what to do.
- RL isn’t strictly about ‘understanding your world’ except insofar as it is necessary to do the job. The same applies to LLMs, no?
- To maximize RL signal you need to understand and predict the world, aka you need intelligence. To mimic people, you have to understand and predict them, which in turn requires understanding and predicting the world. Same deal.
- (1:19) Dwarkesh points out that mimicry requires a robust world model, indeed LLMs have the best world models to date. Sutton disagrees, you’re mimicking people, and he questions that people have a world model. He says a world model would allow you to predict what would happen, whereas people can’t do that.
- People don’t always have an explicit world model, but sometimes they do, and they have an implicit one running under the hood.
- Even if people didn’t have a world model in their heads, their outputs in a given situation depend on the world, which you then have to model, if you want to mimic those humans.
- People predict what will happen all the time, on micro and macro levels. On the micro level they are usually correct. On sufficiently macro levels they are often wrong, but this still counts. If the claim is ‘if you can’t reliably predict what will happen then you don’t have a model’ then we disagree on what it means to have a model, and I would claim no such-defined models exist at any interesting scale or scope.
- (1:38) “What we want, to quote Alan Turing, is a machine that can learn from experience, where experience is the things that actually happen in your life. You do things, you see what happens, and that’s what you learn from. The large language models learn from something else. They learn from “here’s a situation, and here’s what a person did”. Implicitly, the suggestion is you should do what the person did.”
- That’s not the suggestion. If [X] is often followed by [Y], then the suggestion is not ‘if [X] then you should do [Y],’ it is ‘[X] means [Y] is likely.’ So yes, if you are asked ‘what is likely after [X]’ it will respond [Y], but it will also internalize everything implied by this fact, and the fact is not in any way normative.
- That’s still ‘learning from experience,’ it’s simply not continual learning.
- Do LLMs do continual learning, e.g. ‘from what actually happens in your life’ in particular? Not in their current forms, not technically, but there’s no inherent reason they couldn’t, you’d just do [mumble] except that doing so would get rather expensive.
- You can also have them learn via various forms of external memory, broadly construed, including having them construct programs. It would work.
- Not that it’s obvious that you would want an LLM or other AI to learn specifically from what happens in your life, as opposed to learning from things that happen in lives in general plus having context and memory.
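To make the external-memory point concrete, here is a minimal sketch of one way it could work, with `call_llm` as a hypothetical stand-in for whatever model API you use: persist lessons from each interaction and feed them back in as context next time. No weight updates required, and it is still learning from what happens.

```python
# Minimal sketch of continual learning via external memory rather than weight
# updates. `call_llm` is a hypothetical stand-in for any chat-model API.
from pathlib import Path

MEMORY_FILE = Path("lessons.md")  # persistent store, survives across sessions

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real model API here")

def run_interaction(task: str) -> str:
    lessons = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    # Prior lessons are injected as context, so past experience shapes behavior.
    answer = call_llm(f"Lessons so far:\n{lessons}\n\nTask:\n{task}")
    # Ask the model to reflect on the interaction, and persist that reflection.
    lesson = call_llm(
        f"Task:\n{task}\nYour answer:\n{answer}\n"
        "State one reusable lesson from this interaction."
    )
    with MEMORY_FILE.open("a") as f:
        f.write(f"- {lesson}\n")
    return answer
```

Real memory systems are fancier (retrieval, summarization, tools), but the loop has the same shape.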
- (2:39) Dwarkesh responds with a potential crux that imitation learning is a good prior or reasonable approach, and gives the opportunity to get answers right sometimes, then you can train on experience. Sutton says no, that’s the LLM perspective, but the LLM perspective is bad. It’s not ‘actual knowledge.’ You need continual learning so you need to know what’s right during interactions, but the LLM setup can’t tell because there’s no ground truth, because you don’t have a prediction about what will happen next.
- I don’t see Dwarkesh’s question as a crux.
- I think Sutton’s response is quite bad, relying on invalid sacred word defenses.
- I think Sutton wants to draw a distinction between events in the world and tokens in a document. I don’t think you can do that.
- There is no ‘ground truth’ other than the feedback one gets from the environment. I don’t see why a physical response is different from a token, or from a numerical score. The feedback involved can come from anywhere, including from self-reflection if verification is easier than generation or can be made so in context, and it still counts. What is this special ‘ground truth’?
- Almost all feedback is noisy because almost all outcomes are probabilistic.
- You think that’s air you’re experiencing breathing? Does that matter?
- (5:29) Dwarkesh points out you can literally ask “What would you anticipate a user might say in response?” but Sutton rejects this because it’s not a ‘substantive’ prediction and the LLM won’t be ‘surprised’ or “they will not change because an unexpected thing has happened. To learn that, they’d have to make an adjustment.”
- Why is this ‘not substantive’ in any meaningful way, especially if it is a description of a substantive consequence, which speech often is?
- How is it not ‘surprise’ when a low-probability token appears in the text?
- There are plenty of times a human is surprised by an outcome but does not learn from it out of context. For example, I roll a d100 and get a 1. Okie dokie.
- LLMs do learn from a surprising token in training. You can always train. This seems like an insistence that surprise requires continual learning? Why?
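As a toy illustration of my point here (mine, not anything from the talk): ‘surprise’ already has a precise meaning for an LLM, the negative log-probability of the observed token, and the standard cross-entropy gradient is larger exactly when that token was more surprising, so training adjusts more when an unexpected thing happens.

```python
# Toy illustration: surprisal is -log p(observed token), and a more surprising
# token yields a larger weight update under the usual cross-entropy loss.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=5)              # model's scores over a 5-token vocab
probs = softmax(logits)

for observed in (int(probs.argmax()), int(probs.argmin())):
    surprisal = -np.log(probs[observed])   # large when the observed token was unlikely
    grad = probs.copy()
    grad[observed] -= 1.0                  # d(cross-entropy)/d(logits)
    print(f"token {observed}: surprisal={surprisal:.2f}, "
          f"update size={np.linalg.norm(grad):.2f}")
```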
- Dwarkesh points out LLMs update within a chain-of-thought, so flexibility exists in a given context. Sutton reiterates they can’t predict things and can’t be surprised. He insists that “The next token is what they should say, what the actions should be. It’s not what the world will give them in response to what they do.”
- What is Sutton even saying, at this point?
- Again, this distinction that outputting or predicting a token is distinct from ‘taking an action,’ and getting a token back is not the world responding.
- I’d point out the same applies to the rest of the tokens in context without CoT.
- (6:47) Sutton claims something interesting, that intelligence requires goals, “I like John McCarthy’s definition that intelligence is the computational part of the ability to achieve goals. You have to have goals or you’re just a behaving system.” And he asks Dwarkesh if he agrees that LLMs don’t have goals (or don’t have ‘substantive’ goals), and that next token prediction is not a goal, because it doesn’t influence the tokens.
- Okay, seriously, this is crazy, right?
- What is this ‘substantive’ thing? If you say something on the internet, it gets read in real life. It impacts real life. It causes real people to do ‘substantive’ things, and achieving many goals within the internet requires ‘substantive’ changes in the offline world. If you’re dumb on the internet, you’re dumb in real life. If you die on the internet, you die in real life (e.g. in the sense of an audience not laughing, or people not supporting you, etc).
- I feel dumb having to type that, but I’m confused what the confusion is.
- Of course next token prediction is a goal. You try predicting the next token (it’s hard!) and then tell me you weren’t pursuing a goal.
- Next token prediction does influence the tokens in deployment because the LLM will output the next most likely token, which changes what tokens come after, both its own and the user’s, and also the real world.
- Next token prediction does influence the world in training, because the feedback on that prediction’s accuracy will change the model’s weights, if nothing else. Those are part of the world.
- If intelligence requires goals, and something clearly displays intelligence, then that something must have a goal. If you conclude that LLMs ‘don’t have intelligence’ in 2025, you’ve reached a wrong conclusion. Wrong conclusions are wrong. You made a mistake. Retrace your steps until you find it.
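On the ‘doesn’t influence the tokens’ claim above, a trivial autoregressive loop (illustrative, not any particular model) shows why that can’t be right: each predicted token is appended to the context and conditions everything that comes after it.

```python
# Toy autoregressive loop: every prediction is an action whose output feeds
# straight back into the stream the model conditions on.
import random

def next_token(context: list[str]) -> str:
    # Stand-in for a real model: any deterministic function of the context so far.
    random.seed(" ".join(context))
    return random.choice(["yes", "no", "maybe", "stop"])

context = ["user:", "should", "we", "deploy", "?"]
while context[-1] != "stop" and len(context) < 20:
    tok = next_token(context)   # the prediction being made at each step
    context.append(tok)         # ...which immediately changes all later context
print(" ".join(context))
```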
- Dwarkesh next points out you can do RL on top of LLMs, and they get IMO gold, and asks why Sutton still doesn’t think that is anything. Sutton doubles down that math operations still aren’t the empirical world, doesn’t count.
- Are you kidding me? So symbolic things aren’t real, period, and manipulating them can’t be intelligence, period?
- Dwarkesh notes that Sutton is famously the author of The Bitter Lesson, which is constantly cited as inspiring and justifying the whole ‘stack more layers’ scaling of LLMs that basically worked, yet Sutton doesn’t see LLMs as ‘bitter lesson’ pilled. Sutton says they’re also putting in lots of human knowledge, so kinda yes kinda no, and he expects new systems that ‘learn from experience’ and ‘perform much better’ and are ‘more scalable’ to then be another instance of the Bitter Lesson?
- This seems like backtracking on the Bitter Lesson? At least kinda. Mostly he’s repeating that LLMs are one way and it’s the other way, and therefore Bitter Lesson will be illustrated the other way?
- “In every case of the bitter lesson you could start with human knowledge and then do the scalable things. That’s always the case. There’s never any reason why that has to be bad. But in fact, and in practice, it has always turned out to be bad. People get locked into the human knowledge approach, and they psychologically… Now I’m speculating why it is, but this is what has always happened. They get their lunch eaten by the methods that are truly scalable.”
- I do not get where ‘truly scalable’ is coming from here, as it becomes increasingly clear that he is using words in a way I’ve never seen before.
- If anything it is the opposite. The real objection is training efficiency, or failure to properly update from direct relevant experiences, neither of which has anything to do with scaling.
- I also continue not to see why there is this distinction ‘human knowledge’ versus other information? Any information available to the AI can be coded as tokens and be put into an LLM, regardless of its ‘humanness.’ The AI can still gather or create knowledge on its own, and LLMs often do.
- “The scalable method is you learn from experience. You try things, you see what works. No one has to tell you. First of all, you have a goal. Without a goal, there’s no sense of right or wrong or better or worse. Large language models are trying to get by without having a goal or a sense of better or worse. That’s just exactly starting in the wrong place.”
- Again, the word ‘scaling’ is being used in a completely alien manner here. He seems to be trying to say ‘successful’ or ‘efficient.’
- You have to have a ‘goal’ in the sense of a means of selecting actions, and a way of updating based on those actions, but in this sense LLMs in training very obviously have ‘goals’ regardless of whether you’d use that word that way.
- Except Sutton seems to think this ‘goal’ needs to exist in some ‘real world’ sense or it doesn’t count and I continue to be boggled by this request, and there are many obvious counterexamples, but I risk repeating myself.
- No sense of better or worse? What do you think thumbs up and down are? What do you think evaluators are? Does he not think an LLM can do evaluation?
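Since ‘a sense of better or worse’ keeps coming up: a thumbs-up on one response over another is exactly the pairwise signal reward models are trained on. A hedged sketch in the Bradley-Terry style (my illustration, not how any particular lab does it):

```python
# Sketch: turning thumbs-up/down into a numerical 'better or worse' signal.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

score_chosen, score_rejected = 0.0, 0.0   # stand-ins for reward-model outputs
lr = 0.5

for _ in range(50):
    # A thumbs-up on "chosen" over "rejected" is a pairwise preference label.
    p_correct = sigmoid(score_chosen - score_rejected)
    grad = 1.0 - p_correct                # from minimizing -log sigmoid(diff)
    score_chosen += lr * grad
    score_rejected -= lr * grad

print(score_chosen > score_rejected)      # True: 'better' is now encoded numerically
```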
Sutton has a reasonable hypothesis that a different architecture, that uses a form of continual learning and that does so via real world interaction, would be an interesting and potentially better approach to AI. That might be true.
But his uses of words do not seem to match their definitions or common usage, his characterizations of LLMs seem deeply confused, and he’s drawing a bunch of distinctions and treating them as meaningful in ways that I don’t understand. This results in absurd claims like ‘LLMs are not intelligent and do not have goals’ and that feedback from digital systems doesn’t count and so on.
It seems like a form of essentialism, the idea that ‘oh LLMs can never [X] because they don’t [Y],’ where, when you then point (as people frequently do) to the LLM doing [X] and often also doing [Y], they say ‘la la la can’t hear you.’
Humans Do Imitation Learning
- Dwarkesh claims humans initially do imitation learning, Sutton says obviously not. “When I see kids, I see kids just trying things and waving their hands around and moving their eyes around. There’s no imitation for how they move their eyes around or even the sounds they make. They may want to create the same sounds, but the actions, the thing that the infant actually does, there’s no targets for that. There are no examples for that.”
- GPT-5 Thinking says partly true, but only 30% in the first months, more later on. Gemini says yes. Claude says yes: “Imitation is one of the core learning mechanisms from birth onward. Newborns can imitate facial expressions within hours of birth (tongue protrusion being the classic example). By 6-9 months, they’re doing deferred imitation – copying actions they saw earlier. The whole mirror neuron system appears to be built for this.”
- Sutton’s claim seems clearly so strong as to be outright false here. He’s not saying ‘they do more non-imitation learning than imitation learning in the first few months,’ he is saying ‘there are no examples of that’ and there are very obviously examples of that. Here’s Gemini: “Research has shown that newborns, some just a few hours old, can imitate simple facial expressions like sticking out their tongue or opening their mouth. This early imitation is believed to be a reflexive behavior that lays the groundwork for more intentional imitation later on.”
- “School is much later. Okay, I shouldn’t have said never. I don’t know, I think I would even say that about school. But formal schooling is the exception. You shouldn’t base your theories on that.” “Supervised learning is not something that happens in nature. Even if that were the case with school, we should forget about it because that’s some special thing that happens in people.”
- At this point I kind of wonder if Sutton has met humans?
- As in, I do imitation learning. All. The Time. Don’t you? Like, what?
- As in, I do supervised learning. All. The. Time. Don’t you? Like, what?
- A lot of this supervised and imitation learning happens outside of ‘school.’
- You even see supervised learning in animals, given the existence of human supervisors who want to teach them things. Good dog! Good boy!
- You definitely see imitation learning in animals. Monkey see, monkey do.
- The reason not to do supervised learning is the cost of the supervisor, or (such as in the case of nature) their unavailability. Thus nature supervises, instead.
- The reason not to do imitation learning in a given context is the cost of the thing to imitate, or the lack of a good enough thing to imitate to let you continue to sufficiently progress.
- “Why are you trying to distinguish humans? Humans are animals. What we have in common is more interesting. What distinguishes us, we should be paying less attention to.” “I like the way you consider that obvious, because I consider the opposite obvious. We have to understand how we are animals. If we understood a squirrel, I think we’d be almost all the way there to understanding human intelligence. The language part is just a small veneer on the surface.”
- Because we want to create something that has what only humans have and other animals don’t, which is a high level of intelligence and ability to optimize the arrangements of atoms according to our preferences and goals.
- Understanding an existing intelligence is not the same thing as building a new intelligence, which we have also managed to build without understanding.
- The way animals have (limited) intelligence does not mean this is the One True Way that intelligence can ever exist. There’s no inherent reason an AI needs to mimic a human let alone an animal, except for imitation learning, or in ways we find this to be useful. We’re kind of looking for our keys under the streetlamp here, while assuming there are no keys elsewhere, and I think we’re going to be in for some very rude (or perhaps pleasant?) surprises.
- I don’t want to make a virtual squirrel and scale it up. Do you?
- They discuss the process of humans learning things over 10k years a la Henrich: figuring out many-step processes where you can’t one-shot the reasoning. This knowledge evolves over time, and is passed down through imitation learning, as are other cultural practices and gains. Sutton agrees, but calls this a ‘small thing.’
- You could of course one-shot the process with sufficient intelligence and understanding of the world, what Henrich is pointing out is that in practice this was obviously impossible and not how any of this went down.
- Seems like Sutton is saying again that the difference between humans and squirrels is a ‘small thing’ and we shouldn’t care about it? I disagree.
- They agree that mammals can do continual learning and LLMs can’t. We all agree that Moravec’s paradox is a thing.
- Moravec’s paradox is misleading. There will of course be all four quadrants of things, where for each of [AI, human] things will be [easy, hard].
- The same is true for any pair of humans, or any pair of AIs, to a lesser degree.
- The reason it is labeled a paradox is that there are some divergences that look very large, larger than one might expect, but this isn’t obvious to me.
The Experiential Paradigm
- “The experiential paradigm. Let’s lay it out a little bit. It says that experience, action, sensation—well, sensation, action, reward—this happens on and on and on for your life. It says that this is the foundation and the focus of intelligence. Intelligence is about taking that stream and altering the actions to increase the rewards in the stream…. This is what the reinforcement learning paradigm is, learning from experience.”
- Can be. Doesn’t have to be.
- A priori knowledge exists. Paging Descartes’ meditator! Molyneux’s problem.
- Words, written and voiced, are sensation, and can also be reward.
- Thoughts and predictions, and saying or writing words, are actions.
- All of these are experiences. You can do RL on them (and humans do this).
- Sutton agrees that the reward function is arbitrary, and can often be ‘seek pleasure and avoid pain.’
- That sounds exactly like ‘make number go up’ with extra steps.
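For reference, the entire experiential loop he is describing, sensation, action, reward, update, fits in a few lines, and nothing in it requires the feedback to be physical rather than a token or a thumbs-up. A toy two-armed bandit as a sketch:

```python
# Bare-bones sensation-action-reward loop: the agent nudges its estimates
# toward whatever makes the number go up. Purely illustrative.
import random

q = {0: 0.0, 1: 0.0}            # value estimate for each of two actions
alpha, epsilon = 0.1, 0.1

def environment(action: int) -> float:
    # The world responds; action 1 is better on average, but the signal is noisy.
    return random.gauss(1.0 if action == 1 else 0.2, 0.5)

for step in range(1000):
    if random.random() < epsilon:
        action = random.choice([0, 1])        # occasionally explore
    else:
        action = max(q, key=q.get)            # otherwise exploit current estimates
    reward = environment(action)              # feedback from the stream
    q[action] += alpha * (reward - q[action]) # incremental update from experience

print(q)   # q[1] ends up clearly higher: behavior adapted to maximize reward
```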
- Sutton wants to say ‘network’ instead of ‘model.’
- Okie dokie, this does cause confusion with ‘world models’ that minds have, as Sutton points out later, so using the same word for both is unfortunate.
- I do think we’re stuck with ‘model’ here, but I’d be happy to support moving to ‘network’ or another alternative if one got momentum.
- He points out that copying minds is a huge cost savings, more than ‘trying to learn from people.’
- Okie dokie, again, but these two are not rivalrous actions.
- If anything they are complements. If you learn from general knowledge and experiences it is highly useful to copy you. If you are learning from local particular experiences then your usefulness is likely more localized.
- As in, suppose I had a GPT-5 instance, embodied in a humanoid robot, that did continual learning, which let’s call Daneel. I expect that Daneel would rapidly become a better fit to me than to others.
- Why wouldn’t you want to learn from all sources, and then make copies?
- One answer would be ‘because to store all that info the network would need to be too large and thus too expensive’ but that again pushes you in the other direction, and towards additional scaffolding solutions.
- They discuss temporal difference learning and finding intermediate objectives.
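Since temporal difference learning gets name-checked without being unpacked: the core move is updating each state’s value toward the immediate reward plus the discounted value estimate of the next state, V(s) ← V(s) + α[r + γV(s′) − V(s)], so learning happens at every step rather than only when a final outcome arrives. A minimal sketch on a five-state chain:

```python
# TD(0) in miniature: values propagate backward from the rewarding terminal state.
gamma, alpha = 0.9, 0.1
V = [0.0] * 5                     # value estimates for states 0..4; state 4 is terminal

for episode in range(2000):
    s = 0
    while s < 4:                  # walk right until the terminal state
        s_next = s + 1
        r = 1.0 if s_next == 4 else 0.0
        bootstrap = 0.0 if s_next == 4 else gamma * V[s_next]
        V[s] += alpha * (r + bootstrap - V[s])   # the temporal-difference update
        s = s_next

print([round(v, 2) for v in V])   # approaches [0.73, 0.81, 0.9, 1.0, 0.0]
```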
- Sutton brings up the ‘big world hypothesis’ where to be maximally useful a human or AI needs particular knowledge of a particular part of the world. In continual learning the knowledge goes into weights. “You learn a policy that’s specific to the environment that you’re finding yourself in.”
- Well sure, but there are any number of ways to get that context, and to learn that policy. You can even write the policy down (e.g. in claude.md).
- Often it would be actively unwise to put that knowledge into weights. There is a reason humans will often use forms of external memory. If you were planning to copy a human into other contexts you’d use it even more.
- Sutton lays out the above common model of the agent. The new claim seems to be that you learn from all the sensation you receive, not just from the reward. And there is emphasis on the importance of the ‘transition model’ of the world.
- I once again don’t see the distinction between this and learning from a stream of tokens, whether one or two directional, or even from contemplation, where again (if you had an optimal learning policy) you would pay attention to all the tokens and not only to the formal reward, as indeed a human does when learning from a text, or from sending tokens and getting tokens back in various forms.
- In terms of having a ‘transition model,’ I would say that again this is something all agents or networks need similarly, and can ‘get away with not having’ to roughly similar extents.
Current Architectures Generalize Poorly Out Of Distribution
So do humans.
- Sutton claims people live in one world that may involve chess or Atari games, and can generalize across not only games but states, and this generalization will happen whether it is good or bad. Whereas gradient descent will not make you generalize well, and we need algorithms where the generalization is good.
- I’m not convinced that LLMs or SGD generalize out-of-distribution (OOD) poorly relative to other systems, including humans or RL systems, once you control for various other factors.
- I do agree that LLMs will often do pretty dumb or crazy things OOD.
- All algorithms will solve the problem at hand. If you want that solution to generalize, you need to make the expectation of such generalization part of the de facto evaluation function, develop heuristics and methods that tend to lead to generalization for other reasons, otherwise incorporate the general case, or choose (or get lucky with) a problem where the otherwise ‘natural’ solution still generalizes.
- “Well maybe that [LLMs] don’t need to generalize to get them right, because the only way to get some of them right is to form something which gets all of them right. If there’s only one answer and you find it, that’s not called generalization. It’s just it’s the only way to solve it, and so they find the only way to solve it. But generalization is when it could be this way, it could be that way, and they do it the good way.”
- Sutton thinks it only counts as generalization if you had the ability to not generalize, the way good requires the possibility of evil. It is a relative descriptor.
- I don’t understand why you’d find that definition useful or valid. I care about the generality of your solution in practice, not whether there was a more or less general alternative solution also available.
- Once again there’s this focus on whether something ‘counts’ as a thing. Yes, of course, if the only or simplest or easiest way to solve a special case is to solve the general case, which often happens, and thus you solve the general case, and this happens to solve a bunch of problem types you didn’t consider, then you have done generalization. Your solution will work in the general case, whether or not you call that OOD.
- If there’s only one answer and you find it, you still found it.
- This seems pretty central. SGD or RL or other training methods, of both humans and AIs, will solve the problem you hand to them. Not the problem you meant to solve, the problem and optimization target you actually presented.
- You need to design that target and choose that method, such that this results in a solution that does what you want it to do. You can approach that in any number of ways, and ideally (assuming you want a general solution) you will choose to set the problem up such that the only or best available solution generalizes, if necessary via penalizing solutions that don’t in various ways.
- Sutton claims coding agents trained via SGD will only find solutions to problems they have seen, and yes sometimes the only solution will generalize but nothing in their algorithms will cause them to choose solutions that generalize well.
- Very obviously coding agents generalize to problems they haven’t seen.
- Not fully to ‘all coding of all things’ but they generalize quite a bit and are generalizing better over time. Seems odd to deny this?
- Sutton is making at least two different claims.
- The first claim is that coding agents only find solutions to problems they have seen. This is at least a large overstatement.
- The second claim is that the algorithms will not cause the network to choose solutions that generalize well over alternative solutions that don’t.
- The second claim is true by default. As Sutton notes, sometimes the default or only solution does indeed generalize well. I would say this happens often. But yeah, sometimes by default this isn’t true, and then by construction and default there is nothing pushing towards finding the general solution.
- Unless you design the training algorithms and data to favor the general solution. If you select your data well, often you can penalize or invalidate non-general solutions, and there are various algorithmic modifications available.
- One solution type is giving the LLM an inherent preference for generality, or have the evaluator choose with a value towards generality, or both.
- No, it isn’t going to be easy, but why should it be? If you want generality you have to ask for it. Again, compare to a human or an RL program. I’m not going for a more general solution unless I am motivated to do so, which can happen for any number of reasons.
Surprises In The AI Field
- Dwarkesh asks what has been surprising in AI’s big picture? Sutton says the effectiveness of artificial neural networks. He says ‘weak’ methods like search and learning have totally won over ‘strong’ methods that come from ‘imbuing a system with human knowledge.’
- I find it interesting that Sutton in particular was surprised by ANNs. He is placing a lot of emphasis on copying animals, which seems like it would lead to expecting ANNs.
- It feels like he’s trying to make ‘don’t imbue the system with human knowledge’ happen? To me that’s not what makes the ‘strong’ systems strong, or the thing that failed. The thing that failed was GOFAI, the idea that you would hardcode a bunch of logic and human knowledge in particular ways, and tell the AI how to do things, rather than letting the AI find solutions through search and learning. But that can still involve learning from human knowledge.
- It doesn’t have to (see AlphaZero and previously TD-Gammon as Sutton points out), and yes that was somewhat surprising but also kind of not, in the sense that with More Dakka within a compact space like chess you can just solve the game from scratch.
- As in: We don’t need to use human knowledge to master chess, because we can learn chess through self-play beyond human ability levels, and we have enough compute and data that way that we can do it ‘the hard way.’ Sure.
Will The Bitter Lesson Apply After AGI?
- Dwarkesh asks what happens to scaling laws after AGI is created that can do AI research. Sutton says: “These AGIs, if they’re not superhuman already, then the knowledge that they might impart would be not superhuman.”
- This seems like more insistence on his characterization combined with a category error?
- And it ignores or denies the premise of the question, which is that AGI allows you to scale researcher time with compute the same way we previously could scale compute spend in other places. Sutton agrees that doing bespoke work is helpful, it’s just that it doesn’t scale, but what if it did?
- Even if the AGI is not ‘superhuman’ per se, the ability to run it faster and in parallel and with various other advantages means it can plausibly produce superhuman work in AI R&D. Already we have AIs that can do ‘superhuman’ tasks in various domains, even regular computers are ‘superhuman’ in some subdomains (e.g. arithmetic).
- “So why do you say, “Bring in other agents’ expertise to teach it”, when it’s worked so well from experience and not by help from another agent?”
- Help from another agent is experience. It can also directly create experience.
- The context is chess where this is even more true.
- Indeed, AlphaZero’s training was not free of other agents. It involved heavy use of other agents, except all those other agents were also AlphaZero.
- Dwarkesh focuses specifically on the ‘billions of AI researchers’ case, Sutton says that’s an interesting case very different from today and The Bitter Lesson doesn’t have to apply. Better to ask questions like whether you should use compute to enhance a few agents or spread it around to spin up more of them, and how they will interact. “More questions, will it be possible to really spawn it off, send it out, learn something new, something perhaps very new, and then will it be able to be reincorporated into the original? Or will it have changed so much that it can’t really be done? Is that possible or is that not?”
- I agree that things get strange and different and we should ask new questions.
- Asking whether it is possible for an ASI (superintelligent AI) copy to learn something new and then incorporate it into the original seems like such a strange question.
- It presupposes this ‘continual learning’ thesis where the copy ‘learns’ the information via direct incorporation into its weights.
- It then assumes that passing on this new knowledge requires incorporation directly into weights or something weird?
- As opposed to, ya know, writing the insight down and the other ASI reading it? If ASIs are indeed superintelligent and do continual learning, why can’t they learn via reading? Wouldn’t they also get very good at knowing how to describe what they know?
- Also, yes, I’m pretty confident you can also do this via direct incorporation of the relevant experiences, even if the full Sutton model holds here in ways I don’t expect. You should be able to merge deltas directly in various ways we already know about, and in better ways that these ASIs will be able to figure out.
- Even if nothing else works, you can simply have the ‘base’ version of the ASI in question rerun the relevant experiences once it is verified that they led to something worthwhile, reducing this to the previous problem, says the mathematician.
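To gesture at the ‘merge deltas directly’ option: treat each copy’s learning as a weight delta relative to the shared base and add the deltas back in, in the spirit of existing task-vector / model-merging work. A toy sketch (mine, not anything Sutton proposed, and real merges have to worry about interference between deltas):

```python
# Sketch of delta merging: each copy's learning is a weight delta off the base.
import numpy as np

base = {"layer.w": np.zeros((2, 2))}     # stand-in for the shared base weights

def delta(copy: dict, base: dict) -> dict:
    return {k: copy[k] - base[k] for k in base}

def merge(base: dict, deltas: list, scale: float = 1.0) -> dict:
    merged = {k: v.copy() for k, v in base.items()}
    for d in deltas:
        for k in merged:
            merged[k] += scale * d[k]    # fold each copy's learning back into the base
    return merged

copy_a = {"layer.w": np.array([[0.1, 0.0], [0.0, 0.0]])}   # one copy learned one thing
copy_b = {"layer.w": np.array([[0.0, 0.0], [0.0, 0.2]])}   # another learned something else
print(merge(base, [delta(copy_a, base), delta(copy_b, base)])["layer.w"])
```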
- Sutton also speculates about potential for corruption or insanity and similar dangers, if a central mind is incorporating the experiences or knowledge of other copies of itself. He expects this to be a big concern, including ‘mind viruses.’
- Seems fun to think about, but nothing an army of ASIs couldn’t handle.
- In general, when imagining scenarios with armies of ASIs, you have to price into everything the fact that they can solve problems way better than you.
- I don’t think the associated ‘mind viruses’ in this scenario are fundamentally different than the problems with memetics and hazardous information we experience today, although they’ll be at a higher level.
- I would of course expect lots of new unexpected and weird problems to arise.
Succession To AI
It’s Sutton, so eventually we were going to have to deal with him being a successionist.
- He argues that succession is inevitable for four reasons: Humanity is incapable of a united front, we will eventually figure out intelligence, we will eventually figure out superhuman intelligence, and it is inevitable that over time the most intelligent things around would gain intelligence and power.
- We can divide this into two parts. Let “it” equal superintelligence.
- Let’s call part one Someone Will Build It.
- Let’s call part two If Anyone Builds It, Everyone Dies.
- Okay, sure, not quite as you see below, but mostly? Yeah, mostly.
- Therefore, Everyone Will Die. Successionism is inevitable.
- Part two is actually a very strong argument! It is simpler and cleaner and in many ways more convincing than the book’s version, at least in terms of establishing this as a baseline outcome. It doesn’t require (or give the impression it requires) any assumptions whatsoever about the way we get to superintelligence, what form that superintelligence takes, nothing.
- I actually think this should be fully convincing of the weaker argument that by default (rather than inevitably) this happens, and that there is a large risk of this happening, and something has to go very right for it to not happen.
- If you say ‘oh even if we do build superintelligence there’s no risk of this happening’ I consider this to be Obvious Nonsense and you not to be thinking.
- I don’t think this argument is convincing that it is ‘inevitable.’ Facts not in evidence, and there seem like two very obvious counterexamples.
- Counterexample one is that if the intelligence gap is not so large in practical impact, other attributes, both mental and physical, also matter and can more than compensate for it. Alas, this seems unlikely to be relevant given the expected intelligence gaps.
- Counterexample two is that you could ‘solve the alignment problem’ in a sufficiently robust sense that the more intelligent minds optimize for a world in which the less intelligent minds retain power in a sufficiently robust way. Extremely tricky, but definitely not impossible in theory.
- However his definition of what is inevitable, and what counts as ‘succession’ here, is actually much more optimistic than I previously realized…
- If we agree that If Anyone Builds It, Everyone Dies, then the logical conclusion is ‘Then Let’s Coordinate To Ensure No One F***ing Builds It.’
- He claims nope, can’t happen, impossible, give up. I say, if everyone was convinced of part two, then that would change this.
- “Put all that together and it’s sort of inevitable. You’re going to have succession to AI or to AI-enabled, augmented humans. Those four things seem clear and sure to happen. But within that set of possibilities, there could be good outcomes as well as less good outcomes, bad outcomes. I’m just trying to be realistic about where we are and ask how we should feel about it.”
- If ‘AI-enabled, augmented humans’ count here, well, that’s me, right now.
- I mean, presumably that’s not exactly what he meant.
- But yeah, conditional on us building ASIs or even AGIs, we’re at least dealing with some form of augmented humans.
- Talk of ‘merge with the AI’ is nonsense, you’re not adding anything to it, but it can enhance you.
- “I mark this as one of the four great stages of the universe. First there’s dust, it ends with stars. Stars make planets. The planets can give rise to life. Now we’re giving rise to designed entities. I think we should be proud that we are giving rise to this great transition in the universe.”
- Designed is being used rather loosely here, but we get the idea.
- We already have created designed things, and yeah that’s pretty cool.
- “It’s an interesting thing. Should we consider them part of humanity or different from humanity? It’s our choice. It’s our choice whether we should say, “Oh, they are our offspring and we should be proud of them and we should celebrate their achievements.” Or we could say, “Oh no, they’re not us and we should be horrified.””
- It’s not about whether they are ‘part of humanity’ or our ‘children.’ They’re not.
- They can still have value. One can imagine aliens (as many stories have) that are not these things and still have value.
- That doesn’t mean that us going away would therefore be non-horrifying.
- “A lot of it has to do with just how you feel about change. If you think the current situation is really good, then you’re more likely to be suspicious of change and averse to change than if you think it’s imperfect. I think it’s imperfect. In fact, I think it’s pretty bad. So I’m open to change. I think humanity has not had a super good track record. Maybe it’s the best thing that there has been, but it’s far from perfect.” “I think it’s appropriate for us to really work towards our own local goals. It’s kind of aggressive for us to say, “Oh, the future has to evolve this way that I want it to.””
- So there you have it.
- I disagree.
- “So we’re trying to design the future and the principles by which it will evolve and come into being. The first thing you’re saying is, “Well, we try to teach our children general principles which will promote more likely evolutions.” Maybe we should also seek for things to be voluntary. If there is change, we want it to be voluntary rather than imposed on people. I think that’s a very important point. That’s all good.”
- This is interestingly super different and in conflict with the previous claim.
- It goes so far the other way that I don’t even fully endorse it, this idea that change always needs to be voluntary and never imposed on people. That neither seems like a reasonable ask, nor does it historically end well, as in the paralysis of the West and especially the Anglosphere in many ways, especially in housing.
- I am very confident in what would happen if you asked about the changes Sutton is anticipating, and put them to a vote.
Fundamentally, I didn’t pull direct quotes on this but Sutton repeatedly emphasizes that AI-dominated futures can be good or bad, that he wants us to steer towards good futures rather than bad futures, and that we should think carefully about which futures we are steering towards and choose deliberately.
I can certainly get behind that. The difference is that I don’t think we need to accept this transition to AI dominance as our only option, including that I don’t think we should accept that humans will always be unable to coordinate.
Mostly what I found interesting were the claims around the limitations and nature of LLMs, in ways that don’t make sense to me. This did help solidify a bunch of my thinking about how all of this works, so it felt like a good use of time for that alone.