Memetic Judo #3: The Intelligence of Stochastic Parrots v.2

Max TK

A couple of thoughts:

1. I think many people making this argument reject brain physicalism, particularly a subset of premise 2, something like "all of experience/the mind is captured by brain activity".
2. I don't think your example is convincing to the stochastic parrot people. It could just be a mashup of two types of images the AI has already seen, smashed together. A more convincing proof is OthelloGPT, which stores concepts in the form of boards, states, and legal moves, despite only being trained on sequences of text tokens representing Othello moves.

I don't think people who make this argument explicitly reject brain physicalism - they won't come to you straight up saying "I believe we have a God-given immortal soul and these mechanical husks will never possess one". However, if hard-pressed, they'll probably start giving out arguments that kind of sound like dualism (in which the machines can't possibly ever catch up to human creativity for reasons). Mostly, I think it's just the latest iteration of existential discomfort at finding out we're perhaps not so special after all, like with "the Earth is not the centre of the Universe" and "we're just descendants of apes, ruled by the same laws as all other animals".

That said, I think then an interesting direction to take the discussion would be "ok, let's say these machines can NEVER experience anything and that conscious experience is a requirement to be able to express certain feelings and forms of creativity. Do you think it is also necessary to prove mathematical theorems, make scientific discoveries or plan a deception?". Because in the end, that's what would make an AI really dangerous, even if its art remained eternally kinda mid.

[-]Max TK9mo30

Good point. I think I will add it later.

[-]Max TK9mo32

About point 1: I think you are right with that assumption, though I believe that many people repeat this argument without really having a stance on (or awareness of) brain physicalism. That's why I didn't hesitate to include it. Still, if you have a decent idea of how to improve this article for people who are sceptical of physicalism, I would like to add it.

About point 2: Yeah you might be right ... a reference to OthelloGPT would make it more convincing - I will add it later!

Edit: Still, I believe that "mashup" isn't even a strictly false characterization of concept composition. I think I might add a paragraph explicitly explaining that and how I think about it.

[-]Morpheus9mo33

Since I had only heard the term “stochastic parrot” from skeptics who obviously didn't know what they were talking about, I hadn't realized what a fitting phrase stochastic parrot actually is. One might even argue it's overselling language models, as parrots are quite smart.

[-]dr_s9mo20

Them: "Don't worry it's just a Randomized Raven."

Me: "GOD I HOPE NOT"

[-]Max TK9mo10

#parrotGang

[-]TAG9mo20

It has been proven that neural nets can approximate arbitrary functions.

Therefore it should in principle be possible for stochastic parrots to be generally intelligent.

If a stochastic parrot is a particular type of neural net, that doesn't follow.

By analogy, a Turing incomplete language can't perform some computations, even if the hardware it is running on can perform any.

[-]dr_s9mo41

There is no actual definition of stochastic parrot; it's just a derogatory label used to downplay "something that, given a distribution to sample from and a prompt, performs a kind of Markov process to repeatedly predict the most probable next token".

The thing that people who love to sneer at AI like Gebru don't seem to get (or willingly downplay in bad faith) is that such a class of functions also includes a thing that, if asked "write me down a proof of the Riemann hypothesis", says "sure, here it is" and then goes on to win a Fields medal. There are no particular fundamental proven limits on how powerful such a function can be. I don't see why there should be.

[-]TAG9mo2-1

If it did that, it would be mostly by luck, not as the consequence of a reliable knowledge generating process. LLMs are stochastic, that's not a baseless smear.

As ever, "power" is a cluster of different things.

[-]dr_s9mo20

It would be absolutely the consequence of a knowledge generating process. You are stochastic too, I am stochastic, there is noise and quantum randomness and we can't ever prove that given the exact same input we'd produce the exact same output every time, without fail. And you can make an LLM deterministic, just set its temperature to zero. We don't do that simply because it makes their output more varied, fun and interesting, but also, it doesn't destroy the coherence of that output altogether.

Basically even thinking that "stochastic" is a kind of insult is missing the point, but that's what people who unironically use the term "stochastic parrot" mostly do. They're trying to say that LLMs are blind random imitators who thus are incapable of true understanding and always will be, but that's not implied by a more rigorous definition of what they do at all. Heck, for what it's worth, actual parrots probably understand a bit of what they say. I've seen plenty of videos and testimonies of parrots using certain words in certain non-random contexts.

[-]TAG9mo2-2

It would be absolutely the consequence of a knowledge generating process.

I said "reliable". A stochastic model is only incidentally a truth generator. Do you think it's impossible to improve on LLMs by making the underlying engine more tuned in to truth per se?

You are stochastic too, I am stochastic,

If it's more random than us, it's not more powerful than us.

And you can make an LLM deterministic, just set its temperature to zero.

Which obviously isn't going to give you novel solutions to maths problems. That's trading off one kind of power against another.

Basically even thinking that “stochastic” is a kind of insult is missing the point, but that’s what people who unironically use the term “stochastic parrot” mostly do. They’re trying to say that LLMs are blind random imitators who thus are incapable of true understanding and always will be, but that’s not implied by a more rigorous definition of what they do at all.

But the objection can be steelmanned, eg: "If it's more random than us, it's not more powerful than us."

[-]dr_s9mo20

Is it more random than us? I think you're being too simplistic. Probabilistic computation can be compounded to reduce the uncertainty to an arbitrarily small amount, and in some cases can, I think, be more powerful than purely deterministic computation.
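One standard way to make the "compounding" claim concrete is majority voting over repeated runs. The sketch below is only an illustration under idealized assumptions: each run is taken to be independently correct with the same probability, which is not something the thread establishes for LLM samples, and the function name `majority_error` is made up for this example.

```python
from math import comb


def majority_error(p_correct: float, n_runs: int) -> float:
    """Probability that a majority vote over n_runs runs gives the wrong answer.

    Assumption (idealized): every run is independent and is correct with
    probability p_correct. n_runs must be odd so that ties cannot occur.
    """
    if n_runs % 2 == 0:
        raise ValueError("use an odd number of runs to avoid ties")
    q = 1.0 - p_correct
    # The vote is wrong when more than half of the runs are wrong;
    # k counts the number of wrong runs.
    return sum(
        comb(n_runs, k) * q**k * p_correct ** (n_runs - k)
        for k in range(n_runs // 2 + 1, n_runs + 1)
    )


if __name__ == "__main__":
    # A procedure that is right only 60% of the time per run.
    for n in (1, 11, 101):
        print(n, majority_error(0.6, n))
```

As long as a single run is right more often than not, the error of the vote shrinks as the number of runs grows. If the runs share the same systematic mistake, the independence assumption fails and voting does not help.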

At its core the LLM is deterministic anyway. It produces logits of belief on what should be the next word. We, too, have uncertain beliefs. Then the system is set up in a certain way to turn those beliefs into text. Again, if you want to always choose the most likely answer, just set the temperature to zero!
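For readers who have not seen how the temperature knob works, here is a minimal sketch of the sampling step. It is not any particular library's implementation: the function name `sample_token`, the three-word vocabulary, and the logit values are all hypothetical, and treating a temperature of exactly zero as greedy argmax is a common convention assumed here.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick a token index from raw logits at the given temperature."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        # No randomness is involved, so the output is deterministic.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for stability
    weights = [math.exp(x - m) for x in scaled]
    # random.Random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    vocab = ["yes", "no", "maybe"]   # hypothetical vocabulary
    logits = [2.0, 1.0, 0.1]         # hypothetical model output
    rng = random.Random(0)
    print([vocab[sample_token(logits, 0, rng)] for _ in range(5)])
    print([vocab[sample_token(logits, 1.0, rng)] for _ in range(5)])
```

The logits themselves are computed the same way in both cases; only the final step that turns them into a token differs.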

[-]TAG9mo20

It has "beliefs" regarding which word should follow another, and any other belief, opinion or knowledge is an incidental outcome of that. Do you think it’s impossible to improve on LLMs by making the underlying engine more tuned in to truth per se?

[-]dr_s9mo20

No, I think it's absolutely possible, at least theoretically - not sure what it would take to actually do it, of course. But that's my point: there exists somewhere in the space of possible LLMs an "always gives you the wisest, most truthful response" model that does exactly the same thing, predicting the next token. As long as the prediction is always that of the next token that would appear in the wisest, most truthful response!

[-]TAG9mo20

Which is different to predicting a token on the basis of the statistical regularities in the training data. An LLM that works that way is relatively poor at reliably outputting truth, so a version of the SP argument goes through.

[-]dr_s9mo20

I think in the limit of infinite, truthful training data, with sufficient abstraction, it would not necessarily be different. We too form our beliefs from "training data" after all, we're just highly multimodal and smart enough to know the distinction between a science textbook and a fantasy novel. An LLM may not have that distinction perfectly clear - though it does grasp it to some extent.

[-]TAG9mo20

We too form our beliefs from “training data”

There's no evidence that we do so based solely on token prediction, so that's irrelevant.

[-]dr_s9mo20

I just don't really understand in what way "token prediction" is anything less than "literally any possible function from a domain of all possible observations to a domain of all possible actions". At least if your "tokens" cover extensively enough all the space of possible things you might want to do or say.

[-]Max TK9mo10

I think a significant part of the problem is not the LLM's trouble distinguishing truth from fiction, but rather convincing it through your prompt that the output you want is the former and not the latter.

[-]Max TK9mo10

I don't really know what to make of this objection, because I have never seen the stochastic parrot argument applied to a specific, limited architecture as opposed to the general category.

Edit: Maybe make a suggestion of how to rephrase to improve my argument.

[-]TAG9mo20

Maybe make a suggestion of how to rephrase to improve my argument

Citation. Quote something somebody said.

[-]TAG9mo20

I have never seen the stochastic parrot argument applied to a specific, limited architecture

I've never seen anything else. According to Wikipedia, the term was originally applied to LLMs.

The term was first used in the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell (using the pseudonym "Shmargaret Shmitchell").[4]

[-]Max TK9mo10

LLMs use 1 or more inner layers, so shouldn't the proof apply to them?

[-]TAG9mo20

what proof?

[-]Max TK9mo30

Of the universal approximation theorem

[-]TAG9mo20

How are inner layers relevant?

[-]dr_s9mo20

LLMs are neural networks, neural networks are proven to be able to approximate any function to an arbitrarily close degree, hence LLMs are able to approximate any function to an arbitrarily close degree (given enough layers, of course).
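A small one-dimensional illustration of what the universal approximation theorem is about. Note that the classical statement concerns continuous functions on a bounded domain and is usually phrased in terms of the number of hidden units in a single hidden layer. The sketch below builds such a network by hand rather than training it, so it says nothing about whether gradient descent would find these weights; the helper names are invented for this example.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_net(f, a, b, n_hidden):
    """Build a one-hidden-layer ReLU net that linearly interpolates f
    at n_hidden + 1 evenly spaced points on [a, b].

    Returns (bias, knots, coeffs): the output bias, the input offset of
    each hidden unit, and the output weight of each hidden unit.
    """
    h = (b - a) / n_hidden
    knots = [a + i * h for i in range(n_hidden)]
    values = [f(a + i * h) for i in range(n_hidden + 1)]
    slopes = [(values[i + 1] - values[i]) / h for i in range(n_hidden)]
    # Each hidden unit contributes the change in slope at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    return values[0], knots, coeffs


def net_apply(net, x):
    bias, knots, coeffs = net
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coeffs))


def max_error(f, net, a, b, samples=1000):
    """Largest absolute error on an evenly spaced grid over [a, b]."""
    grid = (a + (b - a) * i / samples for i in range(samples + 1))
    return max(abs(f(x) - net_apply(net, x)) for x in grid)


if __name__ == "__main__":
    for width in (4, 16, 64):
        net = build_relu_net(math.sin, 0.0, math.pi, width)
        print(width, max_error(math.sin, net, 0.0, math.pi))
```

Widening the hidden layer drives the error on the interval down, which is the sense in which the approximation can be made "arbitrarily close". Whether this kind of existence result settles the argument in the thread is exactly what the commenters are disputing.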

[-][anonymous]9mo10

This argument is completely correct.

However, I will note a corollary I jump to. It doesn't matter how lame or unintelligent an AI system's internal cognition actually is. What matters is whether it can produce outputs that lead to tasks being performed. And not even all human tasks. AGI is not even necessary for AI to be transformative.

All that matters is that the AI system perform the subset of tasks related to [chip and robotics] manufacture, including all feeder subtasks. (so everything from mining ore to transport to manufacturing)

These tasks have all kinds of favorable properties that make them easier than the full set of "everything a human can do". And a stochastic parrot is obviously quite suitable; we already automate many of these tasks with incredibly stupid robotics.

So yes, a stochastic parrot able to auto-learn new songs is incredibly powerful.

[-]TAG9mo86

What matters is whether it can produce outputs that lead to tasks being performed.

What matters is whether it can reliably produce outputs that lead to tasks being performed.

[-]Max TK9mo1-2

Based on your phrasing I sense you are trying to object to something here, but it doesn't seem to have much to do with my article. Is this correct or am I just misunderstanding your point?

[-][anonymous]9mo-41

You are misunderstanding. Is English not your primary language? I think it's pretty clear.

I suggest rereading the first main paragraph. The point is there, the other 2 are details.

[-]Max TK9mo00

Usually between people in international forums, there is a gentlemen's agreement to not be condescending over things like language comprehension or spelling errors, and I would like to continue this tradition, even though your own paragraphs would offer wide opportunities for me to do the same.