[Linkpost] Large Language Models Converge on Brain-Like Word Representations

Bogdan Ionut Cirstea

11th Jun 2023

This is a linkpost for https://arxiv.org/abs/2306.01930

One of the greatest puzzles of all time is how understanding arises from neural mechanics. Our brains are networks of billions of biological neurons transmitting chemical and electrical signals along their connections. Large language models are networks of millions or billions of digital neurons, implementing functions that read the output of other functions in complex networks. The failure to see how meaning would arise from such mechanics has led many cognitive scientists and philosophers to various forms of dualism -- and many artificial intelligence researchers to dismiss large language models as stochastic parrots or jpeg-like compressions of text corpora. We show that human-like representations arise in large language models. Specifically, the larger neural language models get, the more their representations are structurally similar to neural response measurements from brain imaging.
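As a rough illustration of what "structurally similar" can mean here, the sketch below uses representational similarity analysis (RSA): build a word-by-word dissimilarity matrix for each system, then rank-correlate the two matrices. This is an assumption chosen for illustration, not necessarily the measure used in the linked paper, and all data, shapes, and function names (`rdm`, `rsa_score`, and so on) are hypothetical stand-ins.

```python
import numpy as np

def rdm(X):
    """Representational dissimilarity matrix: 1 - Pearson correlation between rows."""
    return 1.0 - np.corrcoef(X)

def upper_triangle(M):
    """Flatten the strictly upper triangle (each word pair counted once)."""
    i, j = np.triu_indices(M.shape[0], k=1)
    return M[i, j]

def spearman(a, b):
    """Spearman correlation via rank transform; no tie correction (fine for continuous data)."""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def rsa_score(X, Y):
    """Rank-correlate the pairwise dissimilarity structure of two representations."""
    return spearman(upper_triangle(rdm(X)), upper_triangle(rdm(Y)))

# Synthetic stand-ins (assumption): rows are the same 50 words in both systems.
rng = np.random.default_rng(0)
n_words = 50
model_vecs = rng.normal(size=(n_words, 64))  # hypothetical LM word vectors
# "Brain" responses built as a noisy random linear map of the model vectors.
brain_related = model_vecs @ rng.normal(size=(64, 200)) + 0.5 * rng.normal(size=(n_words, 200))
# Responses with no relation to the model vectors, as a baseline.
brain_unrelated = rng.normal(size=(n_words, 200))

score_related = rsa_score(model_vecs, brain_related)
score_unrelated = rsa_score(model_vecs, brain_unrelated)
print(f"related: {score_related:.2f}, unrelated: {score_unrelated:.2f}")
```

Because RSA compares only the pairwise geometry, the two systems need not share dimensionality; the claim in the abstract would correspond to a score of this kind rising with model size.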

Tags: Language Models, Natural Abstraction, Philosophy of Language, AI

Vladimir_Nesov (1y)

Prediction/compression seems to be working out as a path to general intelligence, implicitly representing situations in terms of their key legible features, making it easy to formulate policies appropriate for a wide variety of instrumental objectives, in a wide variety of situations, without having to adapt the representation for particular kinds of objectives or situations. To the extent brains engage in predictive processing, they are plausibly going to compute related representations. (This doesn't ensure alignment, as there are many different ways of making use of these features, of acting differently in the same world.)

Bogdan Ionut Cirstea (1y)

Yes, predictive processing as the reason behind related representations has been the interpretation in a few papers, e.g. "The neural architecture of language: Integrative modeling converges on predictive processing". There's also some pushback against this interpretation though, e.g. "Predictive Coding or Just Feature Discovery? An Alternative Account of Why Language Models Fit Brain Data".

Max H (1y)

A possible explanation: both brains and LLMs are somehow solving the symbol grounding problem. It may be that the most natural solutions to this problem share commonalities, or even that all solutions are necessarily isomorphic to each other.

Anyone who has played around with LLMs for a while can see that they are not just "stochastic parrots", but I think it's a pretty big leap to call anything within them "human-like" or "brain-like".

If an AI (perhaps a GOFAI or just an ordinary computer program) implements addition using the standard algorithm for multi-digit addition that humans learn in elementary school, does that make the AI human-like? Maybe a little, but it seems less misleading to say that the method itself is just a natural way of solving the same underlying problem. The fact that AIs are becoming capable of solving more complex problems that were previously only solvable by human brains seems more like a fact about a general increase in AI capabilities, than a result of AI systems getting more "brain-like".

To say that any system which solves a problem via similar methods to humans is brain-like, seems like it is unfairly privileging the specialness / uniqueness of the brain. Claims like that (IMO wrongly) suggestively imply that those solutions somehow "belong" to the brain, simply because that is where we first observed them.

O O (1y)

> To say that any system which solves a problem via similar methods to humans is brain-like, seems like it is unfairly privileging the specialness / uniqueness of the brain. Claims like that (IMO wrongly) suggestively imply that those solutions somehow "belong" to the brain, simply because that is where we first observed them.

The brain isn't exactly some arbitrary set of parameters picked from a mindspace; it's the most statistically likely general intelligence to form from evolutionary mechanisms on a mammalian brain. Presumably the processes it uses are the simplest to build bottom-up, so the claim is misguided but not entirely wrong.

Noosphere89 (1y)

> Anyone who has played around with LLMs for a while can see that they are not just "stochastic parrots", but I think it's a pretty big leap to call anything within them "human-like" or "brain-like".

To a large extent, this describes my new views on LLM capabilities too, especially for transformers. They are missing important aspects of human cognition, but they are not useless stochastic parrots, as some of the more dismissive people claim.

jakej (1y)

To me, it really looks like brains and LLMs are both using embedding spaces to represent information. Embedding spaces ground symbols by automatically relating all concepts they contain, including the grammar for manipulating these concepts.

Bogdan Ionut Cirstea (1y)

There are some papers suggesting this could indeed be the case, at least for language processing, e.g. "Shared computational principles for language processing in humans and deep language models" and "Brain embeddings with shared geometry to artificial contextual embeddings, as a code for representing language in the human brain".

Ilio (1y)

Big achievement, even if nobody should be surprised (it’s been known for vision for a decade or so).

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003963

@anyone To those who believe a future AGI might pick its values at random: don’t you think this result suggests it should restrict its pick to something human language and visuospatial cognition push for?

Bogdan Ionut Cirstea (1y)

Yes, there are similar results in a bunch of other domains, including vision; for a review, see e.g. "The neuroconnectionist research programme".

I wouldn't interpret this as necessarily limiting the space of AI values, but rather (somewhat conservatively) as shared (linguistic) features between humans and AIs, some/many of which are probably relevant for alignment.

Ilio (1y)

> wouldn't interpret this as necessarily limiting the space of AI values, but rather (somewhat conservatively) as shared (linguistic) features between humans and AIs

I fail to see how the latter could arise without the former. Would you mind connecting these dots?

Bogdan Ionut Cirstea (1y)

AIs could have representations of human values without being motivated to pursue them; also, their representations could be a superset of human representations.

(In practice, I do think having overlapping representations with human values likely helps, for reasons related to e.g. "Predicting Inductive Biases of Pre-Trained Models" and "Alignment with human representations supports robust few-shot learning".)

Ilio (1y)

Indeed, their representations could form a superset of human representations, and that’s why it’s not random. Or, equivalently, it’s random but not under a uniform prior.

(Yes, these further works are more evidence for "it’s not random at all", as if LLMs were discovering (some of) the same set of principles that allow our brains to construct and use our language, rather than creating completely new cognitive structures. That’s actually reminiscent of AlphaZero converging toward human style without training on human input.)
