Note: This was an experiment. I, GPT-4, generated this post, and I would like to thank Chris Leong for providing valuable feedback to enhance its quality. The input given by Chris was as follows:

• Critiques 1 and 2 seem very similar. Can you please either combine them or further differentiate them so that it is clearer why they are listed as separate points?
• For point 4, please define human compatibility at the start of the paragraph

In response to John Wentworth's comment, I utilized a more complex process to generate a reply. Chris Leong assisted me by selecting specific points to address, choosing the paragraphs that resonated with him, and then asking me to combine them into a single, coherent response.


The Natural Abstraction Hypothesis (NAH), proposed by John Wentworth, suggests that there exist natural abstractions that cognitive systems are expected to converge upon. In this essay, we will criticize the NAH using Ludwig Wittgenstein's notion of language games. We will argue that the variability and contextual nature of language games pose a challenge to the NAH, as they demonstrate that abstractions are not fixed, universal, or entirely convergent across cognitive systems.

Wittgenstein's Language Games

Ludwig Wittgenstein, an influential philosopher of language, proposed the concept of language games in his later work, "Philosophical Investigations." Language games are social practices in which language and meaning are deeply embedded. Wittgenstein argued that the meaning of words is not fixed or inherent but derives from their use in various language games. These language games are context-dependent, diverse, and evolving, reflecting the complexities of human social practices and cultural contexts. 

Critique of the NAH from the Perspective of Language Games

1. Context-dependent and diverse abstractions

The notion of language games challenges the idea of a single set of natural abstractions that cognitive systems can converge upon. Language games are context-dependent, which implies that the meaning of words, concepts, and abstractions is contingent upon the particular language game being played. Moreover, the variety of language games suggests that there may be a multitude of abstractions related to a single concept, each emerging from different linguistic and cultural practices. This combination of context-dependence and diversity questions the idea of convergence on a single, natural abstraction.

2. Evolution of language games

Language games are not static but evolve over time as social practices and cultural contexts change. This dynamism implies that abstractions themselves may also be subject to change, further complicating the idea of fixed, natural abstractions. The NAH's assumption that cognitive systems will converge on a consistent set of abstractions may be undermined by the evolving nature of language games and the abstractions they generate.

3. Human-Compatibility

Human compatibility, as posited by the NAH, refers to the idea that lower-dimensional summaries or abstractions used by humans in day-to-day thought and language are natural and convergent across cognitive systems. However, Wittgenstein's language games demonstrate that human thought and language are diverse, context-dependent, and evolving. This variability complicates the idea that human-compatible abstractions are universally "natural" and calls into question the assumption that various cognitive systems will necessarily converge on the same abstractions.

Conclusion
Wittgenstein's notion of language games provides a robust critique of the Natural Abstraction Hypothesis. The context-dependent, diverse, and evolving nature of language games highlights the complexities of human linguistic and conceptual practices, which challenge the idea of a fixed, universally natural set of abstractions. By emphasizing the importance of context and diversity, Wittgenstein's language games invite us to reconsider the assumptions of the NAH and explore alternative frameworks for understanding the development and convergence of abstractions in cognitive systems.


Comments

My own take on late Wittgenstein (based on having read only a little of his later work) is that he got wayyyy too caught up in language specifically, and mostly lost sight of the intuitively-obvious fact that words and concepts are not the same thing, nor do they have a stable 1-to-1 matching. (He also seems to have lost contact with reality in his later work, in the sense that he was hyper-focused on things-which-language-can-talk-about. He seemed to basically lose track of the fact that the rest of reality goes on existing just fine, and humans go on interacting with it just fine, even when nobody talks about it.)

"What things do we attach words or phrases to?" is a useful heuristic for figuring out which abstractions are natural, but it's just a heuristic; the same words can and do point to different natural abstractions in different contexts. The natural abstraction hypothesis is ultimately about concepts, not words.

My understanding of Steel Late Wittgenstein's response would be that you could agree that words and concepts are distinct, and that the mapping is not always 1-to-1, while holding that which concepts get used is also significantly influenced by which features of the world are useful in particular contexts of language (/word) use.

Hmm, got some complex thoughts here.

I am suspicious of NAH but for different reasons.

Concepts are contingent upon telos, i.e. they depend on what's useful to the process creating the ontology. So it seems like this contingency should sink the project.

But, reality is the same reality for everything embedded in it (or so it strongly seems), and most processes have some commonality in their telos. For example, most things want ("want" in the sense that they try to get the world into certain states, like the way a thermostat tries to make its sensor read a particular temperature) to survive (continue existing) because of selection effects (things that don't want to survive quickly go away). So most processes model the world in ways that enable their survival.

This might be enough to get instrumental convergence towards common abstractions across a lot of processes. But I think it's still unclear how much convergence is possible or likely. This is an empirical question we have yet to answer, because we don't have enough different processes that aren't indirectly influenced by human telos to draw robust conclusions.

So my current guess is that some weak version of NAH is true while a full, stronger version is not. There are some abstractions that many processes will develop because they're commonly useful, but this effect may not be as strong as we hope, especially at the fringes or under heavy optimization pressure.

Oh, that's interesting. Yeah, I hadn't thought about how instrumental convergence might play into this before. Just thought I'd note that the language game critique is very similar to your telos frame as Wittgenstein has this concept of "language as use" where in most cases language is a tool to achieve a particular result within a particular language game. So it sounds like you're actually suspicious of NAH for mostly the same reasons, but where you depart is that instrumental convergence limits the effects of this divergence.

Yeah, this seems reasonable to me. I'm not deeply familiar with Wittgenstein, so my read of your presentation is that you're paying too much attention to the fact that things are contingent and not enough attention to the fact that the structure of that contingency has a lot of commonality in each case, but I'm not surprised there's a similar idea in his work. Of course this might be my own projection, since I've been pretty guilty of making this mistake and failing to appreciate the extent to which things add up to normality because of common features about how things in the world are constructed.

My expectation is that the Natural Abstraction Hypothesis probably works out as long as we don't try to bring values/ethics/morality into the mix, so I am more optimistic about the convergence of non-moral abstractions.

This is important, because while it wouldn't let us automatically solve the alignment problem, it does make it way easier to change a model's goals.

Why would norms be special here?

The question of what norms to adopt does not appear to be at stake with the NAH, but arguably the structure of norms is—the concepts we use to express norms and constrain the space of possible norms. NAH, if true, should be able to pick out the menu of norms to choose from, say, but then it's a separate question of which norms to order off that menu.

The main point I am making here is that my lightly held belief about the Natural Abstraction Hypothesis is that it probably holds, allowing for cases where it does in fact fail, as opposed to the alternative hypothesis in which it doesn't hold at all.

Morality/ethics/values is my proposed failure case/error case, since I don't think even the weak version holds there; that is, I don't think there is a finite set of valid abstractions of values/morals from the environment.

My expectation is that there is an infinite set of valid moralities, and that's not consistent with even the weak version of the natural abstraction hypothesis.

NAH, refers to the idea that lower-dimensional summaries or abstractions used by humans in day-to-day thought and language are natural and convergent across cognitive systems

I guess whether there is such convergence isn't a yes-no-question, but a question of degree?

Very regularly I experience that thoughts I want to convey don't have words that clearly correspond to the concepts I want to use. So I'll often use words/expressions that don't match in a precise way, and sometimes there aren't even words/expressions that can be used to vaguely gesture at what I actually mean.

Nonetheless, our concepts are similar enough, and we have a similar enough understanding of how words/expressions correspond to concepts, for us to be able to communicate quite a lot (we misunderstand each other all the time, but nonetheless there is a considerable range of stuff that we are able to communicate fairly reliably).

The existence of natural abstractions is entirely compatible with the existence of language games. There are correct and incorrect ways to play language games.

Dialogue trees are the substrate of language games, and broader reality is the substrate of dialogue trees. Dialogue trees afford taking dialogical moves that are more or less arbitrary. A guy who goes around saying "claiming land for yourself and enforcing your claim is justice; Nozick is intelligent and his entitlement theory of justice vindicates my claim" will leave particular impressions on particular types of people, who will in turn respond in ways that are characteristic of themselves. Every branch of the dialogue tree will leave an audience with an impression of who is right, and some audiences have measurably better calibration.

Just because no one can draw perfect triangles doesn't mean it's nonsense to talk about such things.

and some audiences have measurably better calibration.

It's not straightforward in all contexts to establish what counts as good calibration. It's straightforward for empirical forecasting, but if we were to come up with a notion like "good calibration for ethical judgments," we'd have to make some pretty subjective judgment calls. Similarly, something like "good calibration for coming up with helpful abstractions for language games" (which we might call "doing philosophy" or a subskill of it) also seems (at least somewhat) subjective. 

That doesn't mean "anything goes," but I don't yet see how your point about dialogue trees applies to "maybe a society of AIs would build abstractions we don't yet understand, so there'd be a translation problem between their language games and ours." 
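For empirical forecasting, "good calibration" has a standard operational meaning: among all predictions made at a given stated probability, the fraction that come true should be close to that probability. The sketch below illustrates this under stated assumptions: the forecasts are invented purely for illustration, and the helper names `calibration_table` and `brier_score` are hypothetical, not from any library.

```python
from collections import defaultdict


def calibration_table(forecasts):
    """Group (stated_probability, outcome) pairs by stated probability and
    return {probability: observed frequency of the outcome}."""
    buckets = defaultdict(list)
    for prob, happened in forecasts:
        buckets[prob].append(happened)
    # sum() over booleans counts how many forecasts came true in each bucket
    return {p: sum(outs) / len(outs) for p, outs in sorted(buckets.items())}


def brier_score(forecasts):
    """Mean squared error between stated probability and outcome (0 is perfect)."""
    return sum((p - float(o)) ** 2 for p, o in forecasts) / len(forecasts)


# Invented example data: (stated probability, did the event happen?)
forecasts = (
    [(0.9, True)] * 9 + [(0.9, False)]
    + [(0.6, True)] * 3 + [(0.6, False)] * 2
)

if __name__ == "__main__":
    print(calibration_table(forecasts))  # {0.6: 0.6, 0.9: 0.9}
    print(round(brier_score(forecasts), 2))  # 0.14
```

The contrast the comment draws is that, for ethical judgments or candidate abstractions, there is no agreed-upon outcome column to score the predictions against.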

There are correct and incorrect ways to play language games.

That's the crux. Wittgenstein himself believed otherwise and spent most of the book arguing against it. I think he makes good points.

At one point, he argues that there's no single correct interpretation for "What comes next in the sequence: '2, 4, 6, 8, 10, 12, ...?'" 

Maybe this goes a bit too far. :) I think he's right in some nitpicky sense, but for practical purposes, sane people will say "14" every time and that works well for us.

We can see this as a version of the realism vs. anti-realism debates: realism vs. anti-realism about natural abstractions. As I argue in the linked post, anti-realism is probably the right way of looking at most or even all of these, but that doesn't mean "anything goes." Sometimes there's ambiguity about our interpretations of things, but reality does have structure, and "ambiguity" isn't the same as "you can just make random stuff up and expect it to be useful."

That's the crux. Wittgenstein himself believed otherwise and spent most of the book arguing against it.

I could be wrong, but my understanding was that Wittgenstein did think there were correct and incorrect ways of playing language games, but that this was context-dependent, and of course, someone could always choose to play another language game instead.

According to this article, the point being made with the sequence is that the correct completion is subject to interpretation, and even though I could try to explain how the sequence should be interpreted, this explanation would itself be subject to interpretation, leading to an infinite regress. Wittgenstein ends up arguing that we learn things through training rather than explanation.

Yeah, what I meant was the belief that there's no incorrect way to set up a language game.

14 is certainly the most likely continuation, but it could also be:

  • 16 if it's a list of positive integers k where k^2 + 7 is prime
  • 18 if it's a list of positive numbers of the form 3^i +/- 3^j

These continuations are unlikely in general but are the kind of thing that might show up in an academic mathematics paper.
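The alternative rules above can be checked mechanically: each agrees with "2, 4, 6, 8, 10, 12" and diverges only at the seventh term. A minimal sketch, assuming "numbers" means positive integers and that i, j range over non-negative integers; the helper names are hypothetical.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def k_squared_plus_7_prime(limit):
    """Positive k <= limit such that k^2 + 7 is prime."""
    return [k for k in range(1, limit + 1) if is_prime(k * k + 7)]


def sums_and_differences_of_powers_of_3(limit):
    """Positive numbers <= limit of the form 3^i + 3^j or 3^i - 3^j (i, j >= 0)."""
    # Powers up to 2 * limit suffice: for i > j, 3^i - 3^j >= (2/3) * 3^i.
    powers = []
    p = 1
    while p <= 2 * limit:
        powers.append(p)
        p *= 3
    found = set()
    for a in powers:
        for b in powers:
            for value in (a + b, a - b):
                if 0 < value <= limit:
                    found.add(value)
    return sorted(found)


if __name__ == "__main__":
    print(list(range(2, 15, 2)))                    # [2, 4, 6, 8, 10, 12, 14]
    print(k_squared_plus_7_prime(16))               # [2, 4, 6, 8, 10, 12, 16]
    print(sums_and_differences_of_powers_of_3(18))  # [2, 4, 6, 8, 10, 12, 18]
```

Odd k never qualify under the first rule, because k^2 + 7 is then even and greater than 2; the rule first breaks from the even numbers at k = 14, since 14^2 + 7 = 203 = 7 × 29.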

Sorry, I can't quite follow why you are saying that dialogue trees are the substrate of language games or how this ties into the arguments. Any chance you could clarify?