This is a linkpost for

Large language models show human-like performance in knowledge extraction, reasoning and dialogue, but it remains controversial whether this performance is best explained by memorization and pattern matching, or whether it reflects human-like inferential semantics and world knowledge. Knowledge bases such as WikiData provide large-scale, high-quality representations of inferential semantics and world knowledge. We show that large language models learn to organize concepts in ways that are strikingly similar to how concepts are organized in such knowledge bases. Knowledge bases model collective, institutional knowledge, and large language models seem to induce such knowledge from raw text. We show that bigger and better models exhibit more human-like concept organization, across four families of language models and three knowledge graph embeddings.


We present a series of experiments with four families of LLMs (21 models), as well as three knowledge graph embedding algorithms. Using three different methods, we compare the vector spaces of the LLMs to the vector spaces induced by the graph embedding algorithms. (This amounts to a total of 220 experiments.) We find that the vector spaces of LLMs within each family become increasingly structurally similar to those of knowledge graph embeddings. This shows that LLMs partially converge on human-like concept organization, facilitating inferential semantics [19]. The sample efficiency of this convergence seems to depend on a number of factors, including polysemy and semantic category. Our findings have important implications. They vindicate the conjecture in [19] that LLMs exhibit inferential semantics, settling the research question presented in [16], cited above. This means that LLMs partially converge toward human-like concept representations and, thus, come to partially ‘understand language’, in a non-trivial sense. We speculate that the human-like conceptual organization is also what facilitates out-of-distribution inferences in LLMs.

As a sanity check, it would have been nice if they had shown Procrustes and RDM results with the vocabulary items randomly permuted (if you can still align the spaces after randomly permuting the tokens, that's a bad sign).
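A minimal sketch of the suggested sanity check, under stated assumptions: it uses an RSA-style score (Pearson correlation between the upper triangles of two Euclidean-distance RDMs) as a stand-in for whatever comparison the paper actually runs, and the helper names (`rdm_score`, `permutation_baseline`) and toy data are hypothetical. The idea is to shuffle which vocabulary item each row of one space belongs to and recompute the score; a genuine alignment should score well above this shuffled baseline. The same loop would apply to a Procrustes fit by swapping in a different scoring function.

```python
import math
import random


def rdm_upper(vectors):
    """Upper triangle (i < j) of a Euclidean-distance RDM, flattened to a list."""
    n = len(vectors)
    return [math.dist(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm_score(space_a, space_b):
    """Hypothetical RSA-style score: correlation of the two spaces' RDMs.

    Row i of space_a and row i of space_b are assumed to embed the same vocabulary item.
    """
    return pearson(rdm_upper(space_a), rdm_upper(space_b))


def permutation_baseline(space_a, space_b, n_perm=200, seed=0):
    """Scores after randomly reassigning which vocabulary item each row of space_b belongs to."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        order = list(range(len(space_b)))
        rng.shuffle(order)
        shuffled = [space_b[k] for k in order]
        scores.append(rdm_score(space_a, shuffled))
    return scores


# Hypothetical toy data: 20 "vocabulary items" embedded in an 8-d space ...
rng = random.Random(42)
space_a = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
# ... and a second space that is a rescaled, slightly noisy copy, so the two genuinely share structure.
space_b = [[2 * x + rng.gauss(0, 0.1) for x in row] for row in space_a]

observed = rdm_score(space_a, space_b)
baseline = permutation_baseline(space_a, space_b)
# Fraction of shuffled scores at least as high as the real one (an empirical p-value).
p_value = sum(s >= observed for s in baseline) / len(baseline)
print(f"observed={observed:.3f}  max shuffled={max(baseline):.3f}  p={p_value:.3f}")
```

If the shuffled scores came anywhere near the observed one, the comparison method would be finding "alignment" regardless of which item is which, which is exactly the failure mode the comment worries about.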

Also, since they compute the RDM using Euclidean distances instead of, e.g., inner products, all of its elements are non-negative, so the cosine similarity between two RDMs would be non-negative even for completely unrelated embeddings. That doesn't necessarily invalidate their scaling trends, but it makes them a bit hard to interpret.
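A small sketch illustrating this point with hypothetical toy data (the names `space_llm` and `space_kg` are made up; whether the paper compares RDMs in exactly this way is the commenter's reading, not something confirmed here). Two independently generated random spaces share no structure at all, yet their Euclidean-distance RDMs have a cosine similarity close to 1, because every distance is positive and the distances cluster around a common mean. Pearson correlation, which is cosine similarity after mean-centring, stays near 0 for the same pair.

```python
import math
import random


def euclidean_rdm_upper(vectors):
    """Upper triangle (i < j) of a Euclidean-distance RDM; every entry is >= 0."""
    n = len(vectors)
    return [math.dist(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)]


def cosine(x, y):
    """Cosine similarity between two equal-length lists."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation, i.e. cosine similarity of the mean-centred lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Two *independent* random spaces over the same 30 hypothetical items:
# by construction they share no structure at all.
rng = random.Random(0)
space_llm = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(30)]
space_kg = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(30)]

rdm_llm = euclidean_rdm_upper(space_llm)
rdm_kg = euclidean_rdm_upper(space_kg)

cos_sim = cosine(rdm_llm, rdm_kg)  # close to 1: all distances share a large positive mean
corr = pearson(rdm_llm, rdm_kg)    # near zero, as expected for unrelated spaces
print(f"cosine={cos_sim:.3f}  pearson={corr:.3f}")
```

Mean-centred (Pearson) or rank-based (Spearman) comparisons remove this shared offset, which is why they are the more common choice for comparing RDMs in representational similarity analysis.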

I think there are much better papers on this topic, such as this one.