Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Whether LLMs contain world models still seems to be an area of active debate within the ML community, so it may be worthwhile to gather the evidence in one place. Please list only one paper per answer, as that makes it easier for people to comment on it (and possibly critique the evidence). Also feel free to include evidence that they do not contain world models.


GPT-3.5 can play chess at the 1800 Elo level, which is terrifying and impossible without at least a chess world model.

Note that it also makes illegal moves from rare board states, which means its model of chess is pretty questionable.

Daniel Paleka, 5mo:
I made an illegal move while playing over the board (5+3 blitz) yesterday and lost the game. Maybe my model of chess (even when seeing the current board state) is indeed questionable, but well, it apparently happens to grandmasters in blitz too.
I would highly recommend playing against it and trying to get it confused and out of distribution. It's very difficult, at least for me.

I used to think that it would be very difficult for an LLM to build a model of chess because chess is not about words and sentences. But a discussion led to the realization that the chess model underlying chess notation is not that different from long-distance referents in (programming) languages. Imagine the 2D chess grid not as a physical board but as a doubly nested array with fixed length (the fixed length might make it even easier). GPT clearly can do that. And once it has learned that, all higher layers can focus on the rules (though without the MCTS of AlphaZero).
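To make the "doubly nested array" picture concrete, here is a minimal Python sketch. The function names and the choice of coordinate notation (`e2e4`) are assumptions made for illustration; nothing here is a claim about how GPT represents the board internally, and the sketch does no legality checking.

```python
FILES = "abcdefgh"


def new_board():
    """Return the starting position as board[rank][file]; index 0 is rank 1 / file a."""
    back_rank = list("RNBQKBNR")
    board = [["."] * 8 for _ in range(8)]      # fixed-length 8x8 nested list
    board[0] = back_rank[:]                    # white pieces (uppercase)
    board[1] = ["P"] * 8
    board[6] = ["p"] * 8
    board[7] = [c.lower() for c in back_rank]  # black pieces (lowercase)
    return board


def square_to_index(square):
    """Map a square name such as 'e4' to (rank_index, file_index)."""
    return int(square[1]) - 1, FILES.index(square[0])


def apply_move(board, move):
    """Apply a move in coordinate notation such as 'e2e4' (no legality check)."""
    r0, f0 = square_to_index(move[:2])
    r1, f1 = square_to_index(move[2:4])
    board[r1][f1] = board[r0][f0]
    board[r0][f0] = "."


board = new_board()
for move in ["e2e4", "e7e5", "g1f3"]:
    apply_move(board, move)
```

Tracking the position then amounts to a sequence of indexed reads and writes, which is the sense in which a move list resembles long-distance references to array cells.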


Language Models Represent Space and Time:

Paper, Twitter Thread, Code and Datasets

This paper argues that language models contain a geographical model of the world (in terms of longitude and latitude), in addition to temporal representations of when events have occurred.

They use linear probes to find these representations and argue that the representations are linear because more complicated probes don't perform any better.
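As a rough illustration of what a linear probe is, the sketch below fits a least-squares map from activation vectors to latitude. Everything in it is an assumption for illustration: the activations are synthetic (a made-up direction plus noise), the dimensions are tiny, and this is not the paper's code or data.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_probe(acts, targets, ridge=1e-6):
    """Return least-squares weights (bias last) mapping activations to targets."""
    rows = [a + [1.0] for a in acts]  # append a constant bias feature
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights, act):
    return sum(w * x for w, x in zip(weights, act + [1.0]))


def r_squared(weights, acts, targets):
    mean = sum(targets) / len(targets)
    ss_res = sum((t - predict(weights, a)) ** 2 for a, t in zip(acts, targets))
    ss_tot = sum((t - mean) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


random.seed(0)
# Hypothetical "latitude direction" in a 6-dimensional activation space.
TRUE_DIRECTION = [0.5, -1.0, 0.8, 0.3, -0.6, 1.2]


def fake_activation(lat):
    """Synthetic activation: latitude encoded linearly, plus Gaussian noise."""
    return [lat / 90 * d + random.gauss(0, 0.05) for d in TRUE_DIRECTION]


lats = [random.uniform(-60, 70) for _ in range(300)]
acts = [fake_activation(lat) for lat in lats]
probe = fit_linear_probe(acts[:200], lats[:200])       # fit on 200 examples
held_out_r2 = r_squared(probe, acts[200:], lats[200:])  # score on the other 100
```

If the quantity really is encoded linearly, as it is by construction here, the held-out R² is high; the paper's argument is that a more expressive probe gaining nothing over this kind of linear fit suggests the representation itself is linear.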

They also look for neurons with similar weights as the probes to show that the neuron actually uses the representations.
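One simple way to compare neuron weights with a probe is cosine similarity. The sketch below is an assumption about how such a comparison could be done, with made-up weight vectors; it is not taken from the paper's code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_neurons_by_probe_alignment(neuron_weights, probe_direction):
    """Return (neuron_index, cosine) pairs, most aligned (by |cosine|) first."""
    scored = [(i, cosine_similarity(w, probe_direction))
              for i, w in enumerate(neuron_weights)]
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical probe direction and per-neuron input weights.
probe_direction = [0.5, -1.0, 0.8, 0.3]
neuron_weights = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, -2.1, 1.5, 0.7],   # nearly parallel to the probe direction
    [0.3, 0.4, 0.2, -0.9],
]
ranking = rank_neurons_by_probe_alignment(neuron_weights, probe_direction)
```

A neuron whose input weights point almost exactly along the probe direction reads out roughly the same feature the probe found, which is evidence that the model uses that feature rather than merely containing it.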

Just to clarify why this is a world model and how it is useful:

An LLM could answer questions such as "What country is south of Chad?" by memorizing the answer from its training data. But that doesn't scale very well in terms of parameter efficiency.

Alternatively, the LLM could learn an approximate mapping from country names to 2D world geo-coordinates, learn to represent geographic query operators as linear offsets in this space, and thereby learn to map "what country is south of Chad" to something like

geo_to_country(country_to_geo("Chad") - geosize_of("Chad"))

A model such as this compresses the training data better than simple memorization.
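The expression above can be turned into a small runnable Python sketch. The coordinates are rough centroids rounded by hand, the "size" of Chad is a hypothetical half north-south extent in degrees, and distances are plain Euclidean distances in degree space; all three are simplifying assumptions for illustration.

```python
import math

# Rough (latitude, longitude) centroids in degrees; illustrative approximations only.
COUNTRY_TO_GEO = {
    "Chad": (15.0, 19.0),
    "Libya": (27.0, 17.0),
    "Niger": (17.0, 9.0),
    "Nigeria": (9.0, 8.0),
    "Sudan": (15.0, 30.0),
    "Cameroon": (6.0, 12.5),
    "Central African Republic": (7.0, 21.0),
}

# Hypothetical: roughly half of the country's north-south extent, in degrees.
HALF_HEIGHT_DEG = {"Chad": 8.0}


def country_to_geo(name):
    return COUNTRY_TO_GEO[name]


def geosize_of(name):
    """Offset vector (d_lat, d_lon) of about one 'country step' to the north."""
    return (HALF_HEIGHT_DEG[name], 0.0)


def geo_to_country(point, exclude=()):
    """Nearest known country to a (lat, lon) point, by distance in degree space."""
    candidates = [c for c in COUNTRY_TO_GEO if c not in exclude]
    return min(candidates, key=lambda c: math.dist(point, COUNTRY_TO_GEO[c]))


def south_of(name):
    lat, lon = country_to_geo(name)
    d_lat, d_lon = geosize_of(name)
    # Subtracting the northward offset moves the query point south.
    return geo_to_country((lat - d_lat, lon - d_lon), exclude={name})


answer = south_of("Chad")
```

The table stores one coordinate pair per country, and every directional question is then answered by the same offset-and-lookup rule, rather than by storing one fact per (country, direction) pair.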

I tested this with ChatGPT-4 just now. It mostly got my specific questions right, but its commentary reveals some substantial misconceptions, such as that a flight from Ecuador to Sri Lanka would first pass over the Atlantic Ocean and then over Brazil.

Here it is:


If I flew by the most direct route from Ecuador to Sri Lanka, would I pass over Egypt?



If you were to fly the most direct route (a great circle route) from Ecuador to Sri Lanka, you would not pass over Egypt.

Here's a general idea of what the path would look like:

  1. Starting from Ecuador, the route would take you eastward over the Atlantic Ocean.
  2. You'd then pass over parts of Brazil in South America.
  3. Continuing over the South Atlantic Ocean, you'd cross over southern Africa, likely missing most of the major countries to the south.
  4. From there, you'd head out over the Indian Ocean, aiming towards Sri Lanka.

The path would largely pass south of the equator and far to the south of Egypt. Remember, the shortest path between two points on a sphere (like Earth) isn't necessarily intuitive when looking at typical flat maps, but becomes clearer when visualizing on a globe or using a great circle route tool.



Would I pass over Somalia?



If you fly the most direct route (a great circle route) from Ecuador to Sri Lanka, you would indeed come closer to the eastern coast of Africa, which includes Somalia. After crossing the Atlantic Ocean and part of southern Africa, the route could take you over or near the northern part of Madagascar and then across the Indian Ocean towards Sri Lanka.

Given this trajectory, it's possible that you might pass over or near the southern part of Somalia or its adjacent waters. However, the exact trajectory can vary based on specific starting and ending points, as well as any airspace restrictions or other navigational considerations.



Would I pass over Columbia?



Let's clarify the countries in question:

  1. Ecuador is in the northwestern part of South America.
  2. Colombia (with an "o", not "u") shares a border with Ecuador to its north.

If you're flying a direct route (great circle route) from Ecuador to Sri Lanka, the trajectory would first take you eastward, likely across the northern part of Brazil. Given the proximity of Colombia to Ecuador, especially if your departure point in Ecuador is in the northern part of the country or near the border, it's very plausible that your flight could pass over southern Colombia or skirt its southern border.

So, to answer your question: Yes, it's possible that you would pass over parts of Colombia, depending on your exact departure point in Ecuador.



Would I pass over South Africa?



If you were flying the most direct route (a great circle route) from Ecuador to Sri Lanka, the trajectory would most likely not pass directly over South Africa.

After departing from Ecuador, the route would head eastward over the Atlantic Ocean, then cross over parts of South America (likely Brazil), continue across the South Atlantic, and then head towards the southern regions of the African continent. However, given the positions of Ecuador and Sri Lanka on the globe, the route would likely take you over or near the southeastern parts of Africa, such as Mozambique or the adjacent oceanic regions, rather than South Africa itself.

To summarize, while the flight might come close to South Africa or its adjacent waters, it would not necessarily pass directly over South Africa.
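The route claims in the transcript above can be checked with a short great-circle calculation. This is a sketch under stated assumptions: a spherical Earth of radius 6371 km, Quito and Colombo as the endpoints (the transcript names only the countries), and approximate city coordinates.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def to_lat_lon(v):
    """Convert a 3D unit vector back to (latitude, longitude) in degrees."""
    x, y, z = v
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))


def central_angle(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def great_circle_waypoints(start, end, steps=40):
    """Evenly spaced points on the shorter great-circle arc (spherical interpolation)."""
    a, b = to_unit_vector(*start), to_unit_vector(*end)
    omega = central_angle(a, b)
    points = []
    for i in range(steps + 1):
        t = i / steps
        wa = math.sin((1 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        points.append(to_lat_lon(tuple(wa * p + wb * q for p, q in zip(a, b))))
    return points


QUITO = (-0.18, -78.47)    # approximate (lat, lon); assumed departure point
COLOMBO = (6.93, 79.86)    # approximate (lat, lon); assumed arrival point
EARTH_RADIUS_KM = 6371.0   # mean radius, spherical approximation

waypoints = great_circle_waypoints(QUITO, COLOMBO)
distance_km = EARTH_RADIUS_KM * central_angle(
    to_unit_vector(*QUITO), to_unit_vector(*COLOMBO))
min_lat = min(lat for lat, _ in waypoints)
max_lat = max(lat for lat, _ in waypoints)
```

Under these assumptions the waypoints should stay at or north of the equator and peak well below 22°N, the latitude of Egypt's southern border. That agrees with the "not over Egypt" answer but not with the commentary that the path runs largely south of the equator and over southern Africa. The two endpoints are close to antipodal, so the route is sensitive to the exact coordinates chosen.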

You can ask them things about the world and they give answers highly correlated with the world.

In many cases that task could also be solved by having remembered a lot of answers to questions. 

Sure, but that doesn't achieve a good compression capability, and LLMs are trained as universal compressors/predictors (i.e. they are trained to predict, but subject to regularization entropy constraints).

This is a reason why it makes sense for LLMs to develop world models, but it doesn't prove that an individual LLM uses a world model to answer the questions you ask it.

How much of a 'good compression capability' have LLMs achieved?

i.e. How is the metric defined, and how reliable are the figures?

The compression-prediction equivalence (compression requires a predictive model), together with the fact that LLMs are the best known general predictors, implies that they are the best known general compressors[1]. Memorization does not generalize.
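The equivalence can be illustrated with ideal code lengths: a symbol predicted with probability p costs about -log2(p) bits under an ideal entropy coder, so a better predictor gives a shorter encoding. The toy string and both models below are assumptions chosen for illustration.

```python
import math
from collections import Counter


def code_length_bits(text, prob):
    """Ideal code length of text: the sum of -log2 p(symbol) under the model."""
    return sum(-math.log2(prob(ch)) for ch in text)


text = "abracadabra" * 20
alphabet = sorted(set(text))
counts = Counter(text)


def uniform_model(ch):
    """Knows only the alphabet; every symbol is equally likely."""
    return 1 / len(alphabet)


def unigram_model(ch):
    """Uses symbol frequencies. It is fitted on the text itself, so the cost
    of describing the model is not counted here."""
    return counts[ch] / len(text)


uniform_bits = code_length_bits(text, uniform_model)
unigram_bits = code_length_bits(text, unigram_model)
```

The frequency-aware model needs fewer bits than the uniform one, but because it was fitted on the very text it encodes, this comparison ignores the cost of transmitting the model.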

One point of common confusion is the large size of trained LLMs. But that is actually irrelevant. An ideal Solomonoff inductor would have infinite size and perfect generalization. It is an ensemble distribution over entropy-constrained models, not a single entropy-constrained model, so the MDL principle applies only to each of the (infinitely many) submodels, not to the whole ensemble.

The same applies to LLMs and the brain. They are, like all highly capable general predictors, some approximation of Bayesian ensembles. However, there is a good way to measure the total compression: you measure it throughout the entire training process, so that the only complexity penalty is that of the initial architecture prior (which is tiny).
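Measuring compression throughout training is often called prequential (online) coding: each symbol is encoded using only what the model learned from earlier symbols, and the model is updated afterwards. Below is a minimal sketch with a toy count-based predictor standing in for an LLM; the predictor, the Laplace smoothing, and the toy string are all assumptions for illustration.

```python
import math
from collections import defaultdict


def prequential_bits(text, alphabet, order=1):
    """Bits needed to encode text when every symbol is predicted from counts
    gathered on earlier symbols only, and the counts are updated afterwards."""
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, ch in enumerate(text):
        context = text[max(0, i - order):i]   # up to `order` preceding symbols
        seen = counts[context]
        # Laplace-smoothed prediction made before the symbol is revealed.
        p = (seen[ch] + 1) / (sum(seen.values()) + len(alphabet))
        total_bits += -math.log2(p)
        seen[ch] += 1                          # learn from the symbol afterwards
    return total_bits


text = "abracadabra" * 20
alphabet = sorted(set(text))

uniform_bits = len(text) * math.log2(len(alphabet))      # no learning at all
order0_bits = prequential_bits(text, alphabet, order=0)  # learns frequencies
order2_bits = prequential_bits(text, alphabet, order=2)  # learns local structure
```

The learner starts from empty counts, so no fitted parameters have to be transmitted: the total already includes the cost of learning, and the number of parameters the model ends up with adds no separate penalty.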


How should we define a world model? Like what kind of API does circuitry have to follow so that we classify it as a world model?

I'm leaving that open to interpretation. Feel free to add answers relevant to your interpretation.

When I started trying to think rigorously about this a few months ago, I realized that I don't have a very good definition of a world model. In particular, what does it mean to claim that a person has a world model? Given a criterion for an LLM to have one, how confident am I that most people would satisfy it?