Does GPT-2 Understand Anything?

by Douglas Summers-Stay, 2nd Jan 2020


Some people have expressed that “GPT-2 doesn’t understand anything about language or reality. It’s just huge statistics.” In at least two senses, this is true.

First, GPT-2 has no sensory organs. So when it talks about how things look or sound or feel and gets it right, it is just because it read something similar on the web somewhere. The best understanding it could have is the kind of understanding one gets from reading, not from direct experiences. Nor does it have the kind of understanding that a person does when reading, where the words bring to mind memories of past direct experiences.

Second, GPT-2 has no qualia. This is related to the previous point, but distinct from it. One could imagine building a robotic body with cameras for eyes and microphones for ears that fed .png and .wav files to something like GPT-2 rather than .html files. Such a system would have what might be called experiences of the world. It would not, however, create a direct internal impression of redness or loudness, the ineffable conscious experience that accompanies sensation.

However, this is too high a bar to rule out understanding. Perhaps we should call the understanding that comes from direct personal experience “real understanding” and the kind that comes solely from reading with no connection to personal experience “abstract understanding.” Although I can’t “really understand” what it was like to fight in the Vietnam War (because I wasn’t there, man) I can still understand it in an abstract sense. With an abstract understanding, here are some things one can do:

• answer questions about it in one’s own words
• define it
• use it appropriately in a sentence
• provide details about it
• summarize it

Professional teachers distinguish between tests of knowledge (which can be handled by mere memorization) and tests of understanding, with the latter being more difficult and useful (see Bloom’s Taxonomy). Understanding requires connecting a new idea to ideas a student is already familiar with.

GPT-2 is able to pass many such tests of understanding. With an appropriate prompt (such as giving examples of what form the answer to a question should take) it is able to answer questions, define terms, use words appropriately in a sentence, provide details, and summarize.

This is understanding for most practical purposes. It shows that when GPT-2 uses a word, that word has the appropriate kinds of connections to other words. The word has been integrated into a large graph-like structure of relationships between what can reasonably be called concepts or ideas. When probabilities for the next token have been generated, the model has a certain propensity for using a particular word; but if that word is artificially blocked, other ways of saying the same thing have also been activated and will be used instead. It is reasonable to interpret this as having an “idea” of what it “wants” to “say,” and at some point the quotation marks are no longer helpful, and we may as well dispense with them.
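The blocking experiment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the scores in `toy_logits` are invented for the example and are not GPT-2 output, and `block_token` is a hypothetical helper, not part of any library.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def block_token(logits, banned):
    """Drop the banned tokens, then renormalize over what remains."""
    allowed = {tok: score for tok, score in logits.items() if tok not in banned}
    return softmax(allowed)

# Hypothetical next-word scores after "The knight drew his" (made up, not from GPT-2).
toy_logits = {"sword": 4.0, "blade": 2.5, "weapon": 2.0, "breath": 0.5}

before = softmax(toy_logits)
after = block_token(toy_logits, banned={"sword"})

best_before = max(before, key=before.get)
best_after = max(after, key=after.get)
print(best_before, "->", best_after)
```

In this toy case the near-synonym that was already the runner-up takes over once the first choice is removed, which is the behavior the paragraph describes.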

Here is an example. I input the following prompt into GPT-2 1.5B, with top-k=10 sampling:

"Indiana Jones ducked as he entered the cave to avoid being decapitated." In this sentence, the word "decapitated" means

Here are the first 11 results (truncated after the first sentence):
• "to be cut down" as well as "to be slain."
• "to chop off".
• "to cut off one of the branches of a tree."
• "The captain of the ship was killed in the cave."
• "to cut off, cut off by decapitation."
• "cut off".
• "cut off."
• to be "sliced off."
• "to be killed," which is the same thing as "to be killed by the sword."
• to fall from high altitude or to be cut down.
• "to have a head chopped off."

The system has a strong notion that “decapitated” means “to cut off” and “to kill” but is less likely to mention that the word has anything to do with a head. So its concept of “decapitation” appears to be approximately (but not completely) right. When prompted to write a sentence using the word “decapitate,” the system usually generates sentences consistent with this: the word is often used in a way consistent with killing, but heads are only rarely mentioned. (This has all gotten rather grisly.)
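For readers unfamiliar with the top-k sampling used for the prompt above, here is a minimal sketch of the idea: keep only the k highest-scoring candidate tokens and sample among them in proportion to their probabilities. The scores in `toy_logits` are invented for illustration, and `top_k_sample` is a hypothetical helper rather than the code actually used to produce the results.

```python
import math
import random

def top_k_sample(logits, k, rng):
    """Sample one token from the k highest-scoring candidates."""
    # Keep only the k candidates with the largest scores.
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    peak = top[0][1]
    # Softmax weights over the survivors (random.choices normalizes them).
    weights = [math.exp(score - peak) for _, score in top]
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Made-up scores for the next word; not taken from GPT-2.
toy_logits = {"to": 3.0, "cut": 2.0, "killed": 1.5, "head": 0.5, "banana": -4.0}

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [top_k_sample(toy_logits, k=3, rng=rng) for _ in range(200)]
print(sorted(set(samples)))
```

With k=3, only the three best-scoring words can ever appear, no matter how many samples are drawn. This is why different runs give varied but related completions.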

However, one shouldn't take this too far. GPT-2 uses concepts in a very different way than a person does. In the paper “Evaluating Commonsense in Pre-trained Language Models,” the authors measure the probability of generating each of a pair of superficially similar sentences. If the system is correctly and consistently applying a concept, then one of the two sentences will have a high probability and the other a low probability of being generated. For example, consider the following four sentences:

1. People need to use their air conditioner on a hot day.
2. People need to use their air conditioner on a lovely day.
3. People don’t need to use their air conditioner on a hot day.
4. People don’t need to use their air conditioner on a lovely day.

Sentences 1 and 4 should have higher probability than sentences 2 and 3. What the authors find is that GPT-2 does worse than chance on these kinds of problems. If a sentence is likely, a variation on the sentence with opposite meaning tends to have similar likelihood. The same problem occurred with word vectors, like word2vec. “Black” is the opposite of “white,” but except in the one dimension in which they differ, nearly everything else about them is the same: you can buy a white or black crayon, you can paint a wall white or black, you can use white or black to describe a dog’s fur. Because of this, black and white are semantically close, and tend to get confused with each other.
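The evaluation protocol can be sketched as follows: score each sentence by summing the log-probabilities of its tokens, then check whether the sensible sentence of each pair scores higher. To keep the example self-contained, it uses a tiny add-one-smoothed bigram model trained on three made-up sentences. That is an assumption for illustration only; it is far cruder than GPT-2 and is not the paper's setup.

```python
import math
from collections import Counter

# Tiny made-up training corpus (an assumption for illustration).
corpus = [
    "people need to use their air conditioner on a hot day",
    "people don't need to use their heater on a hot day",
    "people don't need to use their air conditioner on a lovely day",
]

def tokenize(sentence):
    return ["<s>"] + sentence.lower().split()

contexts, bigrams, vocab = Counter(), Counter(), set()
for line in corpus:
    toks = tokenize(line)
    vocab.update(toks)
    contexts.update(toks[:-1])           # how often each word precedes another
    bigrams.update(zip(toks, toks[1:]))  # how often each adjacent pair occurs

def log_prob(sentence):
    """Sum of add-one-smoothed bigram log-probabilities for the sentence."""
    toks = tokenize(sentence)
    total = 0.0
    for prev, cur in zip(toks, toks[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab)))
    return total

pairs = [
    # (sentence that should score higher, sentence that should score lower)
    ("people need to use their air conditioner on a hot day",
     "people need to use their air conditioner on a lovely day"),
    ("people don't need to use their air conditioner on a lovely day",
     "people don't need to use their air conditioner on a hot day"),
]
results = [log_prob(good) > log_prob(bad) for good, bad in pairs]
accuracy = sum(results) / len(results)
print(results, accuracy)
```

The bigram model prefers "hot day" in both pairs because it never looks far enough back to see the negation, so it gets exactly one pair right. GPT-2's failure is subtler, but the scoring procedure has the same shape.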

The underlying reason for this issue appears to be that GPT-2 has only ever seen sentences that make sense, and is trying to generate sentences that are similar to them. It has never seen sentences that do NOT make sense and makes no effort to avoid them. The paper “Don't Say That! Making Inconsistent Dialogue Unlikely with Unlikelihood Training” introduces such an “unlikelihood objective” and shows it can help with precisely the kinds of problems mentioned in the previous paper, as well as GPT-2’s tendency to get stuck in endless loops.
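As a rough sketch of what an unlikelihood objective looks like, the usual likelihood term rewards probability on the correct token, while an extra term penalizes probability placed on tokens that should not be generated. The form below, -log(1 - p) for each negative token, is my understanding of the general idea; the paper's exact formulation, choice of negative tokens, and weighting may differ. The probabilities and the `alpha` weight are invented for illustration.

```python
import math

def likelihood_loss(probs, target):
    """Standard negative log-likelihood of the correct token."""
    return -math.log(probs[target])

def unlikelihood_loss(probs, negatives):
    """Penalty that grows as probability mass lands on unwanted tokens."""
    # Clamp to avoid log(0) if an unwanted token has probability 1.
    return -sum(math.log(max(1.0 - probs[tok], 1e-12)) for tok in negatives)

def combined_loss(probs, target, negatives, alpha=1.0):
    """Likelihood term plus an alpha-weighted unlikelihood term (alpha is hypothetical)."""
    return likelihood_loss(probs, target) + alpha * unlikelihood_loss(probs, negatives)

# Made-up next-token distributions after "Polar bears are found in the".
confused = {"Arctic": 0.60, "tropics": 0.30, "zoo": 0.10}
improved = {"Arctic": 0.85, "tropics": 0.05, "zoo": 0.10}

loss_confused = combined_loss(confused, target="Arctic", negatives=["tropics"])
loss_improved = combined_loss(improved, target="Arctic", negatives=["tropics"])
print(loss_confused > loss_improved)
```

The point of the extra term is that the model is explicitly pushed away from the nonsensical continuation, rather than only being pulled toward the sensible one.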

Despite all this, when generating text, GPT-2 is more likely to generate a true sentence than the opposite of a true sentence. “Polar bears are found in the Arctic” is far more likely to be generated than “Polar bears are found in the tropics,” and it is also more likely to be generated than “Polar bears are not found in the Arctic” because “not found” is a less likely construction to be used in real writing than “found.”

It appears that what GPT-2 knows is that the concept “polar bear” has a “found in” relation to “Arctic,” but that it is not very particular about the polarity of that relation (“found in” vs. “not found in”). It simply defaults to expressing the more commonly used positive polarity much of the time.

Another odd feature of GPT-2 is that its writing expresses equal confidence in concepts and relationships it knows very well, and those it knows very little about. By looking into the probabilities, we can often determine when GPT-2 is uncertain about something, but this uncertainty is not expressed in the sentences it generates. By the same token, if prompted with text that has a lot of hedge words and uncertainty, it will include those words even if it is a topic it knows a great deal about.
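One common way to read uncertainty off the token probabilities is to compute the entropy of the next-token distribution: low entropy means one continuation dominates, high entropy means several compete. This is a sketch of that general technique, not necessarily the method used here, and both distributions below are invented for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

# Hypothetical next-token distributions (not taken from GPT-2).
confident = {"A": 0.97, "B": 0.02, "C": 0.01}
uncertain = {"A": 0.40, "B": 0.35, "C": 0.25}

print(entropy_bits(confident) < entropy_bits(uncertain))
```

The generated sentence would read just as assertively in both cases; the difference is visible only in the numbers.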

Finally, GPT-2 doesn’t make any attempt to keep its beliefs consistent with one another. Given the prompt “The current President of the United States is named,” most of the generated responses will be variations on “Barack Obama.” With other prompts, however, GPT-2 acts as if Donald Trump is the current president. This contradiction was present in the training data, which was created over the course of several years. The token probabilities show that both men’s names have fairly high likelihood of being generated for any question of the kind. A person discovering that kind of uncertainty about two options in their mind would modify their beliefs so that one was more likely and the other less likely, but GPT-2 doesn't have any mechanism to do this and enforce a kind of consistency on its beliefs.

In summary, it seems that GPT-2 does have something that can reasonably be called “understanding” and holds something very much like “concepts” or “ideas” which it uses to generate sentences. However, there are some profound differences between how a human holds and uses ideas and how GPT-2 does, which are important to keep in mind.