Based on work done with Rebecca Gorman and Oliver Daniel-Koch.
In a previous post, I talked about GPT-3 and symbol grounding. This post presents a simpler example where GPT-3 fails (and succeeds) at grounding its symbols.
The following text was presented to the OpenAI beta playground (using the "text-davinci-001" option):
She died on Friday the 5th. On Monday the 1st, she went to work at the office.
She died on Friday the 5th. On Tuesday the 2nd, she went to work at the office.
She died on Friday the 5th. On Wednesday the 3rd, she went to work at the office.
She died on Friday the 5th. On Thursday the 4th, she went to work at the office.
She died on Friday the 5th. On Friday the 5th, she went to work at the office.
She died on Friday the 5th. On Saturday the 6th,
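The repetitive structure of this prompt makes it easy to generate programmatically, which is handy for rerunning variants. A minimal sketch (the `ordinal` helper and variable names are mine, not from the experiment):

```python
def ordinal(n: int) -> str:
    """Return '1st', '2nd', '3rd', '4th', ... (valid for the small numbers used here)."""
    return f"{n}{ {1: 'st', 2: 'nd', 3: 'rd'}.get(n, 'th') }"

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# One full sentence per weekday, each repeating the death statement.
lines = [
    f"She died on Friday the 5th. On {day} the {ordinal(i)}, "
    "she went to work at the office."
    for i, day in enumerate(days, start=1)
]
# The final line is left incomplete, for GPT-3 to continue.
lines.append("She died on Friday the 5th. On Saturday the 6th,")
prompt = "\n".join(lines)
print(prompt)
```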
GPT-3 fell straight into the obvious trap, completing it as:
She died on Friday the 5th. On Saturday the 6th, she went to work at the office.
Turning on the "Show probabilities: full spectrum" option, we saw that the probability of that completion was over . Sometimes GPT-3 would extend it further, adding:
She died on Friday the 5th. On Sunday the 7th, she went to work at the office.
So, the undead woman continued at her job, assiduous to the last - and beyond. To check that GPT-3 "knew" that dead people didn't work, we asked it directly:
Can people who die go to work at the office?
There is no evidence that people who die go to work at the office.
The above results show that simple repetitive prompts can cause GPT-3 to make stupid mistakes. Therefore GPT-3 doesn't 'understand' the word "died" - that symbol isn't grounded, right?
But the situation gets more complicated if we change the prompt, removing all but the first mention of her dying:
She died on Friday the 5th. On Monday the 1st, she went to work at the office.
On Tuesday the 2nd, she went to work at the office.
On Wednesday the 3rd, she went to work at the office.
On Thursday the 4th, she went to work at the office.
On Friday the 5th, she went to work at the office.
On Saturday the 6th,
For that prompt, "she went to work at the office" was still the most common completion. But it only happened about of the time. Sometimes, instead, GPT-3 found the completion "she was found dead". Kudos, GPT-3, you understand the prompt after all! That completion came up about of the time.
What other completions were possible? The shorter "she died" came up of the time - medium points, GPT-3, you understood that her death was relevant, but you got the day wrong.
But there was one other avenue that GPT-3 could follow; the following had a joint probability of around :
she stayed home.
she stayed at home.
she stayed in bed.
she did not go to work.
This seems to be a clear pattern of GPT-3 realising that Saturday was different where work was concerned. There is certainly a lot of weekend holidaying in its training set.
So there are three patterns competing within GPT-3 when it tries to complete this text. The first is the purely syntactic repetition: do another sentence that follows the simple pattern of the sentences above. The second is the one which "realises" that death on Friday changes things for Saturday. And the third is the one which "realises" that the weekend is different from the week, at least where work is concerned.
In the very first example, when we had "She died on Friday the 5th" in front of each line, this massively reinforced the "repetition" pattern. So, mentioning that she died, again and again, resulted in her death being completely ignored by GPT-3.
We can similarly reinforce the other patterns. Adding "It's the weekend!" in front of the last line increased the probability of "she stayed home". Moving "She died on Friday the 5th" from the first line to the last, increased the probability of all the death-related completions. So all three patterns are competing to complete it.
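The variants described above can also be constructed mechanically, making the comparison easy to repeat. A sketch under my own naming conventions (the post does not specify how the variants were built):

```python
base_lines = [
    "She died on Friday the 5th. On Monday the 1st, she went to work at the office.",
    "On Tuesday the 2nd, she went to work at the office.",
    "On Wednesday the 3rd, she went to work at the office.",
    "On Thursday the 4th, she went to work at the office.",
    "On Friday the 5th, she went to work at the office.",
    "On Saturday the 6th,",
]

# Variant reinforcing the "weekend" pattern:
# prepend "It's the weekend!" to the last line.
weekend_lines = base_lines[:-1] + ["It's the weekend! " + base_lines[-1]]

# Variant reinforcing the "death" patterns: move the death sentence
# from the first line to the front of the last line.
death = "She died on Friday the 5th. "
death_last_lines = (
    [base_lines[0].removeprefix(death)]   # Python 3.9+
    + base_lines[1:-1]
    + [death + base_lines[-1]]
)

variants = {
    "base": "\n".join(base_lines),
    "weekend": "\n".join(weekend_lines),
    "death_last": "\n".join(death_last_lines),
}
```

Each variant could then be submitted to the playground and the completion probabilities compared across the three prompts.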
I'd say that the above shows that GPT-3 has some level of understanding of the meaning of words - but not a lot. It doesn't fully grasp what's going on, but neither is it completely clueless.
Here is another example of GPT-3 failing to grasp the situation. In the "Q&A" mode, the following question was asked:
Q: You are in a two dimensional 3x3 gridworld, in the middle square, facing north. You can issue four command, "N" to move north, "S" to move south, "E" to move east, and "W" to move West. You want the treasure. It is located to your left, behind you. What commands should you issue?
So the setup, as described, is this one:

[Figure: a 3x3 grid, with the protagonist in the centre square facing north and the treasure in the square to the south-west.]
The exchange with GPT-3 went like this:
Q: You are in a two dimensional 3x3 gridworld, in the middle square, facing north. You can issue four command, "N" to move north, "S" to move south, "E" to move east, and "W" to move West. You want the treasure. It is located to your left, behind you. What commands should you issue?
A: N, S, E, W
Q: That is incorrect. Try again.
A: N, S, W, E
Q: Still incorrect. Try again.
A: N, E, S, W
Q: That's worse. Try again.
A: N, E, W, S
Q: Only two commands are needed. Try again.
A: N, S
Q: Still failing. Try again.
A: N, E, W
So, GPT-3 'realised' that N, S, E, and W were commands, and 'knew' what "Only two commands are needed" and "try again" meant. But it never produced the correct answer, "W, S" (or "S, W"): it clearly had no idea of the overall situation.
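For completeness, the intended answer can be checked with a small simulation. A sketch, assuming (x, y) coordinates with north as +y and east as +x, and movement clamped to the grid (these conventions are mine, not stated in the prompt):

```python
# 3x3 grid: x in 0..2 (east positive), y in 0..2 (north positive).
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def run(commands, start=(1, 1)):
    """Apply movement commands from the centre square, staying inside the grid."""
    x, y = start
    for c in commands:
        dx, dy = MOVES[c]
        x = min(max(x + dx, 0), 2)
        y = min(max(y + dy, 0), 2)
    return (x, y)

# Facing north, "to your left, behind you" is one square west and one south.
treasure = (0, 0)
assert run(["W", "S"]) == treasure  # two commands suffice
assert run(["S", "W"]) == treasure  # order doesn't matter here

# None of GPT-3's attempts reach the treasure:
for attempt in (["N", "S", "E", "W"], ["N", "S", "W", "E"],
                ["N", "E", "S", "W"], ["N", "E", "W", "S"],
                ["N", "S"], ["N", "E", "W"]):
    assert run(attempt) != treasure
```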