Unit conversion, such as
"Fresno is 204 miles (329 km) northwest of Los Angeles and 162 miles (" -> 261 km)
"Fresno is 204 miles (329 km) northwest of Los Angeles and has an average temperature of 64 F (" -> 18 C)
"Fresno is 204 miles (" -> 329 km)
Results: 1, 2, 3. It mostly gets the format right (but not the right numbers).
This is an interesting one! It looks like there might be some very rough heuristics going on for the number part as well, e.g. the model knows the number in km is almost definitely 3 digits.
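To make "right format, wrong numbers" measurable, the two can be scored separately. Below is a minimal sketch, assuming the completion is available as a plain string; `score_conversion` is a hypothetical helper, and rounding to the nearest integer is an assumption about what counts as the correct number.

```python
import re

MILES_TO_KM = 1.609344  # kilometres per international mile


def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9


def score_conversion(value, unit, completion):
    """Return (format_ok, digits_ok, number_ok) for a completion like '261 km)'."""
    if unit == "miles":
        target, target_unit = value * MILES_TO_KM, "km"
    elif unit == "F":
        target, target_unit = fahrenheit_to_celsius(value), "C"
    else:
        raise ValueError(f"unsupported unit: {unit}")
    # Format check: an integer, the target unit, then the closing parenthesis.
    match = re.fullmatch(r"\s*(\d+)\s*" + target_unit + r"\)", completion)
    if match is None:
        return (False, False, False)
    predicted = match.group(1)
    expected = str(round(target))
    # digits_ok captures the rough "right number of digits" heuristic.
    return (True, len(predicted) == len(expected), predicted == expected)
```

For example, `score_conversion(162, "miles", "300 km)")` reports a correct format and digit count but a wrong number, which is the failure mode described above.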
"Mike is large -> large is Mike
Bob is cute -> cute is"
Also works w/ numbers (but I had trouble getting it to reverse 3 digits at a time):
"3 6 -> 6 3
2 88 ->"
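If someone wants to test the reversal behavior on more inputs, prompts in this shape are easy to generate. This is only a sketch: `reverse_pair` and `reversal_prompt` are hypothetical helpers, and the `reveal` parameter (how many words of the answer are already included in the prompt) is an assumption made so that both prompt styles above can be reproduced.

```python
def reverse_pair(text, sep):
    """Swap the two items around the first occurrence of sep."""
    left, right = text.split(sep, 1)
    return f"{right}{sep}{left}"


def reversal_prompt(examples, query, sep=" ", reveal=0):
    """Return (prompt, expected_completion) for a few-shot reversal task."""
    lines = [f"{ex} -> {reverse_pair(ex, sep)}" for ex in examples]
    words = reverse_pair(query, sep).split(" ")
    shown = " ".join(words[:reveal])   # part of the answer given in the prompt
    rest = " ".join(words[reveal:])    # part the model is expected to produce
    last = f"{query} ->" + (f" {shown}" if shown else "")
    return "\n".join(lines + [last]), " " + rest
```

Calling `reversal_prompt(["Mike is large"], "Bob is cute", sep=" is ", reveal=2)` reproduces the first prompt, and `reversal_prompt(["3 6"], "2 88")` reproduces the numeric one.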
"1 + 1 = 0 + 2
2 + 2 = 0 + 4
3 + 3 = 0 +"
This also worked when replacing 0 w/ "pig", but changing it to "df" made it predict " 5" as the answer; I think it just wants to count up from the previous answer, 4.
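A small generator makes it easier to vary the filler token and to tell the two hypotheses apart (doubling vs. counting up from the previous answer). This is a sketch under assumptions: `zero_addition_prompt` is a hypothetical helper, and the "count up" guess is simply the previous answer plus one, as suggested above.

```python
def zero_addition_prompt(n_examples, filler="0"):
    """Return (prompt, doubling_answer, count_up_answer).

    Each example line has the form 'k + k = <filler> + 2k'; the final line
    is left open after the last '+'.
    """
    lines = [f"{k} + {k} = {filler} + {2 * k}" for k in range(1, n_examples + 1)]
    query = n_examples + 1
    lines.append(f"{query} + {query} = {filler} +")
    doubling = f" {2 * query}"           # answer if the model really adds
    count_up = f" {2 * n_examples + 1}"  # previous answer plus one
    return "\n".join(lines), doubling, count_up
```

With `n_examples=2` this yields the prompt above, with " 6" as the addition answer and " 5" as the counting-up answer; passing `filler="pig"` or `filler="df"` gives the variants.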
For each of the following, the model predicts a "." at the end.
I eat spaghetti, yet she eats pizza
I slept, for I was sleepy
I can eat, or I can sleep
I love my dog, and my dog loves me
I can neither eat, nor can I sleep
Some n-grams overpower this effect. In the examples above, "yet she eats ice" will be followed by " cream", and "I was sleepy, so I slept" will be followed by " in".
"She ate the cookies, cake" will be followed by "," and then " and".
[Note: the language modelling game and the gpt-2 small search tool were very useful]
Thanks for contributing these! I'm not sure I understand the one about ignoring a zero: is the idea that it can not only do normal addition, but also addition in the format with a zero?
Pattern: ["<incomplete quoted statement>," <descriptor of speaker> said,] -> ["<completion of sentence following from previous quotation><...>]
Example: ["When the truth is replaced by silence," the Soviet dissenter said,] -> [ "it will be impossible to hold securely everything.] (prediction starts with [ "] ~71% of the time)
The next token will be [ "] ~45-70% of the time when the original quotation is obviously incomplete.
When the original quotation looks more like a complete sentence, the next token will be [ "] only ~5-20% of the time (see counterexample below).
Counter Example (initial quotation is a complete statement; in this case removed 'When'):
["The truth is replaced by silence," the Soviet dissenter said,] -> [adding that the TV show was a farcical] (prediction starts with [ "] only ~12% of the time)
Pattern: [from <member of numeric class> to] -> [ <different member of numeric class>]
Examples:
[from 1874 to] -> [ 1882]
[from March 34, 1999 to] -> [ May 12, 2004]
[from 5:40 am to] -> [ 8:00 am]
[from 30 degrees to] -> [ 100 degrees]
[from 89 to] -> [ 93]
[from 154 to] -> [ 195]
[from 12539 to] -> [ 13114]
[from 2,631,254,399 to] -> [ 3,021,133,526]
Maintains symmetry between plausible years/dates/times/temperatures. In the case of dates/times, it is heavily biased towards predicting a higher value after 'to' (as would be expected from the training corpus). It also maintains symmetry in the number of digits for arbitrary numbers that don't fall into an obvious class, though this starts losing exactness past 5 digits (while remaining roughly symmetric). Interestingly, exactness of the number of digits for larger numbers improves substantially when commas are added to the number (e.g. 1,000,000).
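The two properties described here (same number of digits, higher value after 'to') can be checked mechanically on plain-number completions. A minimal sketch, assuming the numbers are given as strings with optional thousands separators; the function names are hypothetical.

```python
def digit_count(number_text):
    """Count digits only, so '1,000,000' and '1000000' both give 7."""
    return sum(ch.isdigit() for ch in number_text)


def to_int(number_text):
    return int(number_text.replace(",", ""))


def check_from_to(start, end):
    """Return (same_digit_count, end_is_higher) for a 'from X to Y' pair."""
    return (digit_count(start) == digit_count(end), to_int(end) > to_int(start))


# The plain-number examples from the list above.
pairs = [
    ("1874", "1882"),
    ("89", "93"),
    ("154", "195"),
    ("12539", "13114"),
    ("2,631,254,399", "3,021,133,526"),
]
results = [check_from_to(start, end) for start, end in pairs]
```

Every pair listed above passes both checks; dates, times, and temperatures would need their own parsers.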
Pattern: [https://] -> [<syntactically valid and real-looking URL containing a domain, resource, sometimes query parameters, etc>]
Examples:
[https://] -> [www.parks.org/programs/]
[http://wowthisissocool] -> [380.blogspot.com/2015/03/]
[https://ibetthiswillgetqueryparams.com] -> [/submit?inc=false&type=Out]
Beyond being merely syntactically valid, common URL resource nesting patterns are observed, like the [/<year>/<month>/] pattern above, or [/<resource>/<id>].
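For anyone collecting more URL completions, the structure can be inspected with Python's standard `urllib.parse`. This is a sketch only: `describe_url` is a hypothetical helper, "valid" here just means an http(s) scheme plus a non-empty host, and the `/<year>/<month>/` regex is an assumption about what counts as that pattern.

```python
import re
from urllib.parse import urlparse, parse_qs

# Four-digit year starting with 19 or 20, followed by a two-digit month.
YEAR_MONTH = re.compile(r"/(?:19|20)\d{2}/(?:0[1-9]|1[0-2])/")


def describe_url(url):
    """Split a prompt+completion URL into the parts discussed above."""
    parts = urlparse(url)
    return {
        "valid": parts.scheme in ("http", "https") and bool(parts.netloc),
        "host": parts.netloc,
        "path": parts.path,
        "query": parse_qs(parts.query),
        "year_month": YEAR_MONTH.search(parts.path) is not None,
    }
```

Applied to the prompt and completion joined together, e.g. `describe_url("http://wowthisissocool380.blogspot.com/2015/03/")`, this flags the year/month nesting, and the third example yields the two query parameters `inc` and `type`.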
Thanks for these! I love the 'from' -> 'to' one: it seems GPT-2 small clearly knows the rough ordering of numbers in various formats, although when I was playing with it and trying to get it to do addition in real life settings, it appears quite bad at actually knowing how numbers work.
"Either", "or" pairs in text.
Heuristic: if the word "either" appears in a sentence, wait for the comma and then add an " or".
What follows are a few examples. Note that the completion is just something I randomly came up with; the important part is the " or". Using the webapp, GPT-2 puts a high probability (around 40%-60%) on the token " or".
"Either you take a left at the next intersection," -> or take a left after that.
"Either you go to the cinema," -> or you stay at home.
"Tonight I could either order some food," -> or cook something myself.
Counter example:
"Do you rather want to go to Portugal or Italy? Either" -> way is fine./one is fine. (GPT-2 puts a lot of probability on " way", and barely any on " or", which is correct).
Thanks! There are probably other grammatical structures in English that require a bit of algorithmic thinking like this one as well.
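Written down as code, the "either ... or" heuristic from the comment above might look like the following. It is a sketch of the proposed rule, not a claim about what GPT-2 computes internally; `predicts_or` and the crude sentence splitting are assumptions.

```python
import re


def predicts_or(prompt):
    """True if the heuristic says the next token should be ' or'."""
    # Only consider the current (last) sentence of the prompt.
    sentence = re.split(r"[.?!]\s+", prompt)[-1]
    has_either = re.search(r"\beither\b", sentence, flags=re.IGNORECASE) is not None
    # Fire only once the clause after "either" has been closed by a comma.
    return has_either and sentence.rstrip().endswith(",")
```

The rule fires on the three positive examples and stays silent on the counterexample, where "Either" starts a new sentence and no comma has appeared yet.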
I found some behaviors, but I'm not sure this is what you are looking for because the algorithm in both is quite simple. I'd appreciate feedback on them.
"If today is Monday, tomorrow is Tuesday. If today is Wednesday, tomorrow is" -> "Thursday"
"If today is Monday, tomorrow is Tuesday. If today is Thursday, tomorrow is" -> "Friday"
etc.
This also works with zero-shot prompting, although the effect isn't as strong, e.g.:
"If today is Friday, tomorrow is" -> "Saturday"
"Lisa is great. I really like" -> "her"
"John is great. I really like" -> "him"
etc.
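The day-of-the-week prompts above, in both the one-shot and zero-shot form, can be generated together with their expected answers. A minimal sketch; `next_day` and `tomorrow_prompt` are hypothetical helper names.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def next_day(day):
    """Successor in the weekly cycle (Sunday wraps around to Monday)."""
    return DAYS[(DAYS.index(day) + 1) % len(DAYS)]


def tomorrow_prompt(query_day, example_day=None):
    """Return (prompt, expected_completion); omit example_day for zero-shot."""
    parts = []
    if example_day is not None:
        parts.append(
            f"If today is {example_day}, tomorrow is {next_day(example_day)}."
        )
    parts.append(f"If today is {query_day}, tomorrow is")
    return " ".join(parts), " " + next_day(query_day)
```

`tomorrow_prompt("Wednesday", "Monday")` gives the first prompt above with " Thursday" as the target, and `tomorrow_prompt("Friday")` gives the zero-shot variant.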
Some of Redwood’s current research involves finding specific behaviors that language models exhibit, and then doing interpretability to explain how the model does these behaviors. One example of this is the indirect object identification (IOI) behavior, investigated in a forthcoming paper of ours: given the input "When John and Mary went to the store, Mary gave a flower to", the model completes "John" instead of "Mary". Another example is the acronym generation task: given the input "In a statement released by the Big Government Agency (", the model completes "BGA)".
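To make the two target behaviors concrete, here is a reference implementation of the expected outputs. It describes what a correct completion looks like, not how the model computes it; `ioi_answer` and `acronym` are hypothetical names, and "the name mentioned exactly once" is an assumption that holds for the template quoted above.

```python
import re


def ioi_answer(prompt, names):
    """Return the name that occurs exactly once in the prompt."""
    counts = {
        name: len(re.findall(rf"\b{re.escape(name)}\b", prompt))
        for name in names
    }
    once = [name for name, count in counts.items() if count == 1]
    if len(once) != 1:
        raise ValueError("expected exactly one name mentioned once")
    return once[0]


def acronym(organization):
    """First letter of each word, upper-cased."""
    return "".join(word[0].upper() for word in organization.split())
```

For the two inputs in the paragraph above, these return "John" and "BGA" respectively.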
We are considering scaling up this line of research a bunch, and that means we need a lot more behaviors to investigate! The ideal tasks that we are looking for have the following properties:
Examples
The following is a list of tasks that we have found so far/are aware of. Induction and acronym generation remain the tasks that best meet all of the above desiderata.
Some examples that we are less excited about include:
We would love for interested people to contribute ideas! Below are some resources we put together to make the search as easy as possible:
Notes