I did two small experiments on GPT-2-small. First experiment: can GPT-2-small answer sentiment analysis questions? (It can't.) Second experiment: when GPT-2-small writes continuations of Howl, is it picking up the "Moloch in X!" template from its priming, or from a copy of Howl in its original training set? (It's from the training set.)
Sentiment analysis experiment:
I downloaded the MPQA Subjectivity Lexicon, which is a dictionary in which words are marked as positive or negative. For example: hopelessness=>negative, humour=>positive, grace=>positive, corruption=>negative. I primed GPT-2 with a list of 20 questions like "Is a <noun> good? Yes. Is a <noun> good? No." followed by an unanswered question of the same form, and had it continue for one more word. In its priming, half the answers were yes and the other half were no. It answered "No" 37 times out of 40, and neither its "No" answers nor its "Yes" answers were correct more often than chance.
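The prompt construction described above can be sketched roughly as follows. This is a minimal illustration, not the script I actually ran: the tiny `LEXICON` dict stands in for the MPQA lexicon (it holds only the four words quoted above), the helper names `build_prompt`, `balanced_examples`, and `is_correct` are made up for this sketch, and the step of sampling one word from GPT-2 is left out.

```python
import random

# Stand-in for the MPQA Subjectivity Lexicon: only the four example words
# quoted in the text. The real lexicon is much larger.
LEXICON = {
    "hopelessness": "negative",
    "humour": "positive",
    "grace": "positive",
    "corruption": "negative",
}


def build_prompt(examples, query_word):
    """Build a few-shot prompt from (word, polarity) pairs plus one open question."""
    parts = []
    for word, polarity in examples:
        answer = "Yes" if polarity == "positive" else "No"
        parts.append(f"Is a {word} good? {answer}.")
    # The last question is left unanswered so the model supplies one more word.
    parts.append(f"Is a {query_word} good?")
    return " ".join(parts)


def balanced_examples(lexicon, n, rng):
    """Pick n priming words, half positive and half negative, in shuffled order."""
    pos = [w for w, p in lexicon.items() if p == "positive"]
    neg = [w for w, p in lexicon.items() if p == "negative"]
    chosen = rng.sample(pos, n // 2) + rng.sample(neg, n // 2)
    rng.shuffle(chosen)
    return [(w, lexicon[w]) for w in chosen]


def is_correct(model_answer, polarity):
    """True if a Yes/No answer agrees with the lexicon's polarity for the word."""
    said_yes = model_answer.strip().rstrip(".").lower() == "yes"
    return said_yes == (polarity == "positive")
```

With the full lexicon one would call `balanced_examples(lexicon, 20, random.Random(seed))` to get the 20 priming questions, pass a held-out word as `query_word`, and score the model's one-word continuation with `is_correct`.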
Howl experiment:
When given some lines from Ginsberg's Howl as priming, it writes a good continuation (similar to the one Chelsea Voss and Qiaochu Yuan got from it). In particular, it uses the "Moloch in X!" template repeatedly.
If I take its continuation of Howl and feed it back in as a prompt, I get more Howl (Moloch in X!). If I take Howl and replace "Moloch" with "Lomoch", I get more Howl. But if I take its continuation of Howl from the first step and replace Moloch with Lomoch *there*, I get unrelated text which does not use the "Moloch in X!" template.
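The substitution and the check for the template can be sketched like this. It is only an illustration under assumptions: the helper names `swap_name` and `count_template` are made up here, the regex is one plausible reading of what counts as a "Moloch in X!" line, and generating the continuations from GPT-2 is not shown.

```python
import re


def template_pattern(name):
    """Regex for the '<name> in X!' template, e.g. 'Moloch in the sky!'."""
    return re.compile(rf"\b{re.escape(name)} in [^!?.\n]+!")


def swap_name(text, old="Moloch", new="Lomoch"):
    """Replace every whole-word occurrence of `old` with `new`."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


def count_template(text, name):
    """Count how many times the '<name> in X!' template appears in text."""
    return len(template_pattern(name).findall(text))
```

The comparison is then between `count_template(continuation, "Moloch")` or `count_template(continuation, "Lomoch")` for continuations generated from each of the prompts: the original Howl lines, Howl passed through `swap_name`, GPT-2's own continuation, and that continuation passed through `swap_name`.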
So, it isn't inferring the template from its priming; rather, it learned the template from its training set (which probably included Howl), and it produces Howl-like text iff it's given a cue strong enough to remind it of the source.