Epistemological Status: I did almost no literature review and could very well be reinventing the wheel here. This does seem to give an alternative prompt strategy (permute things) for Language Models.

I'm going to ignore the specific details of GPT and focus on the idea of predicting an output token $o_{m+1}$ given the history $o_m, \ldots, o_{m-n}$ up to some horizon $n$. This is an $n$-gram Markov model. Technically a language model predicts the unconditional probability of a sequence, but in practice we usually do the above.
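To make the abstraction concrete, here is a toy sketch of such an $n$-gram predictor. The counting scheme and the example corpus are mine, purely for illustration; this says nothing about how GPT itself works.

```python
from collections import Counter, defaultdict

class NGramModel:
    """Toy n-gram Markov model: predict the next token from the previous n tokens."""

    def __init__(self, n):
        self.n = n
        self.counts = defaultdict(Counter)  # history tuple -> next-token counts

    def fit(self, tokens):
        for i in range(self.n, len(tokens)):
            history = tuple(tokens[i - self.n:i])
            self.counts[history][tokens[i]] += 1

    def prob(self, token, history):
        # Empirical p(token | last n tokens of history); 0 for unseen histories.
        c = self.counts[tuple(history[-self.n:])]
        total = sum(c.values())
        return c[token] / total if total else 0.0

model = NGramModel(n=2)
model.fit("a b c a b c a b d".split())
print(model.prob("c", ["a", "b"]))  # 2 of the 3 continuations of (a, b) are "c"
```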

I'm interested in formalizing a particular prompt format for GPT. The class I'm interested in are structured tasks:

$$S = \{s_i\}_{i \in I} = \{(q_i, a_i)\}_{i \in I}$$

Define a recurrent S-task as one where the tasks share cores,

$$\mu = a_i \cap a_j, \quad \forall\, i, j \in I$$
$$\lambda = q_i \cap q_j, \quad \forall\, i, j \in I$$

We can represent this pictorially, with the shared cores at the center and the per-task remainders as petals.

Recurrent S-tasks are an example of a sunflower in the combinatorial sense: a family of sets whose pairwise intersections are all the same core. An example would be prompted addition. We have,

$$\mu = \text{What is \_ plus \_?} \qquad \lambda = \emptyset$$
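A minimal sketch of how this recurrent task instantiates into a few-shot prompt. The fill-in helper and the particular numbers are my own illustration, not from any fixed prompt format:

```python
# The shared question core mu; lambda is empty, so answers share no core text.
MU = "What is {} plus {}?"

def make_task(x, y):
    """One structured task s_i = (q_i, a_i)."""
    return (MU.format(x, y), str(x + y))

def make_prompt(tasks, final_question):
    """Lay out q_1 a_1 ... q_n a_n followed by the held-out question q_{n+1}."""
    lines = []
    for q, a in tasks:
        lines.append(q)
        lines.append(a)
    lines.append(final_question)
    return "\n".join(lines)

tasks = [make_task(2, 3), make_task(7, 5)]
print(make_prompt(tasks, MU.format(1, 4)))
```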

The main property a recurrent S-task enjoys is probability invariance under permutation,

$$p(a_{n+1} \mid q_{n+1}, s_n, \ldots, s_1) = p(a_{n+1} \mid q_{n+1}, s_{\sigma(n)}, \ldots, s_{\sigma(1)})$$

for any permutation $\sigma$ of $\{1, \ldots, n\}$.
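This invariance suggests a concrete prompt-side strategy: permute the few-shot examples and average the model's predictions over orderings. A sketch, where `predict` is a hypothetical stand-in for any language-model call (the order-dependent toy predictor is mine, to show what the averaging smooths out):

```python
import itertools

def permutation_averaged_prob(predict, question, examples):
    """Average predict(question, examples) over all orderings of the examples.

    For a recurrent task the true distribution is order-invariant, so this
    averaging acts as a regularizer on a model that isn't.
    """
    probs = [predict(question, list(perm))
             for perm in itertools.permutations(examples)]
    return sum(probs) / len(probs)

def noisy_predict(question, examples):
    # Stand-in predictor with a spurious dependence on which example comes first.
    return 0.8 + 0.01 * (examples[0][1] == "5")

examples = [("What is 2 plus 3?", "5"), ("What is 7 plus 5?", "12")]
print(permutation_averaged_prob(noisy_predict, "What is 1 plus 4?", examples))
```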

I believe this setup is useful for getting more precise about how the language model operates. Consider an accumulation task,

$$\mu = \text{Add \_ to the total} \qquad \lambda = \text{The total is \_}$$

Is it a recurrent task? Technically, yes. However, it isn't feasible, because the prompt never tells us the value of the variable total. Consider instead the modification,

$$\mu = \{\text{The total is \_},\ \text{Add \_ to the total}\} \qquad \lambda = \text{The total is \_}$$

Now we have a feasible recurrent task. When we design prompts using the original formulation, we write out $a_0 q_1 a_1 q_2 a_2 \ldots q_n a_n$. What this shows is that the invariance does not hold at the $q_i a_i$ level (that version is not feasible); instead, it holds at the $a_i q_i a_{i+1}$ level. So we should apply the substitution rule $a_i q_i \to q_i$. In general, the advantage of using recurrent tasks is that they are invariant under permutation, which serves as a form of regularization on the output.
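A sketch of applying the substitution rule $a_i q_i \to q_i$ to the accumulation task: each question restates the running total, so the state is explicit in the text of every unit. The exact wording and helper are my own illustration:

```python
def build_units(increments, start=0):
    """Build (q_i, a_i) units where each question carries the current total.

    After the substitution a_i q_i -> q_i, the answer's state report is folded
    into the next question, which is what makes the task feasible.
    """
    units = []
    total = start
    for x in increments:
        q = f"The total is {total}. Add {x} to the total."
        total += x
        a = f"The total is {total}."
        units.append((q, a))
    return units

for q, a in build_units([3, 5, 2]):
    print(q, "->", a)
```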
