Epistemic status: Speculation. The actual proposals are idealized, not meant to be exactly right. We have thought about this for less than an hour.
In this earlier post I stated a speculative hypothesis about the algorithm that a single imitator would learn when trained to imitate a collection of multiple humans. Here Joar Skalse joined me, and we made a list of some more hypotheses, all very speculative and probably each individually wrong.
The point is this: if we have an imitator trained on a single human’s text, we might (very dubiously) expect that imitator to learn basically a copy of that human. What would an imitator learn that is trained to imitate content generated by vast collections of humans? We can then ask: what are the implications for how it generalizes, and what can you get out of it with finetuning?
Here is our set of idealized and almost certainly not exactly correct hypotheses: