Epistemic status: a speculative hypothesis; I don't know if this already exists. The only real evidence I have for it is a vague analogy to some other speculative (though less speculative) hypothesis. I don't think this is particularly likely to be true, having thought about it for about a minute. Edit: having thought about it for two minutes, it seems somewhat more likely.


There is a hypothesis that has been floating around in the context of explaining double descent: that what so-called over-parameterized models do is store two algorithms in parallel, (1) the simplest model of the data without the (label) noise, and (2) a "lookup table" of deviations from that model, representing the (label) noise. The idea is that this is the simplest representation of the data, and that big neural nets are biased towards simplicity.
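
To make the shape of that decomposition concrete, here is a toy sketch (purely illustrative; not a claim about how real networks are wired internally): a predictor that fits noisy training labels exactly by combining a simple fitted trend with a memorized table of per-example deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_clean = np.sin(2 * np.pi * x_train)                    # simple underlying structure
y_train = y_clean + rng.normal(0.0, 0.3, x_train.size)   # labels corrupted by noise

# (1) The simplest model of the data: a low-degree polynomial fit.
simple_model = np.poly1d(np.polyfit(x_train, y_train, deg=3))

# (2) A "lookup table" memorizing each training point's deviation from (1).
lookup = {float(x): float(y - simple_model(x)) for x, y in zip(x_train, y_train)}

def predict(x):
    # Training inputs get their memorized deviation added back (zero training
    # error); any other input falls through to the simple model alone.
    return simple_model(x) + lookup.get(float(x), 0.0)

print(max(abs(predict(x) - y) for x, y in zip(x_train, y_train)))  # ~0 on the training set
print(predict(0.123))  # unseen input: the prediction comes from the simple model only
```

An interpolating over-parameterized network is obviously not literally a polynomial plus a dict, but the claimed inductive bias has this shape: represent the signal simply, memorize the noise.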

Maybe something vaguely similar would happen if we threw sufficient compute at generative models of collections of humans, e.g. language models:

Hypothesis: In the limit of infinite model size and data, the simplest way to represent the data generated by various humans on the internet is to have (1) a single idealized, superhuman model of reasoning, writing, knowledge retrieval, and so forth, and (2) memorized constraints on this model for various specific groups of humans and contexts, representing their deviations from rationality, their specific biases, cognitive limitations, lack of knowledge of certain areas, etc.
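
As a cartoon of what that could mean for a language model, here is a toy sketch (purely illustrative; the dictionaries and names are hypothetical, not how any real LM is organized internally): one shared "idealized" answer table plus memorized per-persona deviations, where a persona the model has never seen falls through to the idealized answers.

```python
# (1) A single shared "idealized" model of knowledge and reasoning.
IDEAL_ANSWERS = {
    "capital of France": "Paris",
    "17 * 23": "391",
}

# (2) Memorized constraints: how specific groups/contexts deviate from (1).
PERSONA_DEVIATIONS = {
    "hasty forum commenter": {"17 * 23": "like 400ish, not doing the math"},
}

def generate(prompt, persona):
    # An in-distribution persona applies its memorized deviations; an
    # out-of-distribution persona falls through to the idealized model.
    deviations = PERSONA_DEVIATIONS.get(persona, {})
    return deviations.get(prompt, IDEAL_ANSWERS.get(prompt, "<unknown>"))

print(generate("17 * 23", persona="hasty forum commenter"))  # imitated error
print(generate("17 * 23", persona="never-seen persona"))     # idealized answer
```

The implications below are essentially about which branch of this lookup a given prompt, or a fine-tuning signal, ends up selecting.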

With some eye-squinting, handwaving, and additional implicit assumptions, this suggests some of the following (vague, speculative) implications about GPT-N:

  • In the limit of N, GPT-N will produce text that looks sufficiently like human-written text within contexts (prompts) that humans typically produce, and it will use human-level reasoning, world-modeling, and planning abilities to do so. However, if you give it sufficiently out-of-distribution prompts, its lookup table of specific irrationalities will not apply, and it will bring to bear superhuman planning, world-modeling, and reasoning abilities that are more competent and freer of biases than those of the most rational human.
  • In the limit of N, if you take GPT-N and fine-tune it on an RL task that requires good reasoning, it might be possible to get a system that behaves far more intelligently than it seemed to on the imitation task, since the fine-tuning essentially turns off the constraints on its reasoning abilities.

Comments:

I think a version of this pretty much has to be true for at least a subset of skills/declarative knowledge, like factual knowledge (being a walking Wikipedia) or programming. A large model has read more of Wikipedia, and memorized more of Wikipedia (as well as Arxiv, Pubmed...), than any single human ever has. One trained on Github has also learned more languages to more depth than any human has: a human programmer will be superior on a few specific languages, doubtless, but they will still know less in aggregate. So when it conditions on an individual prompt/human, the imitation will be limited, but that vast pool of knowledge is still there. Across all the prompts one might test, one can elicit more knowledge than an individual human has.

In order for both of the points to be true, that is equivalent to claiming that it cannot tap into the full pool under all possible conditions, including invasive ones like RL training or prompt finetuning, which is to make a truly remarkable universal claim with a heavy burden of proof. Somehow all that knowledge is locked up inside the model parameters, but in so fiendishly encrypted a way that only small subsets - which always just happen to correspond to human subsets - can ever be used in a response...?

So, since at least some modest version is definitely true, the only question is how far it goes. Since the ways in which imitation learning can exceed experts are quite broad and general, it's hard to see why you would then be able to cavil at any particular point. It just seems like an empirical question of engineering & capabilities about where the model lands in terms of its unshackled capabilities - the sort of thing you can't really deduce in general, and just have to measure it directly, like prompting or gradient-ascending a Decision Transformer for its highest possible reward trajectory to see what it does.
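
(For readers who haven't seen the return-conditioning trick alluded to above: here is a tabular toy, an illustration added for clarity rather than anything from the Decision Transformer work, of how a model trained purely to imitate mixed-quality demonstrators can be steered toward its best demonstrators just by conditioning on a high return.)

```python
from collections import defaultdict

# Demonstrations from a pool of mixed-quality demonstrators: (return, action).
demos = [
    (1, "guess"), (1, "guess"), (1, "guess"),
    (2, "apply a heuristic"), (2, "apply a heuristic"),
    (5, "work it out carefully"), (5, "work it out carefully"),
]

# "Training" is pure imitation: estimate P(action | observed return) by counting.
by_return = defaultdict(list)
for ret, act in demos:
    by_return[ret].append(act)

def policy(conditioning_return):
    # Most common action among demonstrations with that return.
    actions = by_return[conditioning_return]
    return max(set(actions), key=actions.count)

print(policy(1))               # typical-demonstrator behavior
print(policy(max(by_return)))  # condition on the best return ever observed
```

The toy never optimizes reward; asking for the highest observed return just selects the latent above-average behavior the imitator already had to represent.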

"which is to make a truly remarkable universal claim with a heavy burden of proof."

Having thought about this way less than you, at first sight it doesn't seem to me as remarkable as you suggest. Note that the claim wouldn't be that you can't write a set of prompts to get the fully universal reasoner, but that you can't write a single prompt that gets you this universal reasoner. It doesn't sound so crazy to me at all that knowledge is dispersed in the network in such a way that, e.g., some knowledge can only be accessed if the prompt has the feel of being generated by an American gun rights activist, or something similar. By the way, we generate a few alternative hypotheses here.

"In order for both of the points to be true, that is equivalent to claiming that it cannot tap into the full pool under all possible conditions"

I might be misunderstanding, but it seems like this is the opposite of both my implications 1 and 2? Implication 1 is that it can tap into this pool in sufficiently out-of-distribution contexts; implication 2 is that with fine-tuning you can make it tap into it fairly quickly in specific contexts. EDIT: oh, maybe you simply made a typo and meant to say "to be false".

By the way, we write some alternative hypotheses here. All of this is based on probably less than 1 hour of thinking.

If this model is supposed to explain double descent, the question is why the model at the first local minimum isn't more intelligent than later models with lower loss. Shouldn't it have learned the simple model of the data without the deviations?