Linkpost: Are Emergent Abilities in Large Language Models just In-Context Learning?

Erich_Grunewald

8th Oct 2023

This is a linkpost for https://arxiv.org/abs/2309.01809

A new preprint, Lu et al., published last month, has important implications for AI risk if true. It essentially suggests that emergent capabilities of current LLMs are (with some exceptions) mediated by a model's ability to do in-context learning. If so, emergent capabilities may be less unexpected, given that improvements to in-context learning (e.g., via instruction tuning) are relatively predictable.

Wei et al. define "emergent ability" like so: "An ability is emergent if it is not present in smaller models but is present in larger models." So you can have sudden "jumps" in performance on various tasks as you scale models up; Wei et al. plot several examples of this.

Lu et al. are saying: yes, but that is not because the larger models learned to do those specific tasks better, or learned to become better generic reasoners. It is because the larger models learned to follow instructions better, partly from scaling up and partly from being trained on datasets of instructions and instruction-following ("instruction tuning").

The methodology: "We conduct rigorous tests on a set of 18 models, encompassing a parameter range from 60 million to 175 billion parameters, across a comprehensive set of 22 tasks. Through an extensive series of over 1,000 experiments, we provide compelling evidence that emergent abilities can primarily be ascribed to in-context learning."

The paper finds that, when accounting for the "biasing factors", only 2 of 14 previously "emergent" tasks actually show emergence. In the paper's plots, the blue and yellow lines are instruction-tuned models evaluated via few-shot prompting, showing emergence. The purple and green lines are non-instruction-tuned models evaluated via zero-shot prompting, generally showing no emergence.
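To make the two evaluation settings concrete, here is a minimal sketch of how a zero-shot prompt differs from a few-shot prompt. The `Q:`/`A:` template, the function names and the arithmetic examples are illustrative assumptions, not the prompts or tasks Lu et al. actually used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    """A solved demonstration (hypothetical format, for illustration only)."""
    question: str
    answer: str


def zero_shot_prompt(question: str) -> str:
    # Zero-shot: the model sees only the test question, with no solved examples.
    return f"Q: {question}\nA:"


def few_shot_prompt(demos: Sequence[Example], question: str) -> str:
    # Few-shot: solved demonstrations are placed in the context before the
    # test question, so the model can pick up the task in context.
    if not demos:
        return zero_shot_prompt(question)
    shots = "\n\n".join(f"Q: {d.question}\nA: {d.answer}" for d in demos)
    return f"{shots}\n\n{zero_shot_prompt(question)}"


if __name__ == "__main__":
    demos = [Example("2 + 2", "4"), Example("7 + 5", "12")]
    print(zero_shot_prompt("13 + 8"))
    print("---")
    print(few_shot_prompt(demos, "13 + 8"))
```

The only difference between the two settings is what sits in the context window; the model's weights are the same in both. That is why a gap between few-shot and zero-shot performance points at in-context learning rather than at some task-specific ability.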

They write:

Ultimately, this absence of evidence for emergence represents a significant step towards instilling trust in language models and leveraging their abilities with confidence, as it is indicative of the complete lack of latent hazardous abilities in LLMs, in addition to being controllable by the user. [...]
The distinction between the ability to follow instructions and the inherent ability to solve a problem is a subtle but important one. Simple following of instructions without applying reasoning abilities produces output that is consistent with the instructions, but might not make sense on a logical or common sense basis. This is reflected in the well-known phenomenon of hallucination, in which an LLM produces fluent, but factually incorrect output. The ability to follow instructions does not imply having reasoning abilities, and more importantly, it does not imply the possibility of latent hazardous abilities that could be dangerous.
Attributing these capabilities to a combination of memory and in-context learning and the more general ability of these models to generate the most statistically likely next token, can help explain the abilities and the behaviour of LLMs. These capabilities, when subsequently combined with instruction tuning, adoption to conversational use cases, increased context length and a degree of safety controls through the use of reinforcement learning through human feedback, allow LLMs to become truly powerful.

They also argue that this partly explains why RLHF will never fully align a model. RLHF shows a model examples of when and when not to give a certain response, and essentially just makes the model follow instructions better (by giving it a long list of things not to say), without changing the model's inherent way of behaving with regard to some given task. I'm not sure I fully understand this argument, as I am somewhat confused about the distinction between following instructions and having an inherent ability.

(NB: I am not an ML engineer, so my summary and analysis above may be flawed in some ways, and I am not capable of evaluating whether the methodology/findings are sound. I am sharing this here mostly because I'm curious to get LW users' takes on it.)