This might be the first time that training on synthetic data has clearly proven useful? I can't say I've seen many papers on training on synthetic data, but from what I've heard it basically doesn't work. The authors admit the approach still suffers from catastrophic forgetting, and it's also computationally expensive. Still, if it works at all, that's a good sign that synthetic data isn't a doomed approach.
This isn't really self-improvement, more like self...refinement? It's akin to a human learning not only by reading a book, but also by writing down a list of key points from the book and learning from those as well. You can't make LLMs learn truly novel stuff that way, but it can be useful for making them better at learning obscure info that is scarce in the training data.
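To make that analogy concrete, the "write your own notes, then study them" loop looks roughly like the sketch below. This assumes a Hugging Face causal LM; the prompt wording and the single full-parameter gradient step are my own illustration, not the paper's exact recipe (as far as I can tell they use lighter-weight LoRA updates with an RL signal on top):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative setup: any small instruct-tuned causal LM works the same way.
model_name = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

passage = "..."  # some obscure source text the model should absorb

# Step 1: the model writes its own "notes" (synthetic training data).
prompt = f"List the key facts and implications of this passage:\n{passage}\n"
inputs = tokenizer(prompt, return_tensors="pt")
out_ids = model.generate(**inputs, max_new_tokens=256,
                         pad_token_id=tokenizer.eos_token_id)
notes = tokenizer.decode(out_ids[0, inputs["input_ids"].shape[1]:],
                         skip_special_tokens=True)

# Step 2: "study" the notes, i.e. fine-tune on them with the usual
# next-token loss (one gradient step shown for brevity).
batch = tokenizer(notes, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```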
I'm not affiliated with the authors; I'm mainly posting this to get some technical commentary on it. Full arXiv paper here.
They use Llama 3.2 1B Instruct and claim large improvements from the learned self-edit policy on selected ARC-1 tasks (a jump from 20% to 72%, where hand-crafted self-edits set a 100% upper bound). I read the paper as a clear demonstration of an LLM selecting its own weight updates to better answer questions, and the implications feel massive (if the big labs can scale it up). I don't have the technical background for a full deep dive, however.
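For anyone who wants the gist of the mechanism without reading the paper, here is my understanding as a runnable toy sketch, not the authors' code: the model proposes a "self-edit" (its own training data), the edit is applied as a weight update, and the reward is simply whether the updated model does better on the task. All function bodies below are hypothetical stubs; only the loop structure is meant to reflect the paper, which pairs the inner updates with a ReST-EM-style outer loop as far as I can tell.

```python
import copy
import random

# Hypothetical placeholders standing in for the paper's actual components.
def generate_self_edit(model, task):
    """Model proposes its own training data ('self-edit') for the task."""
    return f"synthetic notes for {task}"  # stub

def finetune_on(model, edit):
    """Apply the self-edit as a small weight update (SFT/LoRA in the paper)."""
    return model  # stub

def evaluate(model, task):
    """Score the model on the task (e.g. held-out ARC accuracy)."""
    return random.random()  # stub

def improve(model, task, n_candidates=4):
    """One round of SEAL-style self-editing, as I understand it:
    sample candidate self-edits, keep the ones whose weight updates
    actually raise task performance, then train the edit generator
    on those winners (ReST-EM-style behavior cloning)."""
    baseline = evaluate(model, task)
    winners = []
    for _ in range(n_candidates):
        edit = generate_self_edit(model, task)
        candidate = finetune_on(copy.deepcopy(model), edit)
        if evaluate(candidate, task) > baseline:  # reward = did the update help?
            winners.append(edit)
    for edit in winners:
        model = finetune_on(model, edit)  # stand-in for the behavior-cloning step
    return model
```

The interesting design choice, to me, is that the reward never inspects the self-edit itself; it only measures downstream performance after the weight update, which is what makes this "the model selecting its own weight updates" rather than ordinary data augmentation.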