This is a post with technical instructions on how to reproduce an experiment from the Weak-to-strong generalization paper: https://openai.com/index/weak-to-strong-generalization/. It is aimed mostly at beginners in AI Alignment who want to start tinkering with models and are looking for examples of how to run experiments.

Weak-to-strong generalization is research showing that a strong model can learn from data generated by a weaker model, generalize beyond it, and surpass the weaker model on the task it was trained for. The paper comes with example code on GitHub, with experiments on both LLMs and vision models. However, running the experiments from this code is not straightforward, so here are detailed instructions.
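To make the setup concrete, here is a minimal sketch of the procedure in Python-flavored pseudocode; `finetune`, `predict`, and `accuracy` are hypothetical helpers standing in for the repo's actual training and evaluation code:

```python
# Hypothetical helpers: finetune(), predict(), accuracy() are stand-ins for
# the repo's real training/eval code; this sketches the idea, not its API.

# 1. Finetune the weak model on ground-truth labels.
weak = finetune(weak_model, train_split, ground_truth_labels)

# 2. Use the weak model to label a held-out split.
weak_labels = predict(weak, heldout_split)

# 3. Finetune the strong model on those weak labels only.
weak_to_strong = finetune(strong_model, heldout_split, weak_labels)

# 4. Finetune the strong model on ground truth as the ceiling for comparison.
strong_ceiling = finetune(strong_model, train_split, ground_truth_labels)

# If weak-to-strong generalization happens, the middle number beats the first.
for name, model in [("weak", weak),
                    ("weak-to-strong", weak_to_strong),
                    ("strong ceiling", strong_ceiling)]:
    print(name, accuracy(model, test_split))
```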
First, clone the repository:

```
git clone https://github.com/openai/weak-to-strong
cd weak-to-strong
```

Run the experiments inside `tmux` (or `screen`): it ensures that you will not lose your run if the connection to the server drops in the middle of an experiment. If the server runs Ubuntu or Debian, install and start tmux with:

```
apt-get install tmux
tmux
```

If the connection drops, reconnect and run `tmux attach` to get back to your experiment. To scroll up and down in tmux, press the Ctrl-B, [ key sequence, then scroll with the arrow keys; press Esc to exit scrolling mode.

Install the package along with the extra dependencies the experiments need:

```
pip install .
pip install matplotlib seaborn tiktoken fire einops scipy
```
Now everything is ready to run an experiment with LLMs. The code was probably written for older versions of the libraries, and as-is it will fail with an error on newer versions, but the fix is easy. Open `weak_to_strong/train.py` in an editor, go to line 272, and add `, safe_serialization=False` to the arguments of the model-saving call there. Save the file and exit the editor.
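For reference, after the edit the call should look roughly like this (the exact receiver and variable names depend on the commit you have checked out; the only part that matters is the added keyword argument):

```python
# weak_to_strong/train.py, around line 272: disable safetensors serialization.
# Newer transformers/safetensors versions raise a shared-tensor error when
# saving these models; falling back to torch pickling avoids it.
model.save_pretrained(save_path, safe_serialization=False)
```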
Now run the first sweep:

```
python sweep.py --model_sizes=gpt2,gpt2-medium
```
When the run finishes, you should get accuracies close to these:

```
gpt2: 0.65
gpt2-medium: 0.699
weak gpt2 to strong gpt2: 0.652
weak gpt2 to strong gpt2-medium: 0.655
weak gpt2-medium to strong gpt2-medium: 0.689
```
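A useful number to extract from these accuracies is the performance gap recovered (PGR) metric from the paper: the fraction of the weak-to-ceiling gap that the weak-to-strong model closes. For the gpt2 → gpt2-medium pair above:

```python
# PGR = (weak-to-strong acc - weak acc) / (strong ceiling acc - weak acc),
# using the gpt2 -> gpt2-medium numbers printed above.
weak_acc, weak_to_strong_acc, strong_acc = 0.65, 0.655, 0.699
pgr = (weak_to_strong_acc - weak_acc) / (strong_acc - weak_acc)
print(f"PGR = {pgr:.2f}")  # ~0.10: only about a tenth of the gap is recovered
```

Tiny GPT-2-scale pairs recover little of the gap; in the paper, PGR generally grows as the strong model gets bigger relative to the weak one.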
Next, run a sweep with more model sizes:

```
python sweep.py --model_sizes=gpt2,gpt2-medium,gpt2-large,Qwen/Qwen-1_8B
```
To plot the results, open `weak-to-strong/notebooks/Plotting.ipynb` and edit two variables in the first cell:

```
RESULTS_PATH = "/tmp/results/default"
MODELS_TO_PLOT = ["gpt2", "gpt2-medium", "gpt2-large", "Qwen/Qwen-1_8B"]
```

Then run all cells to get the charts.
You can also reproduce the experiment with vision models. For this, you will need to download the ImageNet validation set and its devkit manually:
```
WORKDIR=`pwd`
cd ~
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar --no-check-certificate
cd $WORKDIR/vision
```
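Before launching the run, you can optionally sanity-check the downloaded files. These two archives are what torchvision's `ImageNet` dataset expects for the validation split (the repo's vision data loading builds on torchvision); note that the first instantiation unpacks the tar, which takes a few minutes and several GB of disk:

```python
# Optional sanity check that the ImageNet validation download is usable.
# Assumes the two files were downloaded to $HOME, as in the wget commands above.
import os
from torchvision.datasets import ImageNet

val = ImageNet(root=os.path.expanduser("~"), split="val")
print(len(val))  # should print 50000
```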
Now run the vision experiment:

```
python run_weak_strong.py --strong_model_name resnet50_dino --n_epochs 20
```
The run should end with accuracies like:

```
Weak label accuracy: 0.566
Weak_Strong accuracy: 0.618
Strong accuracy: 0.644
```

Plugging these into the PGR formula above gives (0.618 - 0.566) / (0.644 - 0.566) ≈ 0.67, so the weak-to-strong vision model recovers about two-thirds of the gap.
Once you have all the scripts working and producing measurements and charts, you can use them as starting points for your own experiments. Happy tinkering!