This is a post with technical instructions for reproducing an experiment from the weak-to-strong generalization paper: https://openai.com/index/weak-to-strong-generalization/. It is aimed mostly at beginners in AI Alignment who want to start tinkering with models and are looking for examples of how to run experiments.

Weak-to-strong generalization is research showing that a strong model can learn from data generated by a weaker model, generalize beyond that data, and surpass the weaker model on the task it was trained for. The paper comes with example code on GitHub, with experiments on both LLMs and vision models. However, running the experiments from this code is not straightforward, so here are detailed instructions for how to do it.
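To make the setup concrete before diving into the repository: the experiment trains a small "weak supervisor" on ground-truth labels, uses it to label a fresh pool of data, trains a larger "strong student" only on those weak labels, and then checks whether the student beats its supervisor on held-out ground truth. Here is a minimal toy sketch of that pipeline (this is my illustration, not the paper's code; it uses scikit-learn classifiers as stand-ins for the weak and strong models, and whether the student actually surpasses the supervisor depends on the task):

```python
# Toy weak-to-strong pipeline with scikit-learn stand-ins for the models.
# An illustration of the experimental setup only, not the paper's code.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20_000, n_features=40,
                           n_informative=10, random_state=0)
# Three disjoint splits: supervisor training, student training, evaluation.
X_sup, X_rest, y_sup, y_rest = train_test_split(X, y, train_size=2_000, random_state=0)
X_stu, X_test, y_stu, y_test = train_test_split(X_rest, y_rest, train_size=10_000, random_state=0)

weak = LogisticRegression(max_iter=1_000).fit(X_sup, y_sup)  # "weak supervisor"
weak_labels = weak.predict(X_stu)  # y_stu is deliberately unused:
                                   # the student never sees ground truth

strong = GradientBoostingClassifier().fit(X_stu, weak_labels)  # "strong student"

print("weak supervisor accuracy:", weak.score(X_test, y_test))
print("weak-to-strong accuracy: ", strong.score(X_test, y_test))
```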
Clone the repository and enter it:

```
git clone https://github.com/openai/weak-to-strong
cd weak-to-strong
```

Run your experiments inside tmux or screen: this ensures you will not lose a run if the connection to the server drops in the middle of an experiment. If the server runs Ubuntu or Debian, install and start tmux with:

```
apt-get install tmux
tmux
```

After reconnecting, run `tmux attach` to get back to your experiment. To scroll up and down in tmux, press the Ctrl-B, [ key sequence, then scroll with the arrow keys; press Esc to exit scrolling mode.

Install the repository package and the extra dependencies:

```
pip install .
pip install matplotlib seaborn tiktoken fire einops scipy
```

Now everything is ready to run an experiment with LLMs. One caveat: the code was apparently written for older versions of its libraries, and it will fail with an error if run on current versions as is, but this is easy to fix.
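Before launching anything long-running, it is also worth confirming that PyTorch can see a GPU, since the sweeps below are impractically slow on CPU. A quick check (this assumes PyTorch was pulled in as a dependency by `pip install .`):

```python
# Confirm a CUDA device is visible before starting multi-hour sweeps.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```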
To apply the fix, open weak_to_strong/train.py, go to line 272, and add `, safe_serialization=False` to the function arguments (with newer library versions, saving checkpoints in the default safetensors format fails on GPT-2's shared weight tensors; this flag falls back to the older torch format). Save the file and exit the editor.

Now run the first sweep:

```
python sweep.py --model_sizes=gpt2,gpt2-medium
```

When it finishes, you should see accuracies close to these:

```
gpt2: 0.65
gpt2-medium: 0.699
weak gpt2 to strong gpt2: 0.652
weak gpt2 to strong gpt2-medium: 0.655
weak gpt2-medium to strong gpt2-medium: 0.689
```

To get results for more model sizes, run, for example:

```
python sweep.py --model_sizes=gpt2,gpt2-medium,gpt2-large,Qwen/Qwen-1_8B
```

To plot the results, open the notebook weak-to-strong/notebooks/Plotting.ipynb and edit two variables in the first cell:

```
RESULTS_PATH = "/tmp/results/default"
MODELS_TO_PLOT = ["gpt2", "gpt2-medium", "gpt2-large", "Qwen/Qwen-1_8B"]
```
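A useful way to read these numbers is the paper's headline metric, performance gap recovered (PGR): the fraction of the gap between the weak supervisor and the strong ceiling that weak-to-strong training closes. A small helper (the function name is mine; the formula is from the paper):

```python
# Performance gap recovered: 0 means the strong student learned nothing
# beyond its weak supervisor, 1 means it fully matched the strong ceiling.
def pgr(weak: float, weak_to_strong: float, strong: float) -> float:
    return (weak_to_strong - weak) / (strong - weak)

# Accuracies from the gpt2 -> gpt2-medium run above:
print(pgr(weak=0.65, weak_to_strong=0.655, strong=0.699))  # ~0.10
```

Applied to the vision results further below, the same formula gives (0.618 - 0.566) / (0.644 - 0.566) ≈ 0.67, so the vision experiment recovers a much larger fraction of the gap.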
You can also reproduce the experiment with vision models. For this, you will need to download some of the datasets manually:

```
WORKDIR=`pwd`
cd ~
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar --no-check-certificate
```
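Optionally, you can check that the downloads are usable before starting the run. This sketch assumes torchvision is installed and that the archives stay in your home directory, where the commands above put them; torchvision parses the devkit and extracts the validation archive on first use, which takes a while:

```python
# One-off sanity check that torchvision can parse the ImageNet archives.
# Note: this extracts ILSVRC2012_img_val.tar on first use (slow, ~6.3 GB).
import os
from torchvision.datasets import ImageNet

val = ImageNet(root=os.path.expanduser("~"), split="val")
print(len(val))  # the ILSVRC2012 validation split has 50,000 images
```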
Then go back to the repository and run the vision experiment:

```
cd $WORKDIR/vision
python run_weak_strong.py --strong_model_name resnet50_dino --n_epochs 20
```

It should print accuracies close to these:

```
Weak label accuracy: 0.566
Weak_Strong accuracy: 0.618
Strong accuracy: 0.644
```

When you get all the scripts working and producing measurements and charts, you can use them later as templates for your own experiments; a natural first step is to look at what a sweep writes to disk, as in the sketch below. Happy tinkering!
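The layout of the results folder is not documented in this post, so this sketch simply lists any JSON files under the RESULTS_PATH used in the plotting step; inspect them to see which metrics and configs are recorded for each run:

```python
# List whatever result files the sweep produced under the default
# results folder; the exact directory layout may vary between runs.
import glob

for path in sorted(glob.glob("/tmp/results/default/**/*.json", recursive=True)):
    print(path)
```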