DIY RLHF: A simple implementation for hands-on experience — LessWrong