Full toy model for preference learning — LessWrong