A library for safety research in conditioning on RLHF tasks — LessWrong