RLAIF/RLHF for Public Value Alignment Enhancing Transparency in LLMs — LessWrong