Proposal: Using Monte Carlo tree search instead of RLHF for alignment research — LessWrong