An alternative of PPO towards alignment — LessWrong