Paul Christiano has published the implementation he wrote with Buck Shlegeris. It's the code behind the article *Supervising strong learners by amplifying weak experts*.

With William Saunders' permission, I published a version that he modified and I later modified further. It adds changes and documentation that let you run it almost out of the box.
