Review of METR’s public evaluation protocol — LessWrong