Reproducing ARC Evals' recent report on language model agents — LessWrong