A method for empirical back-testing of AI's ability to self-improve — LessWrong