"model scores" is a questionable concept — LessWrong