Measuring artificial intelligence on human benchmarks is naive — LessWrong