Introducing the WeirdML Benchmark — LessWrong