Benchmark for successful concept extrapolation/avoiding goal misgeneralization — LessWrong