Maximal lotteries for value learning — LessWrong