Consider trying Vivek Hebbar's alignment exercises — LessWrong