Why doesn't the existence of log-loss for probabilistic models (e.g. sequence prediction) imply that any utility function capable of producing a "fairly capable" agent will have at least some non-negligible overlap with human values?
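
For concreteness, "log-loss" here refers to the standard negative log-likelihood objective for an autoregressive sequence model; this is the textbook formulation, not a definition given in the question itself:

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})$$

A model minimizes this loss exactly when it assigns high probability to the observed sequences, which is the sense in which log-loss plays the role of the "utility function" for a sequence predictor.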