He put up a very good fight.
Eliezer: the rationality of defection in these finitely repeated games has come under some fire, and there's a HUGE literature on it. Reading some of the more prominent examples may help you sort out your position on it.
Robert Aumann. 1995. "Backward Induction and Common Knowledge of Rationality." Games and Economic Behavior 8:6-19.
Cristina Bicchieri. 1988. "Strategic Behavior and Counterfactuals." Synthese 76:135-169.
Cristina Bicchieri. 1989. "Self-Refuting Theories of Strategic Interaction: A Paradox of Common Knowledge." Erkenntnis 30:69-85.
Ken Binmore. 1987. "Modeling Rational Players I." Economics and Philosophy 3:9-55.
Jon Elster. 1993. "Some Unresolved Problems in the Theory of Rational Behaviour." Acta Sociologica 36:179-190.
Philip Reny. 1992. "Rationality in Extensive-Form Games." The Journal of Economic Perspectives 6:103-118.
Philip Pettit and Robert Sugden. 1989. "The Backward Induction Paradox." The Journal of Philosophy 86:169-182.
Brian Skyrms. 1998. "Subjunctive Conditionals and Revealed Preference." Philosophy of Science 65:545-574.
Robert Stalnaker. 1999. "Knowledge, Belief and Counterfactual Reasoning in Games." In Cristina Bicchieri, Richard Jeffrey, and Brian Skyrms, eds., The Logic of Strategy. New York: Oxford University Press.
Fair enough, but consider the counterfactual case: suppose we believed that there were some fact about a person that would permit enslaving that person, but learned that the set of people to whom that fact applied was the null set. It seems like that would still represent moral progress in some sense.
Perhaps not the sort that Eliezer is talking about, though. But I'm not sure that the two can be cleanly separated. Consider slavery again, or the equality of humanity in general. Much of the moral movement there can be seen as changing interpretations of Christianity -- that is, people thought the Bible justified slavery, then they stopped thinking that. Is that a purely moral change? Or is that a better interpretation of a body of religious thought?
I don't think discovering better instrumental values toward the same terminal values you always had counts as moral progress, at least if those terminal values are consciously, explicitly held.
Why on earth not? Aristotle thought some people were naturally suited for slavery. We now know that's not true. Why isn't that moral progress?
(Similarly, general improvements in reasoning, to the extent they allow us to reject bad moral arguments as well as more testable kinds of bad arguments, could count as moral progress.)
One possibility: we can see a connection between morality and certain empirical facts -- for example, if we believe that more moral societies will be more stable, we might think that we can see moral progress in the form of changes that are brought about by previous morally related instability. That's not very clear -- but a much clearer and more sophisticated variant on that idea can perhaps be seen in an old paper by Joshua Cohen, "The Arc of the Moral Universe" (Google Scholar will get it, and definitely read it, because a) it's brilliant, and b) I'm not representing it very well).
Or we might think that some of our morally relevant behaviors consistently depend on empirical facts, which we might make progress in discovering. For example, we might have always thought that beings who are as intelligent as we are and have social and emotional lives as complex as ours deserve to be treated as equals. Suppose we think the above at year 1 and year 500, but at year 500, we discover that some group of entities X (which could include fellow humans, as with the slaves, or other species) is as intelligent, etc., and act accordingly. Then it seems like we've made clearly directional moral progress -- we've learned to make more accurately the empirical judgments on which our unchanged moral judgment depends.
So here's a question, Eliezer: is Subhan's argument for moral skepticism just a concealed argument for universal skepticism? After all, there are possible minds that do math differently, that do logic differently, that evaluate evidence differently, that observe sense-data differently...
Either Subhan can distinguish his argument from an argument for universal skepticism, or I say that it's refuted by reductio, since universal skepticism falls to the complete impossibility of asserting it consistently, plus things like Moorean facts.
Suppose that 98% of humans, under 98% of the extrapolated spread, would both choose a certain ordering of arguments, and also claim that this is the uniquely correct ordering. Is this sufficient to just go ahead and label that ordering the rational one? If you refuse to answer that question yourself, what is the procedure that answers it?
Again, this is why it's irreducibly social. If there isn't a procedure that yields a justified determinate answer to the rationality of that order, then the best we can do is take what is socially accepted at the time and in the society in which such a superintelligence is created. There's nowhere else to look.
Things like the ordering of arguments are just additional questions about the rationality criteria, and my point above applies to them just as well -- either there's a justifiable answer ("this is how arguments are to be ordered") or it's going to be fundamentally socially determined and there's nothing to be done about it. The political is really deeply prior to the workings of a superintelligence in such cases: if there's no determinate correct answer to these process questions, then humans will have to collectively muddle through to get something to feed the superintelligence. (Aristotle was right when he said politics was the ruling science...)
On the humans for humans point, I'll appeal back to the notion of modeling minds. If we take P to be a reason, then all we have to be able to tell the superintelligence is "simulate us and consider what we take to be reasons," and, after simulating us, the superintelligence ought to know what those things are, what we mean when we say "take to be reasons," etc. Philosophy written by humans for humans ought to be sufficient once we specify the process by which reasons that matter to humans are to be taken into account.
Right, but those questions are responsive to reasons too. Here's where I embrace the recursion. Either we believe that ultimately the reasons stop -- that is, that after a sufficiently ideal process, all of the minds in the relevant mind design space agree on the values -- or we don't. If we do, then the superintelligence should replicate that process. If we don't, then what basis do we have for asking a superintelligence to answer the question? We might as well flip a coin.
Of course, the content of the ideal process is tricky. I'm hiding the really hard questions in there, like what counts as rationality, what kinds of minds are in the relevant mind design space, etc. Those questions are extra-hard because we can't appeal to an ideal process to answer them on pain of circularity. (Again, political philosophy has been struggling with a version of this question for a very long time. And I do mean struggling -- it's one of the hardest questions there is.) And the best answer I can give is that there is no completely justifiable stopping point: at some point, we're going to have to declare "these are our axioms, and we're going with them," even though those axioms are not going to be justifiable within the system.
What this all comes down to is that it's all necessarily dependent on social context. The axioms of rationality and the decisions about what constitutes the relevant mind-space for any such superintelligence would be determined by the brute facts of what kind of reasoning is socially acceptable in the society that creates such a superintelligence. And that's the best we can do.
The resemblance between my second suggestion and your thing didn't go unnoticed -- I had in fact read your coherent extrapolated volition thing before (there's probably an old e-mail from me to you about it, in fact). I think it's basically correct. But the method of justification is importantly different, because the idea is that we're trying to approximate something with epistemic content -- we're not just trying to do what you might call a Xannon thing -- we're not just trying to model what humans would do. Rather, we're trying to model and improve a specific feature of humanity that we see as morally relevant -- responsiveness to reasons.
That's really, really important.
In the context of your dialogue above, it's what reconciles Xannon and Yancy: even if Yancy can't convince Xannon that there's some kind of non-subjective moral truth, he ought to be able to convince Xannon that moral beliefs should be responsive to reasons -- and likewise, even if Xannon can't convince Yancy that what really matters, morally, is what people can agree on, he should be able to convince Yancy that the best way to get at it in the real world is by a collective process of reasoning.
So you see that this method of justification does provide a way to answer questions like "friendliness to whom." I know what I'm doing, Eliezer. :-)