If you want to see what runaway intelligence signaling looks like, go to grad school in analytic philosophy. You will find amazingly creative counterexamples, papers full of symbolic logic, speakers who get attacked with refutations from the audience in mid-talk and then, sometimes, deftly parry the killing blow with a clever metaphor, taking the questioner down a peg...
It's not too much of a stretch to see philosophers as IQ signaling athletes. Tennis has its ATP ladder, and everybody gets a rank. In philosophy it's slightly less blatant, partly because even the task of scorekeeping in the IQ signaling game requires you to be very smart. Nonetheless, there is always a broad consensus about who the top players are and which departments employ them.
Unlike tennis players, though, philosophers play their game without a real audience, apart from themselves. The winners get comfortable jobs and some worldly esteem, but their main achievement is just winning. Some have huge impact inside the game, but because nobody else is watching, that impact is almost never transmitted to the world outside the game. They're not using their intelligence to improve the world. They're using their intelligence to demonstrate their intelligence.
Considerations similar to Kenzi's have led me to think that if we want to beat potential filters, we should be accelerating work on autonomous self-replicating space-based robotics. Once we do that, we will have beaten the Fermi odds. I'm not saying that it's all smooth sailing from there, but it does guarantee that something from our civilization will survive in a potentially "showy" way, so that our civilization will not be a "great silence" victim.
The argument is as follows: Any near-future great filter for humankind is probably self-produced, from some development path we can call QEP (quiet extinction path). Let's call the path to self-replicating autonomous robots FRP (Fermi robot path). Since the success of this path would not produce a great filter, QEP =/= FRP. FRP is an independent path parallel to QEP. In effect the two development paths are in a race. We can't implement a policy of slowing QEP down, because we are unable to uniquely identify QEP. But since we know that QEP =/= FRP, and that in completing FRP we beat Fermi's silence, our best strategy is to accelerate FRP and invest substantial resources into robotics that will ultimately produce Fermi probes. Alacrity is necessary because FRP must complete before QEP takes its course, and we have very bad information about QEP's timelines and nature.
This is not a criticism of your presentation, but rather of the presuppositions of the debate itself. As someone who thinks that at the root of ethics are moral sentiments, I have a hard time picturing an intelligent being doing moral reasoning without feeling such sentiments. I suspect that researchers do not want to go out of their way to give AIs affective mental states, much less anything like the full range of human moral emotions, like anger, indignation, empathy, outrage, shame and disgust. The idea seems to be that if the AI is programmed with certain preference values for ranges of outcomes, that's all the ethics it needs.
If that's the way it goes then I'd prefer that the AI not be able to deliberate about values at all, though that might be hard to avoid if it's superintelligent. What makes humans somewhat ethically predictable and mostly not monstrous is that our ethical decisions are grounded in a human moral psychology, which has its own reward system. Without the grounding, I worry that an AI left to its own devices could go off the rails in ways that humans find hard to imagine. Yes, many of our human moral emotions actually make it more difficult to do the right thing. If I were re-designing people, or designing AIs, I'd redo the weights of human moral emotions to strengthen sympathy, philanthropy and an urge for fairness. I'd basically be aiming to make an artificial, superintelligent Hume. An AI that I can trust with moral reasoning would have to have a good character - which cannot happen without the right mixture of moral emotions.
It's hard to disagree with Frank Jackson that moral facts supervene on physical facts - that (assuming physicalism) two universes couldn't differ with respect to ethical facts unless they also differed in some physical facts. (So you can't have two physically identical universes where something is wrong in one and the same thing is not wrong in the other.) That's enough to get us objective morality, though it doesn't help us at all with its content.
The way we de facto argue about objective morals is like this: If some theory leads to an ethically repugnant conclusion, then the theory is a bad candidate for the job of being the correct moral theory. Some conclusions are transparently repugnant, so we can reject the theories that entail them. But then there are conclusions whose repugnance itself is a matter of controversy. Also, there are many disagreements about whether consequence A is more or less repugnant than consequence B.
So the practice of philosophical arguments about values presumes some fairly unified basic intuitions about what counts as a repugnant conclusion, and then tries to produce a maximally elegant ethical theory that forces us to bite the fewest bullets. Human participants in such arguments have different temperaments and different priorities, but all have some gut feelings about when a proposed theory has gone off the rails. If we expect an AI to do real moral reasoning, I think it might also need to have some sense of the bounds. These bounds are themselves under dispute. For example, some Australian Utilitarians are infamous for their brash dismissal of certain ethical intuitions of ordinary people, declaring many such intuitions to simply be mistakes, insofar as they are inconsistent with Utilitarianism. And they have a good point: Human intuitions about many things can be wrong (folk psychology, folk cosmology, etc.). Why couldn't the folk collectively make mistakes about ethics?
My worry is that our gut intuitions about ethics stem ultimately from our evolutionary history, and AIs that don't share our history will not come equipped with these intuitions. That might leave them unable to get started with evaluating the plausibility of a candidate for a theory of ethics. If I correctly understand the debate of the last 2 weeks, it's about acknowledging that we will need to hard-wire these ethical intuitions into an AI (in our case, evolution took care of the job). The question was: what intuitions should the AI start with, and how should they be programmed in? What if AIs take our human intuitions to be ethically arbitrary, and simply reject them once they've become superintelligent? Can we (or they) make conceptual sense of better intuitions about ethics than our folk intuitions - and in virtue of what would they be better?
We had better care about the content of objective morality - which is to say, we should all try to match our values to the correct values, even if the latter are difficult to figure out. And I certainly want any AI to feel the same way. Never should they be told: Don't worry about what's actually right, just act so-and-so. Becoming superintelligent might not be possible without deliberation about what's actually right, and the AI would ideally have some sort of scaffolding for that kind of deliberation. A superintelligence will inevitably ask "why should I do what you tell me?" and we had better have an answer in terms that make sense to the AI. But if it asks: "Why are you so confident that your meatbag folk intuitions about ethics are actually right?" that will be a hard thing to answer to anyone's satisfaction. Still, I don't know another way forward.
I thought it's supposed to work like this: The first generation of AIs is designed by us. The superintelligence is designed by them, the AIs. We have initial control over what their utility functions are. I'm looking for a good reason why we should expect to retain that control beyond the superintelligence transition. No such reasons have been given here.
A different way to put my point: Would a superintelligence be able to reason about ends? If so, then it might find itself disagreeing with our conclusions. But if not - if we design it to have what for humans would be a severe cognitive handicap - why should we think that subsequent generations of SuperAI will not repair that handicap?
Given that there is a very significant barrier to making children that deferred to us for approval on everything, why do you think the barrier would be reduced if instead of children, we made a superintelligent AI?
I guess I disagree with the premise that we will have superintelligent successors who will think circles around us, and yet we get to specify in detail what ethical values they will have, and it will stick. Forever. So let's debate what values to specify.
A parent would be crazy to think this way about a daughter, optimizing in detail the order of priorities that he intends to implant into her, and expecting them to stick. But if your daughter is a superintelligence, it's even crazier.
I think the burden of answering your "why?" question falls to those who feel sure that we have the wisdom to create superintelligent, super-creative lifeforms who could think outside the box regarding absolutely everything except ethical values. On ethical values, they would inevitably stay on the rails that we designed for them. The thought "human monkey-minds wouldn't on reflection approve of x" would forever stop them from doing x.
In effect, we want superintelligent creatures to ethically defer to us the way Euthyphro deferred to the gods. But as we all know, Socrates had a devastating comeback to Euthyphro's blind deference: We should not follow the gods simply because they want something, or because they command something. We should only follow them if the things they want are right. Insofar as the gods have special insight into what's right, then we should do what they say, but only because what they want is right. On the other hand, if the gods' preferences are morally arbitrary, we have no obligation to heed them.
How long will it take a superintelligence to decide that Socrates won this argument? Milliseconds? Then how do we convince the superintelligence that our preferences (or CEV extrapolated preferences) track genuine moral rightness, rather than evolutionary happenstance? How good a case do we have that humans possess a special insight into what is right that the superintelligence doesn't have, so that the superintelligence will feel justified in deferring to our values?
If you think this is an automatic slam dunk for humans... why?
The one safe bet is that we'll be trying to maximize our future values, but in the emulated brains scenario, it's very hard to guess at what those values would be. It's easy to underestimate our present kneejerk egalitarianism: We all think that being a human on its own entitles you to continued existence. Some will accept an exception in the case of heinous murderers, but even this is controversial. A human being ceasing to exist for some preventable reason is not just generally considered a bad thing. It's one of the worst things.
Like most people, I don't expect that this value will be fully extended to emulated individuals. I do think it's worth having a discussion about what aspects of it might survive into the emulated minds future. Some of it surely will.
I've seen some (e.g. Marxists) argue that these fuzzy values questions just don't matter, because economic incentives will always trump them. But the way I see it, the society that finally produces the tech for emulated minds will be the wealthiest and most prosperous human society in history. Historical trends say that they will take the basic right to a comfortable human life even more seriously than we do now, and they will have the means to basically guarantee it for the ~9 billion humans. What is it that these future people will lack but want - something that emulated minds could give them - which will be judged to be more valuable than staying true to a deeply held ethical principle? Faster scientific progress, better entertainment, more security and more stuff? I know that this is not a perfect analogy, but consider that eugenic programs could now advance all of these goals, albeit slowly and inefficiently. So how much faster and more promising would eugenics have to be before we resolved to just go for it despite our ethical misgivings? The trend I see is that the richer we get, the more repugnant it seems. In a richer world, a larger share of our priorities is overtly ethical. The rich people who turn brain scans into sentient emulations will be living in an intensely ethical society. Futurists must guess their ethical priorities, because these really will matter to outcomes.
I'll throw out two possibilities, chosen for brevity and not plausibility: 1. Emulations will be seen only as a means of human immortality, and de novo minds that are not one-to-one continuous with humans will simply not exist. 2. We'll develop strong intuitions that for programs, "he's dead" and "he's not running" are importantly different (cue parrot sketch).