I put "Friendliness" in quotes in the title, because I think what we really want, and what MIRI seems to be working towards, is closer to "optimality": create an AI that minimizes the expected amount of astronomical waste. In what follows I will continue to use "Friendly AI" to denote such an AI since that's the established convention.
I've often stated my objections to MIRI's plan to build an FAI directly (instead of after human intelligence has been substantially enhanced). But it's not because, as some have suggested while criticizing MIRI's FAI work, we can't foresee what problems need to be solved. I think it's because we can largely foresee what kinds of problems need to be solved to build an FAI, but they all look superhumanly difficult, either due to their inherent difficulty, or the lack of opportunity for "trial and error", or both.
When people say they don't know what problems need to be solved, they may be mostly talking about "AI safety" rather than "Friendly AI". If you think in terms of "AI safety" (i.e., making sure some particular AI doesn't cause a disaster) then that does look like a problem that depends on what kind of AI people will build. "Friendly AI" on the other hand is really a very different problem, where we're trying to figure out what kind of AI to build in order to minimize astronomical waste. I suspect this may explain the apparent disagreement, but I'm not sure. I'm hoping that explaining my own position more clearly will help figure out whether there is a real disagreement, and what's causing it.
The basic issue I see is that there is a large number of serious philosophical problems facing an AI that is meant to take over the universe in order to minimize astronomical waste. The AI needs a full solution to moral philosophy to know which configurations of particles/fields (or perhaps which dynamical processes) are most valuable and which are not. Moral philosophy in turn seems to have dependencies on the philosophy of mind, consciousness, metaphysics, aesthetics, and other areas. The FAI also needs solutions to many problems in decision theory, epistemology, and the philosophy of mathematics, in order to not be stuck with making wrong or suboptimal decisions for eternity. These essentially cover all the major areas of philosophy.
For an FAI builder, there are three ways to deal with the presence of these open philosophical problems, as far as I can see. (There may be other ways for the future to turn out well without the AI builders making any special effort, for example if being philosophical is just a natural attractor for any superintelligence, but I don't see any way to be confident of this ahead of time.) I'll name them for convenient reference, but keep in mind that an actual design may use a mixture of approaches.
- Normative AI - Solve all of the philosophical problems ahead of time, and code the solutions into the AI.
- Black-Box Metaphilosophical AI - Program the AI to use the minds of one or more human philosophers as a black box to help it solve philosophical problems, without the AI builders understanding what "doing philosophy" actually is.
- White-Box Metaphilosophical AI - Understand the nature of philosophy well enough to specify "doing philosophy" as an algorithm and code it into the AI.
The problem with Normative AI, besides the obvious inherent difficulty (as evidenced by the slow progress of human philosophers after decades, sometimes centuries of work), is that it requires us to anticipate all of the philosophical problems the AI might encounter in the future, from now until the end of the universe. We can certainly foresee some of these, like the problems associated with agents being copyable, or the AI radically changing its ontology of the world, but what might we be missing?
Black-Box Metaphilosophical AI is also risky, because it's hard to test/debug something that you don't understand. Besides that general concern, designs in this category (such as Paul Christiano's take on indirect normativity) seem to require that the AI achieve superhuman levels of optimizing power before being able to solve its philosophical problems, which seems to mean that a) there's no way to test them in a safe manner, and b) it's unclear why such an AI won't cause disaster in the time period before it achieves philosophical competence.
White-Box Metaphilosophical AI may be the most promising approach. There is no strong empirical evidence that solving metaphilosophy is superhumanly difficult, simply because not many people have attempted to solve it. But I don't think that a reasonable prior combined with what evidence we do have (i.e., absence of visible progress or clear hints as to how to proceed) gives much reason for optimism either.
To recap, I think we can largely already see what kinds of problems must be solved in order to build a superintelligent AI that will minimize astronomical waste while colonizing the universe, and it looks like they probably can't be solved correctly with high confidence until humans become significantly smarter than we are now. I think I understand why some people disagree with me (e.g., Eliezer thinks these problems just aren't that hard, relative to his abilities), but I'm not sure why some others say that we don't yet know what the problems will be.
The difficulty is still largely due to the security problem. Without catastrophic risks (including UFAI and value drift), we could take as much time as necessary and/or go with making people smarter first.
The aspect of FAI that is supposed to solve the security problem is optimization power aimed at correct goals. Optimization power addresses the "external" threats (and ensures progress), and correctness of goals represents "internal" safety. If an AI has sufficient optimization power, the (external) security problem is taken care of, even if the goals are given by a complicated definition that the AI is unable to evaluate at the beginning: it'll protect the original definition even without knowing what it evaluates to, and aim to evaluate it (for instrumental reasons).
This suggests that a minimal solution is to pack all the remaining difficulties into the AI's goal definition, at which point the only object-level problems are to figure out what a sufficiently general notion of "goal" is (decision theory; the aim of this part is to give the goal definition sufficient expressive power, to avoid constraining its decisions while extracting the optimization part...
Astronomical waste is a very specific concept arising from a total utilitarian theory of ethics. That this is "what we really want" seems highly unobvious to me; as someone who leans towards negative utilitarianism, I would personally reject it.
I'd be happy with an AI that makes people on Earth better off without eating the rest of the universe, and gives us the option to eat the universe later if we want to...
If the AI doesn't take over the universe first, how will it prevent Malthusian uploads, burning of the cosmic commons, private hell simulations, and such?
Of course, this is still just a proxy measure... say that we're "in a simulation", or that there are already superintelligences in our environment who won't let us eat the stars, or something like that—we still want to get as good a bargaining position as we possibly can, or to coordinate with the watchers as well as we possibly can, or in a more fundamental sense we want to not waste any of our potential, which I think is the real driving intuition here. (Further...
CFAI is deprecated for a reason; I can't read it either.
So after giving this issue some thought: I'm not sure ...
Do you have thoughts on the other approaches described here? It seems to me that black box metaphilosophical AI, in your taxonomy, need not be untestable nor dangerous during a transient period.
Does a sped-up uploaded mind count as a kind of black-box metaphilosophical AI?
On the other hand, to the extent that our uncertainty about whether different BBMAI designs do philosophy correctly is independent, we can build multiple ones and see what outputs they agree on. (Or a design could do this internally, achieving the same effect.)
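As a rough sketch of why independence matters here (the symbols are illustrative assumptions, not part of any proposed design): suppose each of $n$ designs answers a given philosophical question wrongly with probability $p$, independently of the others, and we act only on answers that all $n$ agree on. An agreed-upon wrong answer requires every design to be wrong, so:

```latex
\[
\Pr(\text{all } n \text{ designs agree on a wrong answer})
\;\le\; \Pr(\text{all } n \text{ designs are wrong})
\;=\; p^{n}.
\]
% Illustrative numbers only: with p = 0.3 and n = 3,
% the bound is 0.3^3 = 0.027.
```

The bound depends entirely on the independence assumption. If the designs' errors are perfectly correlated (say, because they share the same flawed component), the probability of unanimous error stays at $p$ no matter how many designs are built.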
This seems to be an argument for building a hybrid of what you call...
I prefer the more cheerfully phrased "Converts the reachable universe to QALYs" but same essential principle.
Perhaps you could make a taxonomy like yours when talking about a formally-defined singleton, which we might expect society to develop eventually. But I haven't seen strong arguments that we would need to design such a singleton starting from anything like our current state of knowledge. The best reason I know that we might need to solve this problem soon is the possibility of a fast takeoff, which still seems reasonably unlikely (say < 10% probability) but is certainly worth thinking about more carefully in advance.
But even granting a fast tak...
Just a minor terminology quibble: the “black” in “black-box” does not refer to the color, but to the opacity of the box; i.e., we don’t know what’s inside. “White-box” isn’t an obvious antonym in the sense I think you want.
“Clear-box” would better reflect the distinction that what’s inside isn’t unknown (i.e., it’s visible and understandable). Or perhaps “open-box” might be even better, since not only do we know how it works, but we also put it there.
Just to be clear, you are proposing that mere friendliness is insufficient, and we also want optimality with respect to getting as much of the cosmos as we can? This seems contained in friendliness, but OK. You are not proposing that optimally taking over the universe is sufficient for friendliness, right?
I've been thinking a lot about this, and I also think this is most likely to work. On general principle, understanding the problem and indirectly solving it is more promising than trying to solve the proble...
Meh. If we can get a safe AI, we've essentially done the whole of the work. Optimality can be tacked on easily at that point, bearing in mind that what may seem optimal to some may be an utter hellish disaster to others (see Repugnant Conclusion), so some sort of balanced view of optimality will be needed.
How is that defined? I would expect that minimizing astronomical waste would be the same as maximizing the amount used for intrinsically valuable things, which would be the same as maximizing utility.
Human intelligence is getting more substantially enhanced all the time. No doubt all parties will use the tools available - increasingly including computer-augmented minds as time passes.
So: I'm not clear about where it says that this is their plan.