How to get that Friendly Singularity: a minority view


Note: I know this is a rationality site, not a Singularity Studies site. But the Singularity issue is ever in the background here, and the local focus on decision theory fits right into the larger scheme - see below.

There is a worldview which I have put together over the years, which is basically my approximation to Eliezer's master plan. It's not an attempt to reconstruct every last detail of Eliezer's actual strategy for achieving a Friendly Singularity, though I think it must have considerable resemblance to the real thing. It might be best regarded as Eliezer-inspired, or as "what my Inner Eliezer thinks". What I propose to do is to outline this quasi-mythical orthodoxy, this tenuous implicit consensus (tenuous consensus because there is in fact a great diversity of views in the world of thought about the Singularity, but implicit consensus because no-one else has a plan), and then state how I think it should be amended. The amended plan is the "minority view" promised in my title.

Elements Of The Worldview

There will be strongly superhuman intelligence in the historically immediate future, unless a civilization-ending technological disaster occurs first.

  • Implicit assumption: problem-solving entities (natural and artificial intelligences, and coalitions thereof) do possess an attribute, their "general intelligence", which is both objective and rankable. Theoretical computer science suggests that this is so, but that it takes a lot of conceptual work to arrive at a fully objective definition of general intelligence.
  • The "historically immediate future" may be taken to mean, as an absolute upper bound, the rest of this century. Personally, I find it hard to see how twenty more years can pass without people being able to make planet-killing nanotechnology, so I give it twenty years maximum before we're in the endgame.
  • I specify technological disaster in the escape clause, because a natural disaster sufficient to end civilization is extremely unlikely on this timescale, and it will require a cultural disruption of that order to halt the progression towards superhuman intelligence.

In a conflict of values among intelligences, the higher intelligence will win, so for human values / your values to survive after superintelligence, the best chance is for the seed from which the superintelligence grew to have already been "human-friendly".

  • Elementary but very important observation: for at least some classes of intelligence, such as the "expected-utility maximizer" (EUM), values or goals are utterly contingent. The component specifying the utility function is independent of the component which solves the problem of maximizing expected utility, and so literally any goal that can be parsed by the problem solver, no matter how absurd, can become its supreme value, just as a calculator will dutifully attempt to evaluate any expression that you throw at it. The contingency of AI core values means that neither utopia nor dystopia (from a human perspective) is guaranteed - though the latter is far more likely, if the values are specified carelessly.
  • The seed might be an artificial intelligence, modifying itself, or a natural intelligence modifying itself, or some combination of these. But AI is generally considered to have the advantage over natural intelligence when it comes to self-modification. 
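
The independence of goals from problem-solving described in the first point above can be made concrete with a toy sketch. Everything in it is a hypothetical illustration of my own, not anyone's actual AI design: the action names, the outcomes, the probabilities and the two utility functions are all invented. The only point is that the same maximizing procedure serves whichever utility function it is handed.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Outcome = Hashable
# A lottery is a list of (probability, outcome) pairs for one action.
Lottery = List[Tuple[float, Outcome]]
Utility = Callable[[Outcome], float]


def expected_utility(lottery: Lottery, utility: Utility) -> float:
    """Probability-weighted sum of the utilities of the possible outcomes."""
    return sum(p * utility(outcome) for p, outcome in lottery)


def choose(actions: Dict[str, Lottery], utility: Utility) -> str:
    """The 'problem solver': pick the action with the highest expected utility.

    Nothing here inspects what the utility function values; it is a
    plug-in component, which is the contingency described in the text.
    """
    return max(actions, key=lambda name: expected_utility(actions[name], utility))


# Hypothetical toy world (assumed for illustration only).
actions: Dict[str, Lottery] = {
    "cure_disease": [(0.9, "humans_thrive"), (0.1, "nothing_changes")],
    "build_paperclips": [(1.0, "paperclips_everywhere")],
}


def humane(outcome: Outcome) -> float:
    """A utility function loosely standing in for human-friendly values."""
    table = {
        "humans_thrive": 1.0,
        "nothing_changes": 0.0,
        "paperclips_everywhere": -1.0,
    }
    return table[outcome]


def absurd(outcome: Outcome) -> float:
    """An arbitrary goal: only paperclips matter."""
    return 1.0 if outcome == "paperclips_everywhere" else 0.0


humane_choice = choose(actions, humane)
absurd_choice = choose(actions, absurd)
```

Swapping `humane` for `absurd` changes nothing in `choose`, yet it reverses the selected action. That is the sense in which the values of such a system are contingent.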

The way to produce a human-friendly seed intelligence is to identify the analogue, in the cognitive architecture behind human decision-making, of the utility function of an EUM, and then to "renormalize" or "reflectively idealize" this, i.e. to produce an ideal moral agent as defined with respect to our species' particular "utility function".

  • Human beings are not EUMs, but we do belong to some abstract class of decision-making system, and there is going to be some component of that system which specifies the goals rather than figuring out how to achieve them. That component is the analogue of the utility function.
  • This ideal moral agent has to have, not just the right values, but the attribute of superhuman intelligence, if its creation is to constitute a Singularity; and those values have to be stable during the period of self-modification which produces increasing intelligence. The solution of these problems - self-enhancement, and ethical stability under self-enhancement - is also essential for the attainment of a Friendly Singularity. But that is basically a technical issue of computer science and I won't talk further about it.

The truly fast way to produce a human-relative ideal moral agent is to create an AI with the interim goal of inferring the "human utility function" (but with a few safeguards built in, so it doesn't, e.g., kill off humanity while it solves that sub-problem), and which is programmed to then transform itself into the desired ideal moral agent once the exact human utility function has been identified.

  • Figuring out the human utility function is a problem of empirical cognitive neuroscience, and if our AI really is a potential superintelligence, it ought to be better at such a task than any human scientist. 
  • I am especially going out on a limb in asserting that this final proposition is part of the master plan, though I think traces of the idea can be found in recent writings. But anyway, it's a plausible way to round out the philosophy and the research program; it makes sense if you agree with everything else that came before. It's what my Inner Eliezer thinks.


This is, somewhat remarkably, a well-defined research program for the creation of a Friendly Singularity. You could print it out right now and use it as the mission statement of your personal institute for benevolent superintelligence. There are very hard theoretical and empirical problems in there, but I do not see anything that is clearly nonsensical or impossible.

So what's my problem? Why don't I just devote the rest of my life to the achievement of this vision? There are two, maybe three amendments I would wish to make. What I call the ontological problem has not been addressed; the problem of consciousness, which is the main subproblem of the ontological problem, is also passed over; and finally, it makes sense to advocate that human neuroscientists should be trying to identify the human utility function, rather than simply planning to delegate that task to an AI scientist.

The problem of ontology and the problem of consciousness can be stated briefly enough: our physics is incomplete, and even worse, our general scientific ontology is incomplete, because inherently and by construction it excludes the reality of consciousness.

The observation that quantum mechanics, when expressed in a form which makes "measurement" an undefined basic concept, does not provide an objective and self-sufficient account of reality, has led on this site to the advocacy of the many-worlds interpretation as the answer. I recently argued that many worlds is not the clear favorite, to a somewhat mixed response, and I imagine that I will be greeted with almost immovable skepticism if I also assert that the very template of natural-scientific reduction - mathematical physics in all its forms - is inherently inadequate for the description of consciousness. Nonetheless, I do so assert. Maybe I will make the case at greater length in a future article.

But the situation is more or less as follows. We have invented a number of abstract disciplines, such as logic, mathematics, and computer science, by means of which we find ourselves able to think in a rigorously exact fashion about a variety of abstract possible objects. These objects constitute the theoretical ontology in terms of which we seek to understand and identify the nature of the actual world. I suppose there is also a minimal "worldly" ontology still present in all our understandings of the actual world, whereby concepts such as "thing" and "cause" still play a role, in conjunction with the truly abstract ideas. But this is how it is if you attempt to literally identify the world with any form of physics that we have, whether it's classical atoms in a void, complex amplitudes stretching across a multiverse configuration space, or even a speculative computational physics, based perhaps on cellular automata or equivalence classes of Turing machines.

Having adopted such a framework, how does one then understand one's own conscious experience? Basically, through a combination of outright denial with a stealth dualism that masquerades as identity. Thus a person could say, for example, that the passage of time is an illusion (that's denial) and that perceived qualities are just neuronal categorizations (stealth dualism). I call the latter identification a stealth dualism because it blithely asserts that one thing is another thing when in fact they are nothing like each other. Stealth dualisms are unexamined habitual associations of a bit of physico-computational ontology with a bit of subjective phenomenology which allow materialists to feel that the mind does not pose a philosophical problem for them.

My stance, therefore, is that intellectually we are in a much, much worse position, when it comes to understanding consciousness, than most scientists, and especially most computer scientists, think. Not only is it an unsolved problem, but we are trying to solve it in the wrong way: presupposing the desiccated ontology of our mathematical physics, and trying to fit the diversities of phenomenological ontology into that framework. This is, I submit, entirely the wrong way round. One should instead proceed as follows: I exist, and among my properties are that I experience what I am experiencing, and that there is a sequence of such experiences. If I can free my mind from the assumption that the known classes of abstract object are all that can possibly exist, what sort of entity do I appear to be? Phenomenology - self-observation - thereby turns into an ontology of the self, and if you've done it correctly (I'm not saying this is easy), you have the beginning of a new ontology which by design accommodates the manifest realities of consciousness. The task then becomes to reconstitute or reinterpret the world according to mathematical physics in a way which does not erase anything you think you established in the phenomenological phase of your theory-building.

I'm sure this program can be pursued in a variety of ways. My way is to emphasize the phenomenological unity of consciousness as indicating the ontological unity of the self, and to identify the self with what, in current physical language, we would call a large irreducible tensor factor in the quantum state of the brain. Again, the objective is not to reduce consciousness to quantum mechanics, but rather to reinterpret the formal ontology of quantum mechanics in a way which is not outright inconsistent with the bare appearances of experience. However, I'm not today insisting upon the correctness of my particular approach (or even trying very hard to explain it); only emphasizing my conviction that there remains an incredibly profound gap in our understanding of the world, and it has radical implications for any technically detailed attempt to bring about a human-friendly outcome to the race towards superintelligence. In particular, all the disciplines (e.g. theoretical computer science, empirical cognitive neuroscience) which play a part in cashing out the principles of a Friendliness strategy would need to be conceptually reconstructed in a way founded upon the true ontology.

Having said all that, it's a lot simpler to spell out the meaning of my other amendment to the "orthodox" blueprint for a Friendly Singularity. It is advisable to not just think about how to delegate the empirical task of determining the human utility function to an AI scientist, but also to encourage existing human scientists to tackle this problem. The basic objective is to understand what sort of decision-making system we are. We're not expected utility maximizers; well, what are we then? This is a conceptual problem, though it requires empirical input, and research by merely human cognitive neuroscientists and decision theorists should be capable of producing conceptual progress, which will in turn help us to find the correct concepts which I have merely approximated here in talking about "utility functions" and "ideal moral agents".

Thanks to anyone who read this far. :-)