
# Why AGI is extremely likely to come before FAI

I am using FAI to mean EY's definition: provably friendly AGI implementing CEV. It is possible that the AGI which comes first will not be *provably* friendly but will nevertheless turn out to be friendly, although the probability that a non-provably friendly AI turns out to be friendly depends on how it is programmed. But this is a subtopic for another article - Part II: what to do if FAI is unlikely to come first.

I realize much of this has been said before.

Complexity of algorithm design

Intuitively, FAI seems orders of magnitude more complex than AGI. If I decided to start trying to program an AGI tomorrow, I would have ideas on how to start, and might even make a minuscule amount of progress. Ben Goertzel even has a (somewhat optimistic) [roadmap](http://opencog.org/roadmap/) for AGI in a decade. Meanwhile, as far as I know, FAI is still stuck at the stage of Löb's theorem.
The fact that EY seems to be focusing on promoting rationality and writing (admittedly awesome) Harry Potter fanfiction seems to indicate that he doesn't currently know how to write FAI (nor does anyone else) - otherwise he would be focusing on that now. Instead, he is planning for the long term.

Computational complexity

CEV requires modelling (and extrapolating) every human mind on the planet, while avoiding the creation of sentient entities. While modelling might be cheaper than ~10^17 FLOPS per human thanks to shortcuts, I doubt it's going to come cheap. Randomly sampling a subset of humanity to extrapolate from, at least initially, could make this problem less severe, although you would get a poorer estimate of humanity's utility function. Furthermore, the problem can be partially circumvented by having the AI follow a simpler specified utility function while bootstrapping to enough computing power to implement CEV, but then you have the problem of allowing it to bootstrap safely. Having to prove the friendliness of each step in self-improvement strikes me as something that could also be costly.
Finally, I get the impression that people are considering using Solomonoff induction. It's uncomputable, and while I realize that computable approximations exist, I would imagine these would be extremely expensive for calculating anything non-trivial. Is there any reason for using SI for FAI more than for AGI, e.g. something to do with provability about the program's actions?
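As a rough illustration of the scale involved, here is a back-of-envelope sketch. The 10^17 FLOPS-per-brain figure is the one mentioned above; the population count and the supercomputer comparison are order-of-magnitude assumptions, not precise figures:

```python
# Back-of-envelope: naive cost of modelling every human mind in real time.
# Both constants are order-of-magnitude assumptions, not precise figures.
FLOPS_PER_BRAIN = 1e17  # rough upper-end estimate from the text
POPULATION = 7e9        # rough world population

total = FLOPS_PER_BRAIN * POPULATION
print(f"Total: {total:.1e} FLOPS")  # ~7e26 FLOPS

# For comparison: a petaflop-class supercomputer (~10^15 FLOPS).
SUPERCOMPUTER = 1e15
print(f"Equivalent petaflop machines: {total / SUPERCOMPUTER:.1e}")
```

Even granting large shortcuts, this suggests why sampling a subset of humanity, or deferring CEV until after safe bootstrapping, looks attractive.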

Infeasibility of relinquishment

If you can’t convince Ben Goertzel that FAI is needed, even though he is familiar with the arguments and is an adviser to SIAI, you’re not going to get anywhere near a universal consensus on the matter. Furthermore, AI is increasingly being used in financial and possibly soon military applications, so there are strong incentives to speed up AI development. While these uses are unlikely to involve full AGI, they could provide building blocks - I can imagine a plausible situation where an advanced AI that predicts the stock exchange could easily be modified into a universal predictor.
The most powerful incentive to speed up AI development is the sheer number of people who die every day, and the amount of negentropy lost in the case that the 2nd law of thermodynamics cannot be circumvented. Even if there were a worldwide ban on non-provably safe AGI, work would still probably continue in secret by people who thought the benefits of an earlier singularity outweighed the risks, and/or who were worried about ideologically opposed groups getting there first.

Financial bootstrapping

If you are OK with running a non-provably friendly AGI, then even in the early stages - when, for example, your AI can write simple code or make reasonably accurate predictions but cannot yet speak English or make plans - you can use these abilities to earn money and buy more hardware and programmers. This seems to be part of the approach Ben is taking.

Coming in Part II: is there any alternative? (And doing nothing is not an alternative! Even if FAI is unlikely to work, it's better than giving up!)


> I realize much of this has been said before.

While technically true, I think the implication is incorrect. I believe all of this has been said before. I also believe it has been said in more detail. I think you should cite your sources. This would also make it clearer which parts are your contribution.

It's also much more appealing to go find additional sources to improve or refute the arguments in an article when the author has expressed enough interest in it to find the time to add citations. Citations aren't just useful in their own right; they also serve as a signal that you believe the article is worthy of both your time and mine.

I assume most of what I said has been said before because I'm sure intelligent people will have thought of it. However, I cannot recall actually reading discussions of most of these points (except relinquishment), despite having read a lot of LW and related sites. Because of this, I don't actually have links to previous discussions of them.

I'll try to add more citations to future posts.

> The fact that EY seems to be focusing on promoting rationality and writing (admittedly awesome) Harry Potter fanfiction

His rate of production of HP:MoR is inconsistent with that being anything like his main focus.

It's a conjunction; CFAR and HP:MoR. The former does seem to be taking up his time...

And 'anything like' was a word choice specifically chosen to question HP:MoR's inclusion in the set of main focus things.

You're missing the point. He didn't say strictly that HPMOR is his main focus, but also promoting rationality. And thus,

> seems to indicate that he doesn’t currently know how to write FAI (and nor does anyone else) otherwise he would be focusing on that now, and instead is planning for the long term.

Which sounds exactly right to me.

I'm not missing the point, I'm nitpicking. There's a difference.

A reassuring note on complexity: a CEV-running AI doesn't have to simulate everyone - it just has to infer the abstract algorithm that CEV refers to. Looking at people's brains is evidence about what this algorithm is, and, as in random sampling, the standard deviation goes down like one over the square root of the number of samples.
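The 1/sqrt(n) intuition is easy to check empirically. Here is a minimal sketch; the "preference" values are a made-up stand-in for whatever scalar you might estimate per person, not anything from an actual CEV proposal:

```python
import random
import statistics

def sample_mean_error(population, n, trials=2000, rng=None):
    """Empirical standard deviation of the sample-mean error for samples of size n."""
    rng = rng or random.Random(0)
    true_mean = statistics.fmean(population)
    errors = [statistics.fmean(rng.sample(population, n)) - true_mean
              for _ in range(trials)]
    return statistics.pstdev(errors)

# Toy population: each person's "preference" is one noisy scalar (a pure stand-in).
rng = random.Random(0)
population = [rng.gauss(0.0, 1.0) for _ in range(100_000)]

for n in (25, 100, 400):
    print(n, round(sample_mean_error(population, n, rng=rng), 3))
# Each 4x increase in n roughly halves the error, i.e. ~1/sqrt(n) scaling.
```

Of course, as the replies below note, this only applies cleanly when the thing being estimated is a fixed-dimensional quantity; learning an unbounded algorithm is a harder, nonparametric problem.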

I think you're being overly optimistic. Unless the AI starts with a bound on the size of the algorithm it's learning (I can't think of a way to come up with one a priori), it will be doing nonparametric inference for which the rates of convergence are slower and even nonexistent in the cases where usual estimators (e.g., MLE) are inconsistent.

Yeah, I was playing pretty fast and loose there. With the goal being an algorithm, standard deviation doesn't make much sense, and there's not even necessarily a convergent solution (since you might always be able to make people happier by including another special case). But properties of the output should still converge, or else something is wrong, so it probably still makes sense to talk about rate of convergence there.

Which is probably pretty instant for the human universals, and then human variation can be treated as perturbations to some big complicated human-universal model. And I have no idea what kind of convergence rate for output properties that actually leads to :P

I thought the whole point of CEV was to make the AI infer what the process of extrapolation would return if anyone were fool enough to actually do it. A merely human-level intelligence could figure out that humanity, extrapolated as we wish, would tell the AI not to kill us all and use our atoms for computing power (in order to, say, carry out the extrapolation).

Good point. Actually, reconsidering the whole setup, I think my argument works the other way and ends up showing that Manfred is being pessimistic.

Manfred's claim was something to the effect that you can get a good approximation to CEV by coherently extrapolating the volition of a random sample of people. Why? Because under some unstated assumptions (data on individuals reveals information about their CEV with some sort of iid error model, CEV has a representation with bounded complexity, etc.), it's reasonable to expect that the error in the inference falls off slowly with the number of individuals observed. Hence, you don't lose very much by looking at just a few people.

I mentioned that one of the unstated assumptions, bounded complexity of CEV, might not be justified, resulting in a slower fall off in the inferential error. However, this actually justifies working with a small sample more strongly. There's less expected gain in the quality of inference (or even no expected gain in case of inconsistency) for using a larger sample.

I'm not sure there's any magic way for the AI to jump right to what it "would return" without actually doing work in the form of looking at data and doing stuff like inference. In such tasks, its performance is governed pretty tightly by established statistical theory.

> I am using FAI to mean EY's definition of provably friendly AGI implementing CEV.

Got a reference for that?

For reference, here's a conventional definition.

> while I realize that there exist approximations, I would imagine that these would be extremely expensive to calculate anything non-trivial.

Have you looked into what they can calculate?

I wasn't aware of any practical implementations of SI. That link isn't talking about SI, but it's similar and really impressive. Something similar to the Optimal Ordered Problem Solver sounds like a sensible approach to formalizing induction.

Right now, the most complex problem solved by approximations that I know of is from [MC-AIXI](http://www.vetta.org/2009/09/monte-carlo-aixi/), where a desktop computer learns to play 'a somewhat reasonable game of Pac-Man.' That was back in 2008-2009, and I don't know what problems Legg & co. may have solved at their startup.

I'm already aware of that paper, but it seems to me that MC-AIXI is more similar to MC tree search than to SI. I'm quite impressed with the effectiveness of MC tree search for Go.

> it seems to me that MC-AIXI is more similar to MC tree search than to SI.

I'm not sure it's entirely meaningful to say they are very different: a tree of quasi-programs with weightings over them, and a random playout for the point at which you have to stop looking deeper. A computable approximation has to be computable, after all, so it's not too surprising if it reuses computable techniques that have been found effective in specific domains.

OK, they both tree-search over a space, whether it is the space of strategies or the space of programs. That does make sense.

I think my initial reaction to SI was very negative - even without the halting problem, simply testing every program of length < n is crazy. By comparison, I could imagine some kind of tree search, possibly weighted by heuristics, being efficient.
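The "random playout" idea mentioned above is simple enough to sketch concretely. Below is a minimal flat Monte Carlo move chooser for a toy Nim game (single pile, take 1-3 per turn, taking the last object wins) - a hypothetical illustration of playout-based search, not anything from the MC-AIXI paper:

```python
import random

def random_playout(pile, rng):
    """True iff the player to move takes the last object, when both
    sides choose moves of 1-3 uniformly at random."""
    player = 0  # 0 = the player whose position we are evaluating
    while pile > 0:
        pile -= rng.randint(1, min(3, pile))
        if pile == 0:
            return player == 0
        player = 1 - player
    return False  # empty pile: the player to move has already lost

def win_rates(pile, playouts=20_000, seed=0):
    """Flat Monte Carlo: estimate each legal move's win rate via random playouts."""
    rng = random.Random(seed)
    rates = {}
    for take in range(1, min(3, pile) + 1):
        # After we take `take`, the opponent moves; we win iff they lose.
        wins = sum(not random_playout(pile - take, rng) for _ in range(playouts))
        rates[take] = wins / playouts
    return rates

rates = win_rates(10)
best = max(rates, key=rates.get)
print(rates, "-> take", best)  # taking 2 (leaving 8, a multiple of 4) rates best
```

Even with purely random playouts and no heuristics, the estimated win rates favor the theoretically correct move; heuristic-weighted tree search, as in Go engines, sharpens exactly this kind of estimate.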

> I think my initial reaction to SI was very negative - even without the halting problem, simply testing every program of length < n is crazy.

It's crazy, but some universes are crazy. Pity the poor AI who wakes up inside a simulation where the programmer is in fact testing it on every program of length <n!

We could be in a simulation where the programmer is in fact testing it on every program of length <n! Doesn't seem so bad.

I believe there's an implicit assumption here that it's possible to create AGI without an understanding of intelligence and motives that would lead one to choose FAI. (Of course, there's a second question about whether an AI designed to be friendly will actually *be* friendly, but that's another issue altogether.)

Making a comparison between the very broad category of AGI and FAI may be helpful, but phrasing it so that the only AIs which count as Friendly are provably-CEV-implementing AIs makes the claim not very interesting. AGI research is in its infancy, as are ideas about CEV - the probability that we'll end up with some other notion of Friendliness that is easier to implement is high.

I would say that coming up with a better notion of Friendliness is a necessary condition for implementing a Friendly AI.

I was talking about provably CEV implementing AI because there seems to be a consensus on LW that this is the correct approach to take.

P(provably CEV-implementing AI ∨ other FAI ∨ AGI that turns out to be friendly anyway ∨ safe singularity for any other reason) is quite a lot higher than P(provably CEV-implementing AI).

> there seems to be a consensus on LW

This almost has to be false. I personally think CEV sounds like the best direction I currently know about, but maybe the process of extrapolation has a hidden 'gotcha'. Hopefully a decision theory that can model self-modifying agents (like our extrapolated selves, perhaps, as well as the AI) will help us figure out what we should be asking. Settling on one approach before then seems premature, and in fact neither the Singularity Institute nor Eliezer has done so.

Meta-comment: I would prefer articles not being divided into multiple parts especially when each part is just one screen of text.

Agreed. The prognosis for humanity looks pretty grim.