I wrote a new Singularity FAQ for the Singularity Institute's website. Here it is. I'm sure it will evolve over time. Many thanks to those who helped me revise early drafts, especially Carl and Anna!


This needs more work; my nitpick detector trips on every other sentence. If you're willing to revise heavily, I'll compile a more detailed list (or revise some sections myself).

Examples (starting from the end, skipping some):

  • "but that stable eventual goal may be very difficult to predict in advance" - you don't predict goals, you make them a certain way.
  • "We must also figure out how to build a general intelligence that satisfies a goal at all, and that stably retains that goal as it edits its own code to make itself smarter. This task is perhaps the primary difficulty in designing friendly AI." - last sentence unwarranted.
  • "Eliezer Yudkowsky has proposed[57] Coherent Extrapolated Volition as a solution to two problems facing Friendly AI design:" - not just these two, bad wording
  • "We have already seen how simple rule-based and utilitarian designs for Friendly AI will fail. " - should be a link/reference, a FAQ can be entered at any question.
  • "The second problem is that a superintelligence may generalize the wrong principles due to coincidental patterns in the training data" - a textbook error in machine learning methodology is a bad match for a fundamental problem, unless argued as being such in this particular case.
  • "But even if humans could be made to agree on all the training cases, two problems remain." - just two? Bad wording.
  • "The first problem is that training on cases from our present reality may not result in a machine that will make correct ethical decisions in a world radically reshaped by superintelligence." - the same can be said of humans (correctly, but as a result it doesn't work as a simple distinguishing argument).
  • "Let's consider the likely consequences of some utilitarian designs for Friendly AI." - "utilitarian": a potentially new term without any introduction, even with a link, is better to be avoided.
  • "An AI designed to minimize human suffering would simply kill all humans" - could/might would be better.
  • "caters to the complex and demanding wants of humanity" - this statement is repeated about 5 times in close forms, should change the wording somehow.
  • "by wiring humans into Nozick’s experience machines. " - an even more opaque term without explanation.
  • "Either option would be easier for the AI to achieve than maintaining a utopian society catering to the complexity of human (and animal) desires." - not actually clear (from my point of view, not simulated naive point of view). The notion of "default route" in foreign minds can be quite strange, and you don't need much complexity in generating principle for a fractal to appear diverse. (There are clearly third alternatives that shelve both considered options, which also makes the comparison not terribly well-defined.)
  • "It's not just a problem of specifying goals, either. It is hard to predict how goals will change in a self-modifying agent. No current mathematical decision theory can predict the decisions of a self-modifying agent." - again, these things are there to be decided upon, not "predicted"
  • etc.

but that stable eventual goal may be very difficult to predict in advance

No, the point of that section is that there are many AI designs in which we can't explicitly make goals.

This task is perhaps the primary difficulty in designing friendly AI.

Some at SIAI disagree. I've already qualified with 'perhaps'.

not just these two, bad wording

Fixed.

should be a link/reference, a FAQ can be entered at any question

Alas, I think no such documents exist. But luckily, the sentence is unneeded.

a textbook error in machine learning methodology is a bad match for a fundamental problem, unless argued as being such in this particular case

I disagree. A textbook error in machine learning that has not yet been solved is a good match for a fundamental problem.

just two? Bad wording.

Fixed.

the same can be said of humans (correctly, but as a result it doesn't work as a simple distinguishing argument)

Again, I'm not claiming that these aren't also problems elsewhere.

"utilitarian": a potentially new term without any introduction, even with a link, is better to be avoided

Maybe. If you can come up with a concise way to get around it, I'm all ears.

could/might would be better

Agreed.

this statement is repeated about 5 times in close forms, should change the wording somehow

Why? I've already varied the wording, and the point of a FAQ with link anchors is that not everybody will read the whole FAQ from start to finish. I repeat the phrase 'machine superintelligence' in variations a lot, too.

an even more opaque term without explanation

Hence, the link, for people who don't know.

not actually clear (from my point of view, not simulated naive point of view)

Changed to 'might'.

again, these things are there to be decided upon, not "predicted"

Fixed.

Thanks for your comments. As you can see I am revising, so please do continue!

No, the point of that section is that there are many AI designs in which we can't explicitly make goals.

I know, but you use the word "predict", which is what I was pointing out.

I disagree. A textbook error in machine learning that has not yet been solved is good match for a fundamental problem.

What do you mean, "has not yet been solved"? This kind of error is routinely being solved in practice, which is why it's a textbook example.

Again, I'm not claiming that these aren't also problems elsewhere.

Yes, but that makes it a bad illustration.

Why? I've already varied the wording

Because it's bad prose, it sounds unnatural (YMMV).

Hence, the link, for people who don't know.

This doesn't address my argument. I know there is a link and I know that people could click on it, so that's not what I meant.

(More later, maybe.)

(Originally sent as a PM, but I think it's worth saying in public.)

Good work, first of all. I think you might still be a few inferential leaps past many plausible readers, though. For instance, many people don't actually know that it's physically possible to run a WBE a million times faster than a brain, nor that there's a lot we know is possible to program but can't do yet.

You need to point out that nerve impulses are much slower than semiconductor logic gates, and that most of the reason the brain is better at many tasks is because 50 years of programming hasn't yet caught up to several hundred million years of evolution on things like vision processing. Concepts like those might be helpful for the non-LW readers.
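
Something like the following back-of-envelope might make the scale of the gap concrete (a rough sketch in Python, with ballpark figures and invented variable names, purely for illustration):

    # Ballpark figures only; the point is the rough scale, not precision.
    neuron_peak_firing_hz = 200     # cortical neurons: at most a few hundred spikes per second
    cpu_clock_hz = 2e9              # an ordinary modern processor: roughly 2 GHz

    ratio = cpu_clock_hz / neuron_peak_firing_hz
    print(f"raw switching-rate ratio: ~{ratio:,.0f}x")  # ~10,000,000x

Even after giving away several orders of magnitude for overhead and the brain's massive parallelism, that raw speed gap is the intuition behind "a WBE could in principle run much faster than a brain."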

But I don't mean to criticize too much, because what you've done is pretty excellent!

I think people understand that their calculator can do arithmetic much faster than they can. No?

I think people understand that their calculator can do arithmetic much faster than they can. No?

Yes, but there's a leap from there to the idea that a computer might be able to run the equivalent of neurons faster than a person. It might need to be stated explicitly for people who aren't used to thinking about these issues.

Okay. I added a parenthetical: "computer circuits communicate much faster than neurons do".

This could simply be an indication that the brain's architecture is not well-optimized for arithmetic. It doesn't necessarily imply that calculators are faster.

The computer I had in 1999 had occasional difficulties in carrying out real-time emulation of a gaming console released in 1990. That doesn't mean the console had better hardware.

I don't think that many people consciously connect the two ideas; again, we're talking about a short but essential inferential leap. (I've known some very smart people who were surprised when I pointed out this particular fact, by the way.)

In general, a pretty good FAQ, but I do have some criticism. First of all, I'm not sure if I'd put general transhumanist background information (2.1-2.5) under "How Likely is an Intelligence Explosion". I realize that those technologies are relevant to a potential intelligence explosion, but the placement still seems somewhat odd.

Also, you say that "singularity" refers to the intelligence explosion in the FAQ, but in "When will the Singularity happen?", you list predictions of other forms of the singularity. I would recommend adding something to indicate what exactly each source is predicting when they say "singularity". What Ray Kurzweil predicts will happen in 2045 bears little resemblance to what Eliezer Yudkowsky expects to happen by 2060.

Good point! Fixed.

An AI designed to minimize human suffering would simply kill all humans: no humans, no human suffering

This seems too strong, I'd suggest changing "would" to "might" or "could".

Also, at two different points in the FAQ you use almost identical language about sticking humans in jars. You may want to change up the wording slightly or make it clear that you recognize the redundancy ("as discussed in question blah..." might do it).

This seems too strong, I'd suggest changing "would" to "might" or "could".

Or "designed to" to "designed only to".

"Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them."

This here trips my 'then what did?' sensors. I understand the intent, but evolution -did- prepare us, by giving us general intelligence. Perhaps you meant that evolution did not -specifically- adapt us to those challenges?

You misspelt avocado.

Sorry, but when else am I going to get to say that? =P

ETA: Great FAQ!

Thanks, fixed.

I'm pretty sure that's supposed to be avocadoe. Dan confirms.

Ah, but what are you comparing it to? You get ~0.0001 [avcado]s per [avocado], which is even worse than, say, the ~0.001 [straberry]s per [strawberry].

Not looking at ratios, just at the absolute number of opportunities to tell people they misspelt avocado. (Of course, that's just one of an infinite number of possible misspellings... though, admittedly, most misspellings of avocado are difficult to recognize as such.)

Ah, I see!

So, given that the string "avocado" is a negligible fraction of all human-produced text, there are, like, exabytes worth of opportunities online right now.

Yes, exactly.

Incidentally, you misspelled "frivolous."

This is an excellent singularity FAQ which handles a very difficult task surprisingly well.

"If 'harm' is defined in terms of thwarting human desires, it could rewire human desires. And so on."

...not if humans desire for their desires to be preserved, which they do.

The human desires to preserve their desires are no match for a superintelligent machine with a completed human neuroscience at its disposal.

If the machine is constrained to not thwart human desires, and one of our desires is to not have our desires modified, then the machine will have to break its constraint to arrive at the solution you indicated (rewriting human desires). The ability of humans to defend themselves is beside the point.

I've rewritten 4.4 for clarity since you left your original comment. Do you still think it needs improvement?

That sentence is still there so my comment still stands as far as I can tell. I can also tell I'm failing to convey it so maybe someone else can step in and explain it differently.

Thanks for putting in the work to write this FAQ.

I see a few ways the sentence could be parsed, and they all go wrong.

(A) Utility function takes as input a hypothetical world, looks for hypothetical humans in that world, and evaluates the utility of that world according to their desires.
Result: AI modifies humans to have easily-satisfied desires. That you currently don't want to be modified is irrelevant: After the AI is done messing with your head you will be satisfied, which is all the AI cares about.

(B) There is a static set of desires extracted from existing humans at the instant the AI is switched on. Utility function evaluates all hypotheticals according to that.
Result: No one is allowed to change their mind. Whatever you want right now is what happens for the rest of eternity.

(C) At any given instant, the utility of all hypotheticals evaluated at that instant is computed according to the desires of humans existing at that instant.
Result: AI quickly self-modifies into version (B). Because if it didn't, then the future AI would optimize according to future humans' desires, which would result in outcomes that score lower according to the current utility function.
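
In toy form, the three readings differ only in whose desires feed the scoring function (a sketch with invented names and a made-up world representation, not anyone's actual proposal):

    # Toy sketch: a "world" is a dict with the facts that hold in it and the
    # desires of the humans living in it. All names are invented for illustration.

    def satisfied_fraction(desires, facts):
        # Fraction of the listed desires that the world's facts satisfy.
        return sum(facts.get(d, False) for d in desires) / max(len(desires), 1)

    def utility_A(world):
        # (A) score a hypothetical world by the desires of the humans in that world
        return satisfied_fraction(world["inhabitant_desires"], world["facts"])

    def utility_B(world, desires_at_switch_on):
        # (B) score every world against one frozen snapshot of desires
        return satisfied_fraction(desires_at_switch_on, world["facts"])

    def utility_C(world, desires_of_humans_now):
        # (C) score worlds by whatever present humans want at evaluation time;
        #     the scoring rule itself drifts as current desires drift
        return satisfied_fraction(desires_of_humans_now, world["facts"])

    # Reading (A) rewards modifying people into easy-to-please inhabitants:
    modified = {"inhabitant_desires": ["breathe"], "facts": {"breathe": True}}
    print(utility_A(modified))  # 1.0 -- maximal, regardless of what anyone wanted beforehand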

Did you have some other alternative in mind?

(A) would be the case if the utility function was 'create a world where human desires don't need to be thwarted'. (and even then, depends on the definition of human). But the constraint is 'don't thwart human desires'.

I don't understand (B). If I desire to be able to change my mind, (which I do), wouldn't not being allowed to do so thwart said desire?

I also don't really understand how the result of (C) comes about.

(B): If the static utility function is not based on object-level desires, and instead only on your desire to be able to change your mind and then get whatever you end up deciding on, but you haven't yet decided what to change it to, then that makes the scenario more like (A). The AI has every incentive to find some method of changing your mind to something easy to satisfy, that doesn't violate the desire of not having your head messed with. Maybe it uses extraordinarily convincing ordinary conversation? Maybe it manipulates which philosophers you meet? Maybe it uses some method you don't even understand well enough to have a preference about? I don't know, but you've pitted a restriction against the AI's ingenuity.

(C): Consider two utility functions, U1 based on the desires of humans at time t1, and U2 based on the desires of humans at time t2. U1 and U2 are similar in some ways (depending on how much one set of humans resembles the other), but not identical, and in particular they will tend to have maxima in somewhat different places. At t1, there is an AI with utility function U1, and also with a module that repeatedly scans its environment and overwrites the utility function with the new observed human desires. This AI considers two possible actions: it can self-improve while discarding the utility-updating module and thus keep U1, or it can self-improve in such a way as to preserve the module. The first action leads to a future containing an AI with utility function U1, which will then optimize the world into a maximum of U1. The second action leads to a future containing an AI with utility function U2, which will then optimize the world into a maximum of U2, which is not a maximum of U1. Since at t1 the AI decides by the criteria of U1 and not U2, it chooses the first action.
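
In toy form (invented numbers, purely illustrative):

    # Toy version of the argument: two candidate future outcomes, and two
    # utility functions that disagree about which is best.
    U1 = {"outcome_x": 10, "outcome_y": 3}   # desires of humans at t1
    U2 = {"outcome_x": 2,  "outcome_y": 9}   # desires of humans at t2

    def best_outcome(utility):
        return max(utility, key=utility.get)

    # Action 1: discard the updating module, keep U1 -> the future AI steers toward U1's favorite.
    value_if_module_discarded = U1[best_outcome(U1)]   # 10

    # Action 2: keep the updating module -> the future AI steers toward U2's favorite,
    # which the current AI still scores with U1.
    value_if_module_kept = U1[best_outcome(U2)]        # 3

    print(value_if_module_discarded > value_if_module_kept)  # True: U1 prefers discarding the module

Since the decision at t1 is made by U1's lights, the module gets discarded, which is the collapse from (C) into (B).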

Thank you so much for writing this!


Thanks for writing this!


Bookmarked!

Edit - please disregard this post