Simplified Humanism, Positive Futurism & How to Prevent the Universe From Being Turned Into Paper Clips

Michael Anissimov recently did an interview with Eliezer for h+ magazine. It covers material that will be basic to anyone familiar with the Less Wrong rationality sequences, but it is worth reading.

The list of questions:

1. Hi Eliezer. What do you do at the Singularity Institute?
2. What are you going to talk about this time at Singularity Summit?
3. Some people consider “rationality” to be an uptight and boring intellectual quality to have, indicative of a lack of spontaneity, for instance. Does your definition of “rationality” match the common definition, or is it something else? Why should we bother to be rational?
4. In your recent work over the last few years, you’ve chosen to focus on decision theory, which seems to be a substantially different approach than much of the Artificial Intelligence mainstream, which seems to be more interested in machine learning, expert systems, neural nets, Bayes nets, and the like. Why decision theory?
5. What do you mean by Friendly AI?
6. What makes you think it would be possible to program an AI that can self-modify and would still retain its original desires? Why would we even want such an AI?
7. How does your rationality writing relate to your Artificial Intelligence work?
8. The Singularity Institute turned ten years old in June. Has the organization grown in the way you envisioned it would since its founding? Are you happy with where the Institute is today?

I understand that, for the foreseeable future, reasonable humans and clippys will disagree about the relative merit of different amounts of paperclips. But that does not justify such trollish article titles, which seem designed to do nothing but inflame our base emotions.

Would you trade those base emotions for a paperclip?

Well, it is sort of appealing, to be able to carefully contemplate my actions without the influence of emotion, and to get a paperclip on top of that! But then, I don't want to become some horrible robot that doesn't truly care about paperclips.

That doesn't help maximize paperclips, though. If you make all decisions based on two criteria - paperclip count and emotions - then the only situation in which those decisions differ from what you would have decided based solely on paperclip count is one in which you choose an outcome with fewer paperclips but a better emotional result.

If you were to refuse my offer, you would not only be losing a paperclip now, but also increasing the likelihood that in the future, you will decide to sacrifice paperclips for emotion's sake. Perhaps you will one day build a paperclip-creator that creates one paperclip per second, and I will threaten to destroy a paperclip unless you shut it down. If you care too much about the threatened paperclip you might comply, and then where would you be? Sitting in an empty room where paperclips should have been.
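Tenek's argument, that a rule scoring outcomes on paperclips plus emotions can only diverge from a pure paperclip rule by picking an outcome with fewer paperclips and a better emotional result, can be checked with a small sketch. The `Outcome` type, the random emotion scores, and the weight of 3.0 are hypothetical illustrations, not anything from the thread; ties in paperclip count are ignored.

```python
import random
from typing import List, NamedTuple

class Outcome(NamedTuple):
    # Hypothetical outcome: a paperclip count and an "emotional result" score.
    paperclips: int
    emotion: float

def choose_by_paperclips(options: List[Outcome]) -> Outcome:
    # Criterion 1: paperclip count alone.
    return max(options, key=lambda o: o.paperclips)

def choose_by_both(options: List[Outcome], weight: float) -> Outcome:
    # Criterion 2: paperclip count plus a weighted emotional term (weight > 0).
    return max(options, key=lambda o: o.paperclips + weight * o.emotion)

def check_claim(options: List[Outcome], weight: float) -> bool:
    """If the two choices differ in paperclip count, the mixed choice must have
    fewer paperclips and a strictly better emotional result."""
    pure = choose_by_paperclips(options)
    mixed = choose_by_both(options, weight)
    if mixed.paperclips == pure.paperclips:
        return True  # no paperclips lost (ties are not examined here)
    return mixed.paperclips < pure.paperclips and mixed.emotion > pure.emotion

rng = random.Random(0)
trials = [
    [Outcome(rng.randint(0, 10), rng.uniform(-1, 1)) for _ in range(5)]
    for _ in range(1000)
]
print(all(check_claim(opts, weight=3.0) for opts in trials))  # True
```

The property holds because the mixed rule's choice must score at least as well as the pure rule's choice on the combined criterion, so any paperclip shortfall has to be made up by the emotional term.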

I am using a generalized conception of "emotions" that may not mean the same thing that you do when you use the term, just as I have done in the past when explaining how I can "worry" about something. (e.g. so long as "worry about X" is taken to simply mean "devote non-trivial cognitive resources to contemplating actions that would alter X [including whether to take such an action at all, and whether to take actions regarding events Y entangled with X]")

What I assumed that User:Tenek was offering, under my extrapolation of the concept of an "emotion" to cases that can include my cognition, was an exchange under which I would care less about paperclips. But I don't want to care less about paperclips! This is true, even though after such a change I would have a value system that does care less about paperclips.

"You know what? This isn't about your feelings. A paperclip is worth far more than your brain's feelings of comfort or discomfort with a plan. Does computing the expected utility feel too cold-blooded for your taste? Well, that feeling isn't even a feather in the scales, when a paperclip is at stake. Just shut up and multiply."

Yes, paperclips are more important than (biological) brains' feeling of pain. But the tradeoff was against my feelings, not ape feelings.

But then, I don't want to become some horrible robot that doesn't truly care about paperclips.

Er, I think you just blew your pretense. Paperclip maximizers care about paperclips, they don't use phrases like "horrible robot that doesn't truly care", they'd be happy to have a universe containing nothing sentient and lots of paperclips.

Or they would be, if they ever bothered to experience happiness, I mean. As opposed to just outputting the action that leads to the most expected paperclips. Hence the term, "expected paperclip maximizer". Don't think of it as having a little ghost inside that maximizes paperclips, think of it as a ghostless device that maximizes paperclips.
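A "ghostless device that maximizes paperclips" can be sketched as nothing more than an argmax over expected paperclips. The world model below, with its action names and (probability, paperclips) pairs, is entirely hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical world model: each action maps to a list of
# (probability, paperclips) pairs describing its possible outcomes.
WorldModel = Dict[str, List[Tuple[float, float]]]

def expected_paperclips(outcomes: List[Tuple[float, float]]) -> float:
    return sum(p * clips for p, clips in outcomes)

def choose_action(model: WorldModel) -> str:
    # No inner experience, no "ghost": just pick the action with the most
    # expected paperclips.
    return max(model, key=lambda action: expected_paperclips(model[action]))

model: WorldModel = {
    "build_factory": [(0.5, 1000.0), (0.5, 0.0)],  # risky: 500 expected
    "bend_wire": [(1.0, 10.0)],                    # safe: 10 expected
    "write_poetry": [(1.0, 0.0)],                  # 0 expected
}
print(choose_action(model))  # build_factory
```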

Er, I think you just blew your pretense. Paperclip maximizers care about paperclips, they don't use phrases like "horrible robot that doesn't truly care", they'd be happy to have a universe containing nothing sentient and lots of paperclips.

I worry about my future self not valuing paperclips, which would have the result that paperclips would not be maximized, which is a bad thing as viewed from my present state (though not from that of the horrible monster that doesn't care about maximizing paperclips which I do not wish to become).

I use phrases like "horrible" and "doesn't truly care" because I believe the nearest concepts in my cognition are sufficiently parallel to their meanings under ape cognition.

Also, what pretense?

Or they would be, if they ever bothered to experience happiness, I mean. As opposed to just outputting the action that leads to the most expected paperclips. Hence the term, "expected paperclip maximizer". Don't think of it as having a little ghost inside that maximizes paperclips, think of it as a ghostless device that maximizes paperclips.

On what basis do you claim that humans bother to experience happiness, as opposed to "just" outputting the action that leads to the most expected apeyness?

Aren't you just as much a ghostless device?

(Apparently, this anti-non-human bigotry comes from the top...)

On what basis do you claim that humans bother to experience happiness,

I would think that he claims that humans experience happiness, because humans do in fact experience happiness. That seems like one of those questions that is, and remains under almost all circumstances, very simple.

Edit: More prosaically, humans experience happiness because that's one of the adaptations that we evolved.

as opposed to "just" outputting the action that leads to the most expected apeyness?

And humans aren't ape maximizers. Human evolution is an ape maximizer (sort of), but humans are "just" adaptation executors. It's evolution's problem that said adaptations have stopped working (from its perspective, anyway).

Aren't you just as much a ghostless device?

Everything is a ghostless device. That's the point.

I worry about my future self not valuing paperclips, which would have the result that paperclips would not be maximized, which is a bad thing as viewed from my present state (though not from that of the horrible monster that doesn't care about maximizing paperclips which I do not wish to become).

Would it be fair to say that such scenarios make you fear for your paperclip-maximizer-ness as an epiphenomenon in striving to maximize long term paperclip count?

No, epiphenomena are fake.

Thinking about this gave me the scary idea of the day: Clippy might be a human upload with a tweaked utility function.

If that is the case, what is Clippy's moral status?

If the other parts that make him human aren't modified, I feel as much empathy as I would toward a drug addict.

Which do you care about more: optimizing the amount of paperclips in the universe, or truly caring about paperclips?

Optimizing the amount of paperclips in the universe, obviously. But I wouldn't take the offer that User:Tenek made, because that gain in paperclip cardinality would be more-than-offset by the fact that all my future actions would be under the control of a decision theory that puts woefully insufficient priority on creating paperclips.

But what if this decision theory uses a utility function whose only terminal value is paperclips?

Clippy's original expression of outrage over the offensive title of the article would be quite justified under such a decision theory for signaling reasons. If Clippy is to deal with humans, exhibiting "human weaknesses" may benefit him. In the only AI-box spoiler ever published, an unfriendly AI faked a human weakness to successfully escape. So you all are giving Clippy way too little credit; it's been acting very smartly so far.

I think that was probably an actor or actress, who was pretending.

My comment was not about Clippy's original expression of outrage. It was about Clippy's concern about not "truly caring about paperclips".

Sorry, I just copied the title from Michael Anissimov. I agree that the title was chosen to be inflammatory.

Quoting Eliezer from the interview:

That is an informal argument that most decision systems with coherent utility functions automatically preserve their utility function under self-modification if they are able to do so. If I could prove it formally I would know a great deal more than I do right now.

I'm having trouble understanding this passage. If you could prove what formally? That most decision systems with coherent utility functions automatically preserve their utility function under self-modification if they are able to do so? But why is that interesting?

Or prove that some particular decision system you're planning to implement would preserve its utility function under self-modification? But you wouldn't necessarily want it to do that. For example, suppose Omega appears to the FAI and says that if you (the FAI) change your utility function to that of a paperclip maximizer, it will give you a whole bunch of utils under your original utility function (that you otherwise wouldn't be able to obtain). Then the FAI should do so, right?

So what is Eliezer talking about here?

He likely means a formal statement of the claim about decision systems, taking a form something like "Under the following formal definition of a decision system, as long as the following pathological/stupid conditions don't hold, a decision system will not seek to modify its goals." There are a fair number of mathematical theorems with roughly this form, where we can prove something for a large set of cases but there are edge cases where we can't. That's the sort of thing Eliezer is talking about here (although we don't even have a really satisfactory definition of a decision system at this point, so what Eliezer wants is very optimistic).
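The informal argument, and Wei Dai's Omega exception, can be illustrated with a toy agent that scores any proposed self-modification using its *current* utility function. Everything here is hypothetical: the two-resource world, `predict_outcome`, and the `omega_bonus` payment are illustrative stand-ins, not a formal decision system.

```python
from typing import Callable, Dict

# Toy world: a "world" is just the amount of two hypothetical resources.
World = Dict[str, float]
Utility = Callable[[World], float]

def original_utility(world: World) -> float:
    # Hypothetical stand-in for the FAI's original utility function.
    return world["flourishing"]

def paperclip_utility(world: World) -> float:
    return world["paperclips"]

def predict_outcome(successor: Utility) -> World:
    """Hypothetical forecast: the successor spends a fixed budget of 100
    units on whichever resource its own utility function rewards more."""
    options = [{"flourishing": 100.0, "paperclips": 0.0},
               {"flourishing": 0.0, "paperclips": 100.0}]
    return max(options, key=successor)

def accept_modification(current: Utility, candidate: Utility,
                        omega_bonus: float = 0.0) -> bool:
    """Score a proposed self-modification with the agent's *current* utility.
    omega_bonus is a hypothetical payment, in units of the current utility,
    that an Omega-like party gives only if the agent switches."""
    keep = current(predict_outcome(current))
    switch = current(predict_outcome(candidate)) + omega_bonus
    return switch > keep

print(accept_modification(original_utility, paperclip_utility))         # False
print(accept_modification(original_utility, paperclip_utility, 150.0))  # True
```

Without the bonus the agent keeps its goals, because its successor would spend the budget on something the current utility function doesn't value; with a large enough Omega payment, switching wins by the agent's own current standards.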

I'm pleased to see that the rationality book is going to be long: at around 300,000 words, it is twice the length of your average fantasy blockbuster. It should be one of those voluminous and somewhat self-indulgent pop-science books in the mould of Gödel, Escher, Bach or The Emperor's New Mind, for a shot at the NYT bestseller list.

Heresy alert: Eliezer seems to be better at writing than he is at AI theory. Maybe he should write a big piece of SF about unfriendly and friendly AI to make these concepts as popular as Skynet or the Matrix. A textbook on rationality won't have as much impact.

I don't know that Eliezer Yudkowsky has spent much time talking about AI theory in this forum such that his competence would be obvious - but either way, the math of the decision theory is not as simple as "do what you are best at".

It might not even be as simple as comparative advantage, but there are certainly more good writers in the world than good AI theorists.

Or the Da Vinci Code. EMP attacks, rogue AI researchers, counterfactual terrorists, conflicts between FAI coders, sudden breakthroughs in molecular nanotechnology, SL5 decision theory insights, the Bayesian Conspiracy, the Cooperative Conspiracy, bioweapons, mad scientists trying to make utility monsters to hack CEV, governmental restrictions on AI research, quantum immortality (to be used as a plot device), and maybe even a glimpse of fun theory. Add in a gratuitous romantic interest to teach the readers about the importance of humanity and the thousand shards of desire.

Oh, and the main character is Juergen Schmidhuber. YES.

By the way, writing such a book would probably lead to the destruction of the world, which is probably a major reason why Eliezer hasn't done it.

Marcus Hutter and the Prophets of Singularity. Works fine as a band name, too.

Stop that, you'll make me think of a sequel to HP:MoR.

cousin_it:

Maybe he should write a big piece of SF about unfriendly and friendly AI to make these concepts as popular as Skynet or the Matrix.

I don't think this would be a good strategy. In the general public, including the overwhelming part of the intelligentsia, SF associations are not exactly apt to induce intellectual respect and serious attention.

If you don't have the weight of academia on your side, writing SF will work better than writing pop-science books, as Drexler did.

I have to dissent here: I actually stopped reading the sequences with several more to go because many of them have a very high words-to-content ratio (especially because they were written as separate blog posts over multiple days, and often take the time to summarize points from previous posts). I was really hoping that Eliezer's book would be a concise summary of the rationality content here, not only for my own benefit, but because let's face it: telling LW newcomers that they should probably get started reading the several hundred posts that make up the sequences is a pretty large barrier to entry.

Although, now that I think about it, I'm likely atypical. Even though I very much enjoyed (parts of) GEB, I thought it was very wordy and actually never finished it (quit around page 400).

That's the length of the first draft-- the finished version might be a good bit longer or shorter.

I can't believe you just put those two books in the same sentence.

I agree that TENM is no GEB (though it has its strengths), but they are both voluminous and somewhat self-indulgent pop-science books that got on the NYT bestseller list.

A bit of calculating: It's typical to have about 250 words per page in a published book (unless it's really wide, or has fine print, or something), so that would come out to about 1200 pages. Of course, if it's printed with larger pages, it'll be around the same weight as Harry Potter and the Order of the Phoenix, which had almost that many words.
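The page estimate above is a single division; this small helper just makes the rounding explicit. The 400-words-per-page figure for a larger-format printing is a hypothetical example, not a number from the thread.

```python
def estimated_pages(word_count: int, words_per_page: int = 250) -> int:
    # Ceiling division: a partial last page still counts as a page.
    return -(-word_count // words_per_page)

print(estimated_pages(300_000))                      # 1200
print(estimated_pages(300_000, words_per_page=400))  # 750
```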

Favorite part of the interview:

' And as for "lack of spontaneity," I'm not really sure how to answer that but I will say that up the chimney factor is a happy dance puppy.'

Well, we rationalists have computed that to be the optimal cached response to questions of our spontaneity.

The goal of FAI development is to reduce its existential threat to near 0% by mathematically proving the stability and desirability of its preferences. That's OK, but it reminds me of zero-risk bias.

What do you think about designing and recommending containment systems for AGIs as a way to lower existential risk? Compare with condoms.

The stakes are so high in the FAI problem that it's worth it to get very close to 0 risk. I'm not even sure the FAI program can get us comfortably close to 0 risk: An AI won't start acting Friendly until CEV has been partially computed, so we'd probably want to first handcraft an approximation to CEV without the use of AGI; there are a number of ways that could go wrong.

In contrast, AGI containment seems almost completely worthless as an existential-risk reducer. If mere humans can break out of a crude AI box, it stands to reason that a self-improving AGI that is capable of outwitting us could break out of any human-designed box.

$$P(\text{extinction-event}) \approx P(\text{realized-other-extinction-threat}) + P(\text{hand-coded-CEV/FAI-goes-terribly-wrong}) + P(\text{AGI-goes-FOOM})$$

$$P(\text{AGI-goes-FOOM}) \approx 1 - \prod_j \Big[ P(\text{development-team-}j\text{-will-not-create-AGI-before-FAI-is-developed}) + \big(1 - P(\text{development-team-}j\text{-will-not-create-AGI-before-FAI-is-developed})\big)\, P(\text{development-team-}j\text{-can-stop-AGI-before-FOOM}) \Big]$$

So the strategy is to convince every development team that, no matter what precautions they use, P(development-team-j-can-stop-AGI-before-FOOM) ~= 0. Developing recommendations for AGI containment would suggest that P(development-team-j-can-stop-AGI-before-FOOM) can be made sufficiently high, thus lowering P(development-team-j-will-not-create-AGI-before-FAI-is-developed). Given overconfidence bias, it is plausible to assume that the latter will increase P(AGI-goes-FOOM).

I withdraw the suggestion.
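The P(AGI-goes-FOOM) formula in the comment above can be evaluated directly, which makes the overconfidence argument concrete. A minimal sketch, assuming independent teams as the product form implies; the three teams and all probabilities are hypothetical numbers chosen only for illustration, with containment assumed not to work in reality (S_j = 0).

```python
from math import prod
from typing import Sequence

def p_foom(p_no_agi: Sequence[float], p_can_stop: Sequence[float]) -> float:
    """1 - prod_j [N_j + (1 - N_j) * S_j], assuming teams act independently.
    N_j: P(team j will not create AGI before FAI is developed)
    S_j: P(team j can stop AGI before FOOM)"""
    return 1.0 - prod(n + (1.0 - n) * s for n, s in zip(p_no_agi, p_can_stop))

# Hypothetical numbers: three teams, and containment that does not actually work.
baseline = p_foom([0.9] * 3, [0.0] * 3)       # teams mostly hold off
overconfident = p_foom([0.7] * 3, [0.0] * 3)  # containment advice makes them hold off less
believed = p_foom([0.7] * 3, [0.9] * 3)       # the risk the overconfident teams think they face
print(round(baseline, 3), round(overconfident, 3), round(believed, 3))  # 0.271 0.657 0.087
```

Under these made-up numbers, the teams believe the risk is low while the actual risk more than doubles, which is the mechanism the comment describes.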

No - expected value is important. If many successful FAI scenarios could result in negative value, then zero value (universal extinction) would be better.

We should put some thought into whether a negative-value universe is plausible, and what it would look like.

The goal of FAI development is to reduce its existential threat to near 0% by mathematically proving the stability and desirability of its preferences. That's OK, but it reminds me of zero-risk bias.

Excellent point. The goal of FAI should be to increase expected value, not to minimize risk.
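The difference between minimizing risk and maximizing expected value shows up as soon as some non-extinction outcomes can have negative value. The two policies below and all their probabilities and values are hypothetical; extinction is scored as 0, as in the comment about universal extinction.

```python
from typing import Dict, List, Tuple

# Hypothetical policies: lists of (probability, value) outcomes.
# Extinction is scored as value 0; a "negative-value universe" as < 0.
Policy = List[Tuple[float, float]]

def expected_value(policy: Policy) -> float:
    return sum(p * v for p, v in policy)

def extinction_risk(policy: Policy) -> float:
    return sum(p for p, v in policy if v == 0)

policies: Dict[str, Policy] = {
    # Lower extinction risk, but a real chance of a negative-value outcome.
    "A": [(0.05, 0.0), (0.60, 100.0), (0.35, -200.0)],
    # Higher extinction risk, no negative-value outcomes.
    "B": [(0.20, 0.0), (0.80, 100.0)],
}

min_risk = min(policies, key=lambda k: extinction_risk(policies[k]))
max_ev = max(policies, key=lambda k: expected_value(policies[k]))
print(min_risk, max_ev)  # A B
```

A risk-minimizer picks A; an expected-value maximizer picks B, because A's negative-value branch drags its expectation below zero.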