Disjunctions, Antipredictions, Etc.

Eliezer Yudkowsky

Followup to: Underconstrained Abstractions

So if it's not as simple as just using the one trick of finding abstractions you can easily verify on available data, what are some other tricks to use?

There are several, as you might expect...

Previously I talked about "permitted possibilities". There's a trick in debiasing that has mixed benefits, which is to try and visualize several specific possibilities instead of just one.

The reason it has "mixed benefits" is that being specific, at all, can have biasing effects relative to just imagining a typical case. (And believe me, if I'd seen the outcome of a hundred planets in roughly our situation, I'd be talking about that instead of all this Weak Inside View stuff.)

But if you're going to bother visualizing the future, it does seem to help to visualize more than one way it could go, instead of concentrating all your strength into one prediction.

So I try not to ask myself "What will happen?" but rather "Is this possibility allowed to happen, or is it prohibited?" There are propositions that seem forced to me, but those should be relatively rare - the first thing to understand about the future is that it is hard to predict, and you shouldn't seem to be getting strong information about most aspects of it.

Of course, if you allow more than one possibility, then you have to discuss more than one possibility, and the total length of your post gets longer. If you just eyeball the length of the post, it looks like an unsimple theory; and then talking about multiple possibilities makes you sound weak and uncertain.

As Robyn Dawes notes,

"In their summations lawyers avoid arguing from disjunctions in favor of conjunctions. (There are not many closing arguments that end, "Either the defendant was in severe financial straits and murdered the decedent to prevent his embezzlement from being exposed or he was passionately in love with the same coworker and murdered the decedent in a fit of jealous rage or the decedent had blocked the defendant's promotion at work and the murder was an act of revenge. The State has given you solid evidence to support each of these alternatives, all of which would lead to the same conclusion: first-degree murder.") Rationally, of course, disjunctions are much more probable than are conjunctions."

Another test I use is simplifiability - after I've analyzed out the idea, can I compress it back into an argument that fits on a T-Shirt, even if it loses something thereby? Here are some example compressions:

The whole notion of recursion and feeding object-level improvements back into meta-level improvements: "If computing power per dollar doubles every eighteen months, what happens if computers are doing the research?"
No diminishing returns on complexity in the region of the transition to human intelligence: "We're so similar to chimps in brain design, and yet so much more powerful; the upward slope must be really steep."
Scalability of hardware: "Humans have only four times the brain volume of chimps - now imagine an AI suddenly acquiring a thousand times as much power."

If the whole argument was that T-Shirt slogan, I wouldn't find it compelling - too simple and surface a metaphor. So you have to look more closely, and try visualizing some details, and make sure the argument can be consistently realized so far as you know. But if, after you do that, you can compress the argument back to fit on a T-Shirt again - even if it sounds naive and stupid in that form - then that helps show that the argument doesn't depend on all the details being true simultaneously; the details might be different while fleshing out the same core idea.

Note also that the three statements above are to some extent disjunctive - you can imagine only one of them being true, but a hard takeoff still occurring for just that reason alone.
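To make the disjunctive point concrete, here is a minimal sketch. The probabilities are made up purely for illustration, and treating the three arguments as independent is a simplifying assumption of the sketch, not something the arguments themselves establish.

```python
from math import prod

def conjunction(probs):
    """P(every claim holds), assuming the claims are independent."""
    return prod(probs)

def disjunction(probs):
    """P(at least one claim holds), assuming the claims are independent."""
    return 1 - prod(1 - p for p in probs)

# Hypothetical credences for the three T-Shirt arguments above.
credences = [0.3, 0.3, 0.3]

print(f"all three needed: {conjunction(credences):.3f}")
print(f"any one suffices: {disjunction(credences):.3f}")
```

With these placeholder numbers, a conclusion that needs all three claims is far less probable than any single claim, while a conclusion that follows from any one of them is more probable than each taken alone.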

Another trick I use is the idea of antiprediction. This is when the narrowness of our human experience distorts our metric on the answer space, so that a prediction which actually isn't far from a maximum-entropy prior nonetheless sounds very startling.

I shall explain:

A news story about an Australian national lottery that was just starting up interviewed a man on the street, asking him if he would play. He said yes. Then they asked him what he thought his odds were of winning. "Fifty-fifty," he said, "either I win or I don't."

To predict your odds of winning the lottery, you should invoke the Principle of Indifference with respect to all possible combinations of lottery balls. But this man was invoking the Principle of Indifference with respect to the partition "win" and "not win". To him, they sounded like equally simple descriptions; but the former contains only one combination, and the latter contains the other N million combinations. (If you don't agree with this analysis I'd like to sell you some lottery tickets.)
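As a rough sketch of the difference between the two partitions: the 6-from-45 format below is an assumed example, not the rules of the lottery in the news story, and the function name is hypothetical.

```python
from math import comb

def single_ticket_win_probability(balls, drawn):
    """Indifference over every possible combination of drawn balls."""
    return 1 / comb(balls, drawn)

# Assumed format: 6 balls drawn from 45 (illustrative only).
combinations = comb(45, 6)
p_win = single_ticket_win_probability(45, 6)

print(f"possible combinations: {combinations:,}")
print(f"indifference over combinations: {p_win:.2e}")
print(f"indifference over win / not win: {1 / 2}")
```

Under that assumed format, "win" covers one combination and "not win" covers the remaining eight million or so, which is why the two descriptions are not equally simple in the sense that matters.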

So the antiprediction is just "You won't win the lottery." And then one may say, "What? How do you know that? You have no evidence for that! You can't prove that I won't win!" So they are focusing far too much attention on a small volume of the answer space, artificially inflated by the way their attention dwells upon it.

In the same sense, if you look at a television SF show, you see that a remarkable number of aliens seem to have human body plans - two arms, two legs, walking upright, right down to five fingers per hand and the location of eyes in the face. But this is a very narrow partition in the body-plan space; and if you just said, "They won't look like humans," that would be an antiprediction that just steps outside this artificially inflated tiny volume in the answer space.

Similarly with the true sin of television SF, which is too-human minds, even among aliens not meant to be sympathetic characters. "If we meet aliens, they won't have a sense of humor," I antipredict; and to a human it sounds like I'm saying something highly specific, because all minds by default have a sense of humor, and I'm predicting the presence of a no-humor attribute tagged on. But actually, I'm just predicting that a point in mind design volume is outside the narrow hyperplane that contains humor.

An AI might go from infrahuman to transhuman in less than a week? But a week is 10^49 Planck intervals - if you just look at the exponential scale that stretches from the Planck time to the age of the universe, there's nothing special about the timescale that 200Hz humans happen to live on, any more than there's something special about the numbers on the lottery ticket you bought.
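The Planck-interval figure can be checked with a few lines. This is a back-of-the-envelope sketch; the constants are rounded approximations (Planck time of about 5.39e-44 seconds, age of the universe of about 13.8 billion years), and the names are my own.

```python
import math

PLANCK_TIME_S = 5.39e-44                          # approximate Planck time
WEEK_S = 7 * 24 * 3600                            # one week in seconds
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600   # roughly 13.8 billion years
NEURON_TICK_S = 1 / 200                           # one cycle at 200Hz

def log10_planck_intervals(seconds):
    """Position of a duration on a log scale measured in Planck intervals."""
    return math.log10(seconds / PLANCK_TIME_S)

for label, seconds in [
    ("200Hz tick", NEURON_TICK_S),
    ("one week", WEEK_S),
    ("age of universe", AGE_OF_UNIVERSE_S),
]:
    print(f"{label}: about 10^{log10_planck_intervals(seconds):.0f} Planck intervals")
```

On this scale, which runs from 0 up to roughly 61, a week sits near 49 and a single 200Hz tick near 41; neither position is marked out as special.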

If we're talking about a starting population of 2GHz processor cores, then any given AI that FOOMs at all is likely to FOOM in less than 10^15 sequential operations or more than 10^19 sequential operations, because the region between 10^15 and 10^19 isn't all that wide a target. So less than a week or more than a century, and in the latter case that AI will be trumped by one of a shorter timescale.
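The conversion from sequential operations to wall-clock time is simple arithmetic. The sketch below assumes one sequential operation per clock cycle on a 2GHz core, which is a simplification; the helper name is hypothetical.

```python
CLOCK_HZ = 2e9                       # 2GHz core
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def wall_clock_seconds(sequential_ops, clock_hz=CLOCK_HZ):
    """Time to run the operations back to back, one per clock cycle (assumed)."""
    return sequential_ops / clock_hz

days_for_1e15 = wall_clock_seconds(1e15) / SECONDS_PER_DAY
years_for_1e19 = wall_clock_seconds(1e19) / SECONDS_PER_YEAR

print(f"10^15 operations: about {days_for_1e15:.1f} days")
print(f"10^19 operations: about {years_for_1e19:.0f} years")
```

Under that assumption, 10^15 operations take a bit under six days and 10^19 take a bit over a century and a half, matching "less than a week or more than a century".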

This is actually a pretty naive version of the timescale story. But as an example, it shows how a "prediction" that's close to just stating a maximum-entropy prior can sound amazing, startling, counterintuitive, and futuristic.

When I make an antiprediction supported by disjunctive arguments that are individually simplifiable, I feel slightly less nervous about departing the rails of vetted abstractions. (In particular, I regard this as sufficient reason not to trust the results of generalizations over only human experiences.)

Finally, there are three tests I apply to figure out how strong my predictions are.

The first test is to just ask myself the Question, "What do you think you know, and why do you think you know it?" The future is something I haven't yet observed; if my brain claims to know something about it with any degree of confidence, what are the reasons for that? The first test tries to align the strength of my predictions with things that I have reasons to believe - a basic step, but one which brains are surprisingly wont to skip.

The second test is to ask myself "How worried do I feel that I'll have to write an excuse explaining why this happened anyway?" If I don't feel worried about having to write an excuse - if I can stick my neck out and not feel too concerned about ending up with egg on my face - then clearly my brain really does believe this thing quite strongly, not as a point to be professed through enthusiastic argument, but as an ordinary sort of fact. Why?

And the third test is the "So what?" test - to what degree will I feel indignant if Nature comes back and says "So what?" to my clever analysis? Would I feel as indignant as if I woke up one morning to read in the newspaper that Mars had started orbiting the Sun in squares instead of ellipses? Or, to make it somewhat less strong, as if I woke up one morning to find that banks were charging negative interest on loans? If so, clearly I must possess some kind of extremely strong argument - one that even Nature Itself ought to find compelling, not just humans. What is it?

"But if you're going to bother visualizing the future, it does seem to help to visualize more than one way it could go, instead of concentrating all your strength into one prediction. So I try not to ask myself 'What will happen?' but rather 'Is this possibility allowed to happen, or is it prohibited?'"

I thought that you were changing your position; instead, you have used this opening to lead back into concentrating all your strength into one prediction.

I think this characterizes a good portion of the recent debate: Some people (me, for instance) keep saying "Outcomes other than FOOM are possible", and you keep saying, "No, FOOM is possible." Maybe you mean to address Robin specifically; and I don't recall any acknowledgement from Robin that foom is >5% probability. But in the context of all the posts from other people, it looks as if you keep making arguments for "FOOM is possible" and implying that they prove "FOOM is inevitable".

A second aspect is that some people (again, e.g., me) keep saying, "The escalation leading up to the first genius-level AI might be on a human time-scale," and you keep saying, "The escalation must eventually be much faster than human time-scale." The context makes it look as if this is a disagreement, and as if you are presenting arguments that AIs will eventually improve themselves out of the human timescale and saying that they prove FOOM.

"No diminishing returns on complexity in the region of the transition to human intelligence: 'We're so similar to chimps in brain design, and yet so much more powerful; the upward slope must be really steep.'"

Or there is no curve and it is a random landscape with software being very important...

"Scalability of hardware: 'Humans have only four times the brain volume of chimps - now imagine an AI suddenly acquiring a thousand times as much power.'"

Bottlenose dolphins have twice the brain volume of normal dolphins (comparable to our brain volume), yet aren't massively more powerful than they are. Asian elephants have 5 times the weight...

Phil's comment above seems worth addressing. My <1% figure was for an actual single AI fooming suddenly into a takes-over-the-world unfriendly thing, e.g., kills/enslaves us all. (Need I repeat that even a 1% risk is serious?)

Eli,

Over the last several years, your writing's become quite a bit more considered, insightful, and worth reading. I fear, though, that the information density has become, if anything, even less than it might have previously been. I really want to "hear" (i.e., read) what you have to say --- but just keeping up is a full-time job. (FYI, this isn't any damning critique; I've had this criticism aimed at me before, too.)

So a plea: do you think you could find a way to say the very worthwhile things you have to say, perhaps more concisely? This is argument --- worthwhile argument --- not polemic and not poetry. Apply whatever compression algorithm you think appropriate, we'll manage...

I'll second jb's request for denser, more highly structured representations of Eliezer's insights. I read all this stuff and find it entertaining and sometimes edifying, but disappointing in that it's not converging on either a central thesis or central questions (preferably both).

I think one way to sum up parts of what Eliezer is talking about in terms of AGI going FOOM is as follows:

If you think of Intelligence as Optimization, and we assume you can build an AGI with optimization power at or near human level (anything below that would be too weak to affect anything; a human could do a better job), then we can use the following argument to show that AGI does go FOOM.

We already have proof that human-level optimization power can produce near-human-level artificial intelligence (premise), so simply point it at an interesting optimization problem (itself) and recurse. As long as the number of additional improvements per improvement made to the AGI is greater than 1, FOOM will occur.
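One way to read the "greater than 1" condition is as a geometric series: each improvement enables k further improvements, on average, in the next round. The sketch below is a toy model under that reading only; the constant ratio k and the function name are assumptions made for illustration, not part of the argument above.

```python
def total_improvements(k, rounds):
    """Sum 1 + k + k^2 + ... over the given number of rounds.

    k is the assumed average number of further improvements that each
    improvement makes possible.
    """
    total, current = 0.0, 1.0
    for _ in range(rounds):
        total += current
        current *= k
    return total

for k in (0.5, 1.5):
    print(f"k = {k}: total after 60 rounds is {total_improvements(k, 60):.3g}")
```

With k below 1 the total levels off near 1 / (1 - k); with k above 1 it keeps growing geometrically, which is the toy-model version of the recursion taking off.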

It should not get stuck at human level intelligence as human level is nowhere near as good as you can get.

Why wouldn't you point your AGI (using whatever techniques you have available) at itself? I can't think of any good reasons that wouldn't also preclude you from building the AGI in the first place.

Of course this means we need human-level artificial general intelligence, but then it needs to be that to be anywhere near human-level optimization power. I won't bother going over what happens when you have AI that is better at some of what humans do but not all; simply look around you right now.

"Or, to make it somewhat less strong, as if I woke up one morning to find that banks were charging negative interest on loans?"

They already have, at least for a short while.

http://www.nytimes.com/2008/12/10/business/10markets.html

The idea of making a mind-design n-space by putting various attributes on the axes, such as humorous/non-humorous, conceptual/perceptual/sensual, etc. -- how much does this tell us about the real possibilities?

What I mean is, for a thing to be possible, there must be some combination of atoms that can fit together to make it work. But merely making an N-space does not tell us about what atoms there are and what they can do.

Come to think of it, how can we assert anything is possible without having already designed it?

When someone designs a superintelligent AI (it won't be Eliezer), without paying any attention to Friendliness (the first person who does it won't), and the world doesn't end (it won't), it will be interesting to hear Eliezer's excuses.

Unknown, do you expect money to be worth anything to you in that situation? If so, I'll be happy to accept a $10 payment now in exchange for a $1000 inflation-adjusted payment in that scenario you describe.

I second JB's request regarding concise writing. Eliezer's posts invariably have at least one or two really insightful ideas, but it often takes a few thousand more words to make those points than it should.

I'd like to add to JB's and Peanut's points: for example, the 1000-word dialogue in Sustained Strong Recursion struck me as especially redundant, when the same could be communicated in a couple of clear formulas, or just by referring to the elementary and well-known concept of compound interest.

Most of the variety of Eliezer's output is useful to some audience, but there's a serious problem of getting the right people to the right documents.

Eliezer, I am sending you the $10. I will let you know how to pay when you lose the bet. I have included in the envelope a means of identifying myself when I claim the money, so that it cannot be claimed by someone impersonating me.

Your overconfidence will surely cost you on this occasion, even though I must admit that I was forced to update (a very small amount) in favor of your position, on seeing the surprising fact that you were willing to engage in such a wager.

Unknown, where are you mailing it to?

Eliezer: c/o Singularity Institute P.O. Box 50182 Palo Alto, CA 94303 USA

I hope that works.

Eliezer: did you receive the $10? I don't want you making up the story, 20 or 30 years from now, when you lose the bet, that you never received the money.

Not yet. I'll inquire.

"I have included in the envelope a means of identifying myself when I claim the money, so that it cannot be claimed by someone impersonating me."

Doesn't that technically make you now Known?

Also, how much time has to pass between an AI 'coming to' and the world ending? What constitutes an AI for this bet?

Eliezer, will you be donating the $10 to the Institute? If so, does this constitute using the wager to shift the odds in your favour, however slightly?

Yes, the last two are jokes. But the first two are serious.

Ben Jones, the means of identifying myself will only show that I am the same one who sent the $10, not who it is who sent it.

Eliezer seemed to think that one week would be sufficient for the AI to take over the world, so that seems enough time.

As for what constitutes the AI, since we don't have any measure of superhuman intelligence, it seems to me sufficient that it be clearly more intelligent than any human being.

I'll pay $1,000 if I manage to still be alive and relatively free several weeks after an unfriendly superintelligence is released. Purely as a philanthropic gesture of gratitude for my good fortune or relief that my expectations were unfounded.

Cameron, that's great but you've got to sell that payout to the highest bidder if you want to generate any info.

Unknown, I have received your $10 and the bet is on.

This is awesome.

What I wonder is how Unknown is gonna be able to prove to you that the designers weren't paying attention to Friendliness.

Expecting humor in novel minds, given that humans display it, isn't so unreasonable: convergently handy traits will be common, and if there are common traits we're likely to have a good portion of them. For instance, rhyme in poetry may seem idiosyncratic, but it makes it easier to remember a poem and the information contained within for creatures that use spreading-activation in memory (one word primes others that share auditory features with it).

CarlShuman, it seems to me that even the set of minds that would have received some fitness benefit from a sense of humor would be much smaller than the set of minds that actually have a sense of humor, since there are almost certainly lots of other ways to solve the kind of fitness problems that humor solves.

Also, the set of minds that would receive benefit from humor is way smaller yet than the whole of plausibly-biologically-evolved mindspace. For example, how would a non-social mind benefit from humor?