If you’re interested in being on the right side of disputes, you will refute your opponents' arguments. But if you're interested in producing truth, you will fix your opponents' arguments for them. To win, you must fight not only the creature you encounter; you [also] must fight the most horrible thing that can be constructed from its corpse.

-- Black Belt Bayesian

This is an informal post meant as a reply to a post by user:utilitymonster, 'What is the best compact formalization of the argument for AI risk from fast takeoff?'

I hope to find the mental strength to put more effort into improving it in the future. But since nobody else seems willing to take a critical look at the overall topic, I feel that doing what I can is better than doing nothing.

Please review the categories 'Further Reading' and 'Notes and References'.




In this post I want to take a look at a few premises (P#) that need to be true simultaneously to make the SIAI a worthwhile charity from the point of view of someone trying to do as much good as possible by contributing money. I am going to show that the case for risks from AI is strongly conjunctive, that without a concrete and grounded understanding of AGI an abstract analysis of the issues is going to be very shaky, and that therefore SIAI is likely to be a bad choice as a charity. In other words, what speaks in favor of SIAI consists mainly of highly specific, conjunctive speculations, unbacked by evidence, about possible bad outcomes.

Requirements for an Intelligence Explosion

P1 Fast, and therefore dangerous, recursive self-improvement is logically possible.

It took almost four hundred years to prove Fermat’s Last Theorem. The final proof is over a hundred pages long. Over a hundred pages! And we are not talking about something like an artificial general intelligence that can magically make itself smart enough to prove such theorems and many more that no human being would be capable of proving. Fermat’s Last Theorem simply states “no three positive integers a, b, and c can satisfy the equation a^n + b^n = c^n for any integer value of n greater than two.”
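The statement itself really is that simple — simple enough to check by brute force over a tiny range. The following sketch is my own toy illustration; it proves nothing about the general theorem, which is exactly the point about how hard the full proof was:

```python
# Brute-force search for counterexamples to a^n + b^n = c^n (n > 2)
# over a small range. Finding none here says nothing about all integers.
def counterexamples(limit, max_n):
    """Search for a, b, c with a^n + b^n = c^n, n > 2, below the limit."""
    hits = []
    for n in range(3, max_n + 1):
        for a in range(1, limit):
            for b in range(a, limit):
                c = round((a ** n + b ** n) ** (1 / n))
                if c ** n == a ** n + b ** n:
                    hits.append((a, b, c, n))
    return hits

print(counterexamples(50, 6))  # [] -- no counterexamples in this range
```

Checking a few thousand cases takes milliseconds; ruling out the infinitely many remaining ones took mathematicians centuries.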

Even artificial intelligence researchers admit that "there could be non-linear complexity constraints meaning that even theoretically optimal algorithms experience strongly diminishing intelligence returns for additional compute power." [1] We just don't know.
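As a toy model of what such diminishing returns would look like (the logarithmic form is my own illustrative assumption, not a known law): if each doubling of compute buys only a fixed additive capability gain, then even enormous hardware increases buy modest improvements.

```python
# Hypothetical logarithmic-returns regime: capability grows with the
# number of compute doublings, not with compute itself.
import math

def capability(compute):
    """Capability under an assumed logarithmic return on compute."""
    return math.log2(compute)

# A thousandfold jump in compute (2^10 -> 2^20) merely doubles capability.
print(capability(2 ** 10), capability(2 ** 20))  # 10.0 20.0
```

Whether real intelligence returns look anything like this is an open question — which is the point: nobody knows the shape of the curve.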

Other possible problems include the impossibility of a stable utility function and of a reflective decision theory, the intractability of real-world expected utility maximization, and the tendency of expected utility maximizers to stumble over Pascal's mugging, among other things [2].

For an AI to be capable of recursive self-improvement it also has to guarantee that its goals will be preserved when it improves itself. It is still questionable whether it is possible to conclusively prove that improvements to an agent's intelligence or decision procedures maximize expected utility. If that cannot be proven, explosive self-improvement will be neither rational nor possible.

P1.b The fast computation of a simple algorithm is sufficient to outsmart and overpower humanity.

Imagine a group of 100 world-renowned scientists and military strategists.

  • The group is analogous to the initial resources of an AI.
  • The knowledge that the group has is analogous to what an AI could come up with by simply "thinking" about it given its current resources.

Could such a group easily wipe away the Roman empire when beamed back in time?

  • The Roman empire is analogous to our society today.

Even if you gave all of them a machine gun, the Romans would quickly adapt and the people from the future would run out of ammunition.

  • Machine guns are analogous to the supercomputer it runs on.

Consider that it takes a whole technological civilization to produce a modern smartphone.

You can't just say "with more processing power you can do more different things"; that would be analogous to saying that the 100 people from today could just build more machine guns. But they can't! They can't use all their knowledge and magic from the future to defeat the Roman Empire.

A lot of assumptions have to turn out to be correct to make humans discover simple algorithms overnight that can then be improved to self-improve explosively.

You can also compare this to the idea of a Babylonian mathematician discovering modern science and physics given that he would be uploaded into a supercomputer (a possibility that is in and of itself already highly speculative). It assumes that he could brute-force conceptual revolutions.

Even if he was given a detailed explanation of how his mind works and the resources to understand it, self-improving to achieve superhuman intelligence assumes that throwing resources at the problem of intelligence will magically allow him to pull improved algorithms from solution space as if they were signposted.

But unknown unknowns are not signposted. Finding them is rather like finding a needle in a haystack. Evolution is great at that, and assuming that one could speed up evolution considerably is yet another assumption about technological feasibility and real-world resources.

That conceptual revolutions are just a matter of computational resources is pure speculation.

If one were to speed up the whole Babylonian world and accelerate cultural evolution, one would obviously arrive at some insights more quickly. But how much more quickly? How many insights depend on experiments, to yield empirical evidence, that cannot be sped up considerably? And what is the return? Is the payoff proportional to the resources that are necessary?

If you were going to speed up a chimp brain a million times, would it quickly reach human-level intelligence? If not, why then would it be different for a human-level intelligence trying to reach transhuman intelligence? It seems like a nice idea when formulated in English, but would it work?

Being able to state that an AI could use some magic to take over the earth does not make it a serious possibility.

Magic has to be discovered, adapted and manufactured first. It doesn't just emerge out of nowhere from the computation of certain algorithms. It emerges from a society of agents with various different goals and heuristics like "Treating Rare Diseases in Cute Kittens". It is an evolutionary process that relies on massive amounts of real-world feedback and empirical experimentation. Assuming that all of that can happen because some simple algorithm is being computed is to believe it will emerge 'out of nowhere'; it is magical thinking.

Unknown unknowns are not signposted. [3]

If Benoît B. Mandelbrot had never decided to research fractals, many modern movies would not be possible, as they rely on fractal landscape algorithms. Yet at the time Mandelbrot conducted his research it was not foreseeable that his work would have any real-world applications.

Important discoveries are made because many routes with low or no expected utility are explored at the same time [4]. And to do so efficiently it takes random mutation, a whole society of minds, a lot of feedback and empirical experimentation.

"Treating rare diseases in cute kittens" might or might not provide genuine insights and open up new avenues for further research. As long as you don't try it you won't know.

The idea that a rigid consequentialist with simple values can think up insights and conceptual revolutions simply because it is instrumentally useful to do so is implausible.

Complex values are the cornerstone of diversity, which in turn enables creativity and drives the exploration of various conflicting routes. A singleton with a stable utility-function lacks the feedback provided by a society of minds and its cultural evolution.

You need to have various different agents with different utility-functions around to get the necessary diversity that can give rise to enough selection pressure. A "singleton" won't be able to predict the actions of new and improved versions of itself by just running sandboxed simulations. Not just because of logical uncertainty but also because it is computationally intractable to predict the real-world payoff of changes to its decision procedures.

You need complex values to give rise to the necessary drives to function in a complex world. You can't just tell an AI to protect itself. What would that even mean? What changes are illegitimate? What constitutes "self"? Those are all unsolved problems that are just assumed to be solvable when talking about risks from AI.

An AI with simple values will simply lack the creativity, due to a lack of drives, to pursue the huge spectrum of research that a society of humans pursues. It may be able to solve some well-defined, narrow problems, but it will be unable to make use of the broad range of synergetic effects of cultural evolution, which is a result of the interaction of a wide range of utility-functions.

Yet even if we assume that there is one complete theory of general intelligence, once discovered, one just has to throw more resources at it. It might be able to incorporate all human knowledge, adapt it and find new patterns. But would it really be vastly superior to human society and their expert systems?

Can intelligence itself be improved, apart from solving well-defined problems and making more accurate predictions on well-defined classes of problems? The discovery of unknown unknowns does not seem to be governed by any heuristic other than natural selection. Without well-defined goals, terms like "optimization" have no meaning.

P2 Fast, and therefore dangerous, recursive self-improvement is physically possible.

Even if it could be proven that explosive recursive self-improvement is logically possible, e.g. that there are no complexity constraints, the question remains whether it is physically possible.

Our best theories about intelligence are highly abstract and their relation to real world human-level general intelligence is often wildly speculative [5][6].

P3 Fast, and therefore dangerous, recursive self-improvement is economically feasible.

To exemplify the problem, take the science-fictional idea of using antimatter as the explosive for weapons. It is physically possible to produce antimatter and use it for large-scale destruction. The equivalent of the Hiroshima atomic bomb would take only about half a gram of antimatter. But at current production rates it would take about 2 billion years to produce that amount of antimatter [7].
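The half-gram figure can be checked with a back-of-the-envelope E = mc² calculation (the constants below are my own inputs, not taken from the CERN page): half a gram of antimatter annihilating with half a gram of ordinary matter converts one gram of mass entirely to energy.

```python
# E = mc^2 sanity check for the antimatter figure.
c = 2.998e8          # speed of light, m/s
mass = 1.0e-3        # 0.5 g antimatter + 0.5 g matter annihilated, in kg
ton_tnt = 4.184e9    # energy of one ton of TNT, in joules

energy_j = mass * c ** 2
kilotons = energy_j / ton_tnt / 1e3
print(energy_j, kilotons)  # ~9.0e13 J, ~21 kt -- the same order as Hiroshima (~15 kt)
```

The physics is trivial; producing the half gram is the bottleneck — which is exactly the shape of the economic-feasibility objection.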

We simply don’t know if intelligence is instrumental or quickly hits diminishing returns [8].

P3.b AGI is able to create (or acquire) resources, empowering technologies or civilisatory support [9].

We are already at a point where we have to build billion dollar chip manufacturing facilities to run our mobile phones. We need to build huge particle accelerators to obtain new insights into the nature of reality.

An AI would either have to rely on the help of a whole technological civilization or be in control of advanced nanotech assemblers.

And if an AI was to acquire the necessary resources on its own, its plan for world-domination would have to go unnoticed. This would require the workings of the AI to be opaque to its creators yet comprehensible to itself.

But an AI capable of efficient recursive self improvement must be able to

  1. comprehend its own workings
  2. predict how improvements, or improved versions of itself, are going to act, in order to ensure that its values are preserved

Any AI capable of efficient self-modification must be able to grasp its own workings and make predictions about improvements to various algorithms and its overall decision procedure. If an AI can do that, why would the humans who built it be unable to notice any malicious intentions, and why couldn't they use the same algorithms the AI uses to predict what it will do? Conversely, if humans are unable to predict what the AI will do, how is the AI able to predict what improved versions of itself will do, and how is it going to maximize expected utility if it cannot predict its own actions?

And even if an AI were somehow able to acquire large amounts of money, spending that money effectively is not easy. You can't "just" build huge companies under fake identities, or through straw men, to create revolutionary technologies. Running companies with real people takes a lot of real-world knowledge, interaction and feedback. Most importantly, it takes a lot of time. An AI could not simply create a new Intel or Apple over a few years without its creators noticing anything.

The goals of an AI will be under scrutiny at any time. It seems very implausible that scientists, a company or the military are going to create an AI and then just let it run without bothering about its plans. An artificial agent is not a black box, like humans are, where one is only able to guess its real intentions.

A plan for world domination seems like something that can't be concealed from its creators. Lying is no option if your algorithms are open to inspection.

P4 Dangerous recursive self-improvement is the default outcome of the creation of artificial general intelligence.

Complex goals need complex optimization parameters (the design specifications of the subject of the optimization process against which it will measure its success of self-improvement).

Even the creation of paperclips is a much more complex goal than telling an AI to compute as many decimal digits of Pi as possible.

For an AGI that was designed to design paperclips to pose an existential risk, its creators would have to be capable enough to enable it to take over the universe on its own, yet forget, or fail, to define time, space and energy bounds as part of its optimization parameters. Therefore, given the large number of restrictions that are inevitably part of any advanced general intelligence (AGI), the subset of nonhazardous outcomes might be much larger than the subset in which the AGI works perfectly yet cannot be stopped before it wreaks havoc.

And even given a rational utility maximizer, it is possible to maximize paperclips in many different ways. How it does so depends fundamentally on its utility-function and on how precisely that function was defined.

If there are no constraints in the form of design and goal parameters then it can maximize paperclips in all sorts of ways that don't demand recursive self-improvement.

"Utility" only becomes well-defined if we precisely define what it means to maximize it. "Maximize paperclips" does not by itself specify how quickly or how economically that is supposed to happen.

The problem is that "utility" has to be defined. To maximize expected utility does not imply certain actions, efficiency and economic behavior, or the drive to protect yourself. You can also rationally maximize paperclips without protecting yourself if it is not part of your goal parameters.

You could also assign utility to maximizing paperclips only for as long as nothing turns you off, without caring about actually preventing yourself from being turned off. If an AI is not explicitly programmed to care about something, then it won't.

Without well-defined goals in the form of a precise utility-function, it might be impossible to maximize expected "utility". Concepts like "efficient", "economic" or "self-protection" all have a meaning that is inseparable from an agent's terminal goals. If you just tell it to maximize paperclips, this can be realized in an infinite number of ways that would all be rational given imprecise design and goal parameters. Undergoing explosive recursive self-improvement, taking over the universe and filling it with paperclips is just one outcome. Why would an arbitrary mind pulled from mind-design space care to do that? Why not just wait for paperclips to arise due to random fluctuations out of a state of chaos? That wouldn't be irrational. To have an AI take over the universe as fast as possible, you would have to explicitly design it to do so.
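To make the underspecification concrete, here is a sketch (entirely my own construction; the names and numbers are hypothetical) of two utility functions that both count as "maximizing paperclips" over the same outcomes — one penalizes slow, resource-hungry strategies, the other doesn't care:

```python
# Two "rational" readings of the same instruction "maximize paperclips".
from dataclasses import dataclass

@dataclass
class Outcome:
    paperclips: int
    years_elapsed: float
    resources_used: float

def utility_unbounded(o: Outcome) -> float:
    # Reading 1: only the count matters; speed and cost are irrelevant,
    # so waiting eons for paperclips to arise by chance is not penalized.
    return float(o.paperclips)

def utility_bounded(o: Outcome) -> float:
    # Reading 2: with explicit time and resource bounds in the goal
    # specification, outcomes past the deadline or budget are worthless.
    if o.years_elapsed > 1.0 or o.resources_used > 100.0:
        return 0.0
    return float(o.paperclips)

slow = Outcome(paperclips=10**9, years_elapsed=10**6, resources_used=10**9)
fast = Outcome(paperclips=10**3, years_elapsed=0.5, resources_used=50.0)
print(utility_unbounded(slow) > utility_unbounded(fast))  # True
print(utility_bounded(slow) < utility_bounded(fast))      # True
```

Both agents are expected utility maximizers; which behaviour you get depends entirely on which bounds made it into the specification.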

But for the sake of a thought experiment assume that the default case was recursive self-improvement. Now imagine that a company like Apple wanted to build an AI that could answer every question (an Oracle).

If Apple were going to build an Oracle, it would anticipate that other people would also want to ask it questions. Therefore it couldn't just waste all its resources on looking for an inconsistency arising from the Peano axioms when asked to solve 1+1. It would not devote additional resources to questions whose answers are already known with high probability to be correct. It wouldn't be economically useful to take over the universe to answer simple questions.

Nor would it be rational to look for an inconsistency arising from the Peano axioms while solving 1+1. To answer questions an Oracle needs a good amount of general intelligence, and concluding that a request to solve 1+1 implies searching for an inconsistency in the Peano axioms does not seem reasonable. Nor does it seem reasonable to suspect that humans want the answers to their questions to approach infinite certainty. Why would someone build such an Oracle in the first place?

A reasonable Oracle would quickly yield good solutions by finding answers that are, with high probability, within 2–3% of the optimal solution, in a reasonable amount of time. I don't think anyone would build an answering machine that throws the whole universe at the first sub-problem it encounters.
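The satisficing behaviour described above can be sketched as follows (my own toy construction, not anyone's actual Oracle design): refine an estimate only until it is within a small tolerance, instead of spending unbounded resources chasing exactness. Here the "question" is approximating the square root of 2 by bisection.

```python
# Stop refining once the answer is "good enough" (within ~3%), rather
# than pursuing exactness with unbounded effort.
def satisficing_sqrt(x, rel_tol=0.03, max_steps=10_000):
    lo, hi = 0.0, max(1.0, x)
    steps = 0
    while steps < max_steps:
        mid = (lo + hi) / 2
        if abs(mid * mid - x) / x <= rel_tol:  # within tolerance: done
            return mid, steps
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
        steps += 1
    return (lo + hi) / 2, steps

answer, steps = satisficing_sqrt(2.0)
print(round(answer, 3), steps)  # 1.406 5 -- good enough, after a handful of steps
```

A machine built on this design answers quickly and cheaply; one built to drive the error to zero on every query would burn arbitrary resources on trivial questions.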

P5 The human development of artificial general intelligence will take place quickly.

What evidence do we have that there is some principle that, once discovered, allows us to grow superhuman intelligence overnight?

If the development of AGI takes place slowly, a gradual and controllable development, we might be able to learn from small-scale mistakes, or have enough time to develop friendly AI, while having to face other existential risks.

This might for example be the case if intelligence cannot be captured by a discrete algorithm, or is modular, and would therefore never allow us to reach a point where we can suddenly build the smartest thing ever, one that just extends itself indefinitely.

Therefore the probability of an AI undergoing explosive recursive self-improvement, P(FOOM), is the probability of the conjunction of its premises:

P(FOOM) = P(P1∧P2∧P3∧P4∧P5)
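As a numeric illustration of why this conjunctiveness matters (the premise probabilities below are made-up placeholders, not estimates from this post, and treating the premises as independent is a simplification): even if every premise is individually judged more likely than not, the joint probability can be small.

```python
# Multiply placeholder premise probabilities to see how fast a
# conjunction shrinks, assuming independence for simplicity.
from functools import reduce

premises = {"P1": 0.7, "P2": 0.7, "P3": 0.7, "P4": 0.7, "P5": 0.7}
p_foom = reduce(lambda acc, p: acc * p, premises.values(), 1.0)
print(round(p_foom, 4))  # 0.1681
```

Five premises at 70% each already push the conjunction below 17%; a disjunctive argument, by contrast, would get more probable with each added route.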

Of course, there are many more premises that need to be true in order to enable an AI to go FOOM, e.g. that each level of intelligence can effectively handle its own complexity, or that most AGI designs can somehow self-modify their way up to massive superhuman intelligence. But I believe that the above points are enough to show that the case for a hard takeoff is not disjunctive, but rather strongly conjunctive.

Requirements for SIAI to constitute an optimal charity

In this section I will assume the truth of all premises in the previous section.

P6 SIAI can solve friendly AI.

Say you believe that unfriendly AI will wipe us out with a probability of 60%, and that there is another existential risk that will wipe us out with a probability of 10% even if unfriendly AI turns out to be no risk, or in all possible worlds where it comes later. Both risks have the same utility x (if we don't assume that an unfriendly AI could also wipe out aliens etc.). Thus .6x > .1x. But if the probability of solving friendly AI (= A) relative to the probability of solving the second risk (= B) satisfies A ≤ (1/6)B, then the expected utility of mitigating unfriendly AI is at best equal to that of mitigating the other existential risk, because .6Ax ≤ .1Bx.
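A worked instance of the inequality above (the solvability numbers are illustrative placeholders of my own, not estimates from the post): with A = B/6 exactly, the two expected utilities come out equal despite the sixfold larger risk.

```python
# Expected utility of mitigating each risk, per the inequality .6Ax <= .1Bx.
p_ufai, p_other = 0.6, 0.1   # probabilities of each risk, as in the text
B = 0.3                      # hypothetical chance of solving the other risk
A = B / 6                    # chance of solving friendly AI, at the A = (1/6)B boundary

eu_fai = p_ufai * A          # expected catastrophe-probability averted
eu_other = p_other * B       # expected catastrophe-probability averted
print(eu_fai, eu_other)      # both come out to 0.03
```

So a sixfold difference in risk probability is exactly cancelled by a sixfold difference in solvability — which is why the tractability estimate matters as much as the risk estimate.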

Consider that one order of magnitude more utility could easily be outweighed or trumped by an underestimation of the complexity of friendly AI.

So how hard is it to solve friendly AI?

Take, for example, Pascal's mugging: if you can't solve it, then you need to implement a hack that is largely based on human intuition. Therefore, in order to estimate the possibility of solving friendly AI, one needs to account for the difficulty of solving all of its sub-problems.

Consider that we don't even know "how one would start to research the problem of getting a hypothetical AGI to recognize humans as distinguished beings." [10]

P7 SIAI does not increase risks from AI.

By trying to solve friendly AI, SIAI has to think about a lot of issues related to AI in general and might have to solve problems that will make it easier to create artificial general intelligence.

It is far from clear that SIAI is able to protect its findings against intrusion, betrayal, or industrial espionage.

P8 SIAI does not increase negative utility.

There are several possibilities by which SIAI could actually cause a direct increase in negative utility.

1) Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are an effect of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not quite friendly, AI might create a living hell for the rest of time, increasing negative utility dramatically [11].

2) Humans are not provably friendly. Given the power to shape the universe, the SIAI might fail to act altruistically and might deliberately implement an AI with selfish motives or horrible strategies [12].

P9 It makes sense to support SIAI at this time [13].

Therefore the probability of SIAI being a worthwhile charity, P(CHARITY), is the probability of the conjunction of its premises:

P(CHARITY) = P(P6∧P7∧P8∧P9)

As before, there are many more premises that need to be true in order for SIAI to be the best choice for someone who wants to maximize doing good by contributing money to a charity.

Further Reading

The following posts and resources elaborate on many of the above points and hint at a lot of additional problems.

Notes and References

[1] Q&A with Shane Legg on risks from AI

[2] http://lukeprog.com/SaveTheWorld.html

[3] "In many ways, this is a book about hindsight. Pythagoras could not have imagined the uses to which his equation would be put (if, indeed, he ever came up with the equation himself in the first place). The same applies to almost all of the equations in this book. They were studied/discovered/developed by mathematicians and mathematical physicists who were investigating subjects that fascinated them deeply, not because they imagined that two hundred years later the work would lead to electric light bulbs or GPS or the internet, but rather because they were genuinely curious."

17 Equations that changed the world

[4] Here is my list of "really stupid, frivolous academic pursuits" that have led to major scientific breakthroughs.

  • Studying monkey social behaviors and eating habits led to insights into HIV (Radiolab: Patient Zero)
  • Research into how algae move toward light paved the way for optogenetics: using light to control brain cells (Nature 2010 Method of the Year).
  • Black hole research gave us WiFi (ICRAR award)
  • Optometry informs architecture and saved lives on 9/11 (APA Monitor)
  • Certain groups HATE SETI, but SETI's development of the volunteer distributed-computing platform SETI@home paved the way for citizen science and recent breakthroughs in protein folding (Popular Science)
  • Astronomers provide insights into medical imaging (TEDxBoston: Michelle Borkin)
  • Basic physics experiments and the Fibonacci sequence help us understand plant growth and neuron development


[5] "AIXI is often quoted as a proof of concept that it is possible for a simple algorithm to improve itself to such an extent that it could in principle reach superhuman intelligence. AIXI proves that there is a general theory of intelligence. But there is a minor problem, AIXI is as far from real world human-level general intelligence as an abstract notion of a Turing machine with an infinite tape is from a supercomputer with the computational capacity of the human brain. An abstract notion of intelligence doesn’t get you anywhere in terms of real-world general intelligence. Just as you won’t be able to upload yourself to a non-biological substrate because you showed that in some abstract sense you can simulate every physical process."

Alexander Kruel, Why an Intelligence Explosion might be a Low-Priority Global Risk

[6] "…please bear in mind that the relation of Solomonoff induction and “Universal AI” to real-world general intelligence of any kind is also rather wildly speculative… This stuff is beautiful math, but does it really have anything to do with real-world intelligence? These theories have little to say about human intelligence, and they’re not directly useful as foundations for building AGI systems (though, admittedly, a handful of scientists are working on “scaling them down” to make them realistic; so far this only works for very simple toy problems, and it’s hard to see how to extend the approach broadly to yield anything near human-level AGI). And it’s not clear they will be applicable to future superintelligent minds either, as these minds may be best conceived using radically different concepts."

Ben Goertzel, 'Are Prediction and Reward Relevant to Superintelligences?'

[7] http://public.web.cern.ch/public/en/spotlight/SpotlightAandD-en.html

[8] "If any increase in intelligence is vastly outweighed by its computational cost and the expenditure of time needed to discover it then it might not be instrumental for a perfectly rational agent (such as an artificial general intelligence), as imagined by game theorists, to increase its intelligence as opposed to using its existing intelligence to pursue its terminal goals directly or to invest its given resources to acquire other means of self-improvement, e.g. more efficient sensors."

Alexander Kruel, Why an Intelligence Explosion might be a Low-Priority Global Risk

[9] Section 'Necessary resources for an intelligence explosion', Why an Intelligence Explosion might be a Low-Priority Global Risk, Alexander Kruel

[10] http://lesswrong.com/lw/3aa/friendly_ai_research_and_taskification/

[11] http://lesswrong.com/r/discussion/lw/ajm/ai_risk_and_opportunity_a_strategic_analysis/5ylx

[12] http://lesswrong.com/lw/8c3/qa_with_new_executive_director_of_singularity/5y77

[13] "I think that if you're aiming to develop knowledge that won't be useful until very very far in the future, you're probably wasting your time, if for no other reason than this: by the time your knowledge is relevant, someone will probably have developed a tool (such as a narrow AI) so much more efficient in generating this knowledge that it renders your work moot."

Holden Karnofsky in a conversation with Jaan Tallinn

Comments (127)

Imagine a group of 100 world-renowned scientists and military strategists. Could such a group easily wipe away the Roman empire when beamed back in time?

Imagine a group of 530 Spaniards...

At the risk of confirming every negative stereotype RationalWiki and the like have of us...have you read the Sequences? I'm reluctant to write a full response to this, but I think large parts of the Sequences were written to address some of these ideas.

I'm afraid I had the same reaction. XiXiDu's post seems to take the "shotgun" approach of listing every thought that popped into XiXiDu's head, without applying much of a filter. It's exhausting to read. Or, as one person I know put it, "XiXiDu says a lot of random shit."

I understand what you're saying, but, speaking from a strictly nitpicky perspective, I don't think the situation is analogous. The Roman Empire had many more soldiers to throw at the problem; much more territory to manage; comparatively better technology; and, perhaps more importantly, a much more robust and diverse -- and therefore memetically resistant -- society. They would therefore fare much better than the Aztecs did.
Conquistadors climbed to the top of a volcano to harvest sulphur for ammunition production. You can count on uploads in our society, as on some Navy SEALs sent into the Roman world, to take analogous actions. Neither would just wait for help from nowhere. They would improvise, as the conquistadors once did.
Understood, but there's only so much the conquistadors can do even with gunpowder. Guns can do a lot of damage against bronze swords and armor, but if they have more soldiers than you have bullets, then you'll still lose. Of course, if the conquistadors could build a modern tank, they'd be virtually invincible. But in order to do that, they'd need to smelt steel, vulcanize rubber, refine petroleum, manufacture electronics, etc. Even if they had perfect knowledge of these technologies, they couldn't duplicate them in ye olde Aztec times, because such technologies require a large portion of the world's population to be up to speed. There's a limit to how much you can do armed with nothing but a pocket knife and a volcano. I think this was XiXiDu's point: knowledge alone is not enough, you also need to put in a lot of work (which is often measured in centuries) in order to apply it.
Understood that, too! But one can optimize and outsource a lot. Conquistadors employed Indians, enslaved Aztecs and Incas. Besides, the subjective time of an upload can be vast. A good idea can trim a lot of the work that needs to be done. And at least my upload would be full of ideas.
Agreed; just as a single conquistador -- or better yet, a modern engineer -- transported into the Roman Empire would be full of ideas. He would know how to forge steel, refine petroleum, design electronic circuits, genetically engineer plants and animals, write software, plus many other things. But he wouldn't be able to actually use most of that knowledge. In order to write software, you need a computer. In order to build a computer, you need... well, you need a lot of stuff that outsourced Aztec (or Roman) slaves just wouldn't be able to provide. You could enslave everyone on the continent, and you still wouldn't be able to make a single CPU. Sure, if you were patient, very lucky, and long-lived, you could probably get something going within the next century or so. But that's hardly a "FOOM", and the Romans would have a hundred years to stop you, if they decided that your plans for the future aren't to their liking.
Exactly. And here the parable breaks down. The upload just might have those centuries: virtual subjective time of thousands of years to devise a cunning plan, before we humans even discuss their advantage. Yudkowsky has written a short story about this. http://lesswrong.com/lw/qk/that_alien_message/
Bugmaster's point was that it takes a century of action by external parties, not a century of subjective thinking time. The timetable doesn't get advanced all that much by super-intelligence. Real-world changes happen on real-world timetables. And yes, the rate of change might be exponential, but exponential curves grow slowly at first. And meanwhile, other things are happening in that century that might upset the plans and that cannot be arbitrarily controlled even by super-intelligence.
Err... minor quibble. Exponential curves grow at the same rate all the time. That is, if you zoom in on the e^x graph at any point at any scale, it will look exactly the same as it did before you zoomed in.
I think we are using "rate" in different ways. The absolute rate of change per unit time for an exponential is hardly constant; If you look at the segment of e^x near, say, e^10, it's growing much faster than it is at e^(-10).
asr got my point exactly right.
Guns? I thought horses were their main advantage. (What are the Aztecs gonna do, burn down all the grass in the continent?)
The OP used gunpowder as the example, so I went with it. You might be right about horses, though.
He's read them well enough to collect a fairly complete index of cherry-picked Eliezer quotes to try to make him look bad. I don't think lack of exposure to prerequisite information is the problem here.
The index wedrifid was alluding to, if anyone cares: http://shityudkowskysays.tumblr.com/
I actually loved reading it. Some of those are up there among my favorite EY quotes: arrogant, sometimes needing context to make sense, and sometimes best left unsaid for practical reasons, but still brilliant. For example: There is also a quote there that I agree should remain visible, to Eliezer's shame, until such time as he swallows his ego and publicly admits that it was an utterly idiotic way to behave. Then there is at least one quote which really deserves a disclaimer in a footnote - that EY has already written an entire sequence admitting how stupid he was to think the way he thought when he wrote it! I was actually rather disappointed that the list only went on for a page or two. I was looking forward to reading all the highlights and lowlights. He deserves at least a few hundred best-of and worst-of quotes!
There's always sorting in http://www.ibiblio.org/weidai/lesswrong_user.php?u=Eliezer_Yudkowsky
By following the link below the quote, people could learn that he claims he no longer agrees with what he wrote there. But I added an extra disclaimer now.
Simon Fischer: Thanks for making me find out what the Roko-thing was about :(

P1 Fast, and therefore dangerous, recursive self-improvement is logically possible.

All your counter-arguments are enthymematic; as far as I can tell, you are actually arguing against a proposition which looks more like

P1 Recursive self-improvement of arbitrary programs towards unalterable goals is possible with very small constant factors and polynomial (P) or better general asymptotic complexity

I would find your enthymeme far more convincing if you explained why things like Goedel machines are either fallacious or irrelevant.

P1.b The fast computation of a simple algorithm is sufficient to outsmart and overpower humanity.

Your argument is basically an argument from fiction; it's funny that you chose that example of the Roman Empire when recently Reddit spawned a novel arguing that a Marine Corps unit (surely less dangerous than your 100) could do just that. I will note in passing that black powder's formulation is so simple and famous that even I, who prefer archery, know it: saltpeter, charcoal, and sulfur. I know for certain that the latter two were available in the Roman Empire, and suspect the former would not be hard to get. EDIT: and this same day, a Mafia-related paper I was read... (read more)

I disagree with the gist of your comment, but I upvoted it because this quote made me LOL. That said, I don't think that XiXiDu is claiming that computers can't exhibit creativity, period. Rather, he's saying that the kind of computers that SIAI is envisioning can't exhibit creativity, because they are implicitly (and inadvertently) designed not to.
You are arguing past each other. XiXiDu is saying that a programmer can create software that can be inspected reliably. We are very close to having provably-correct kernels and compilers, which would make it practical to build reliably sandboxed software, such that we can look inside the sandbox and see that the software's data structures are what they ought to be. It is separately true that not all software can be reliably understood by static inspection, which is all that the Underhanded C Contest demonstrates. I would stipulate that the same is true at run-time. But that's not the case here. Presumably the developers of a large, complicated AI will design it to be easy to debug - I don't think they have much chance of a working program otherwise.
No, you are ignoring Xi's context. The claim is not about what a programmer on the team might do, it is about what the AI might write. Notice that the section starts 'The goals of an AI will be under scrutiny at any time...'
Yes. I thought Xi's claim was that if you have an AI and put it to work writing software, the programmers supervising the AI can look at the internal "motivations", "goals", and "planning" data structures and see what the AI is really doing. Obfuscation is beside the point.
I agree with you and XiXiDu that such observation should be possible in principle, but I also sort of agree with the detractors. You say, Oh, I'm sure they'd try. But have you ever seen a large software project? There are usually mountains and mountains of code running in parallel on multiple nodes all over the place. Pieces of it are written with good intentions in mind; other pieces are written in a caffeine-fueled fog two days before the deadline, and peppered with years-old comments to the effect of, "TODO: fix this when I have more time". When the code breaks in some significant way, it's usually easier to rewrite it from scratch than to debug the fault. And that's just enterprise software, which is orders of magnitude less complex than an AGI would be. So yes, it should be possible to write transparent and easily debuggable code in theory, but in practice, I predict that people would write code the usual way instead.

It would be helpful if you summarised the premises in a short list. At the moment one has to do a lot of scrolling.

Edit: Actually, I think it would be a very good idea; their not being written out together makes it easy to miss the fact that they're not all necessary, that some imply others, and that they basically don't cut reality at its joints. You assert that these are all necessary and logically separate premises. Yet P4 is clearly not necessary for FOOM - something not being the default outcome does not mean it will not happen. P3 implies P2, and P2 implies P1. And P5 is clearly not necessary either - FOOM could occur in a thousand years' time.
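The point that nested implications collapse the conjunction can be checked with a toy calculation (the world-weights below are invented purely for illustration):

```python
# Worlds as (P1, P2, P3) truth assignments with probabilities, chosen
# (arbitrarily, for illustration) so that P3 implies P2 and P2 implies P1.
worlds = {
    (True,  True,  True):  0.3,
    (True,  True,  False): 0.2,
    (True,  False, False): 0.2,
    (False, False, False): 0.3,
}

p_conjunction = sum(w for (p1, p2, p3), w in worlds.items() if p1 and p2 and p3)
p3_alone      = sum(w for (_, _, p3), w in worlds.items() if p3)

# The "three-part conjunction" is really just one claim:
print(p_conjunction == p3_alone)
```

So counting P1, P2, and P3 as three separate conjuncts doesn't make the scenario any less probable than P3 by itself.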

And again with the second set of premises - they are clearly not distinct, and not all necessary. For example,

  • P6 - SIAI will solve FAI

is not necessary; they might succeed by preventing anyone else from developing GAI.

  • P7 SIAI does not increase risks from AI.

If you mean net, then yes. But otherwise, it's perfectly possible that they might speed up UFAI and AI, and yet still be a good thing, if the latter outweighs the former.


  • P9 It makes sense to support SIAI at this time

is the conclusion of the argument! This premise alone is suf... (read more)

That is actually my argument against a lot of philosophy; arguments embedded in a lot of prose are unnecessarily hard to follow. Arguments, at least ones that you actually expect to be capable of changing someone's mind, should be presented as clearly and schematically as possible. Otherwise it looks a lot like "baffle them with bullshit."

Reading this made my brain hurt. It's a pile of false analogies that ignores the best arguments disagreeing with it, which is particularly ironic in light of the epigraph. (I'm thinking of Chalmers specifically, but really you can take your pick.)

I'm tempted to go through and point out every problem with this post, but I noticed at least a dozen on my first read-through and I just don't have the time.

Posts arguing against the LW orthodoxy deserve disproportional attention and consideration to combat groupthink, but this is just too wrong for me to tolerate.

But since nobody else seems to be willing to take a critical look at the overall topic

What I take a critical look at and what I write about in public are two very, very different things. Your audience is more heterogeneous than you might think.

You (XiXiDu) don't seem to think that intelligent machines are likely to be that big a deal. The switch to an engineered world is likely to be the biggest transformation of the planet since around the evolution of sex - or the last genetic takeover. It probably won't crash civilisation - or kill all humans - but it's going to be an enormous change, and the details of how it goes down could make a big difference to everyone. I do sometimes wonder whether you get that.

Good post.

You seem to excessively focus on recursive self-improvement to the exclusion of other hard takeoff scenarios, however. As Eliezer noted,

RSI is the biggest, most interesting, hardest-to-analyze, sharpest break-with-the-past contributing to the notion of a "hard takeoff" aka "AI go FOOM", but it's nowhere near being the only such factor. The advent of human intelligence was a discontinuity with the past even without RSI...

That post mentions several other hard takeoff scenarios, e.g.:

  • Even if an AI's self-improvement effort
... (read more)
Ah, thanks for making this point - I notice I've recently been treating "recursive self-improvement" and "hard takeoff" as more or less interchangeable concepts. I don't think I need to update on this, but I'll try and use my language more carefully at least.
Thanks. I will review those scenarios. Just some quick thoughts:

On first sight this sounds suspicious. The genetic difference between a chimp and a human amounts to about 40–45 million bases that are present in humans and missing from chimps - and that number is irrespective of the differences in gene expression between humans and chimps. So it's not as if you add a tiny bit of code and get a super-apish intelligence.

The argument from the gap between chimpanzees and humans is interesting but cannot be used to extrapolate onwards from human general intelligence. It is pure speculation that humans are not Turing complete and that there are levels above our own. That chimpanzees exist, and humans exist, is not proof of the existence of anything that bears, in any relevant respect, the same relationship to a human that a human bears to a chimpanzee.

Humans can process long chains of inferences with the help of tools. The important question is whether incorporating those tools into some sort of self-perception, some sort of guiding agency, is vastly superior to humans using a combination of tools and expert systems. In other words, it is not clear that there exists a class of problems that is solvable by Turing machines in general, but not by a combination of humans and expert systems. If an AI that we invented can hold a complex model in its mind, then we can also simulate such a model by making use of expert systems. Being consciously aware of the model doesn't make any great difference in principle to what you can do with it.

Here is what Greg Egan has to say about this in particular:
The quote from Egan would seem to imply that for (literate) humans, too, working memory differences are insignificant: anyone can just use pen and paper to increase their effective working memory. But human intelligence differences do seem to have a major impact on e.g. job performance and life outcomes (e.g. Gottfredson 1997), and human intelligence seems to be very closely linked to - though admittedly not identical with - working memory measures (e.g. Oberauer et al. 2005, Oberauer et al. 2008).
I believe that what he is suggesting is that once you reach a certain plateau, intelligence hits diminishing returns. Would Marilyn vos Savant be proportionally more likely to take over the world, if she tried, than a 115-IQ individual? Some anecdotal evidence: is there evidence that a higher IQ is useful beyond a certain level? The question is not just whether it is useful but whether it would be worth the effort it would take to amplify your intelligence to that point, given that your goal was to overpower lower-IQ agents. Would a change in personality, more data, a new pair of sensors, or some weapons maybe be more useful? If so, would an expected utility maximizer pursue intelligence amplification? (A marginal note: bigger is not necessarily better.)

I upvoted for the anecdote, but remember that you're referring to von Neumann, who invented both the basic architecture of computers and the self-replicating machine. I am not qualified to judge whether or not those are as original as relativity, but they are certainly big.

Sure. She's demonstrated that she can communicate successfully with millions and handle her own affairs quite successfully, generally winning at life. This is comparable to, say, Ronald Reagan's qualifications. I'd be quite unworried in asserting she'd be more likely to take over the world than a baseline 115 person.
Surely humans are Turing complete. I don't think anybody disputes that. We know that capabilities extend above our own in all the realms where machines already outstrip our capabilities - and we have a pretty good idea what greater speed, better memory and more memory would do.
Agree with your basic point, but a nit-pick: limited memory and speed (heat death of the universe, etc) put many neat Turing machine computations out of reach of humans (or other systems in our world) barring new physics.
Sure: I meant in the sense of the "colloquial usage" here:

Thanks for working this up.

However, it leads me to thinking about a modest FOOM. What's the least level of intelligence needed for a UFAI to be an existential risk? What's the least needed for it to be extremely deadly, even if not an existential risk?

What makes you think that it takes a general intelligence? Automatic scientists, with well-defined goals, that can brute-force discoveries on hard problems in bio- and nanotech could enable unfriendly humans to wreak havoc and control large groups of people. If we survive that - which I think is the top risk, rather than GAI - then we might at some point be able to come up with a universal artificial intelligence.

Think about it this way: if humans figure out how to create some sort of advanced narrow AI that can solve certain problems in a superhuman way, why would they wait and not just assign it directly to solving those problems? The problem is that you can't make such narrow AIs "friendly", because they are tools and not agents - tools used by unfriendly humans.

Luckily there is a way to impede the consequences of that development and various existential risks at once. What we should be working on is a global sensor network, merging various technologies like artificial noses, lab-on-a-chip technology, DNA Sequencing To Go, etc. Such a sensor network could be used to detect various threats - nuclear terrorism with dirty bombs, poisoned water, or biological pathogens - early on and alert authorities or nearby people. You could work with mobile phone companies to incorporate those sensors into their products. Companies like Apple would profit from having such sensors in their products by extending their capabilities. This would not only allow mass production but would also spread the sensors randomly. You might also work together with the government, which is always keen to get more information. All it would then take is an app! The analysis of the data could actually be done by the same gadgets that employ the sensors - a public computing grid.

This isn't science fiction; it can actually be done. The technology is coming quickly. And best of all, it doesn't just protect us against various risks. Such sensors could be used to detect all kinds of health probl
I was with you up until this sentence. Really, we can make a global sensor network today? A network that would detect all conceivable threats everywhere? This sounds just a tad unrealistic to me, though not logically impossible at some point in the future.

Thanks - I've read the bullet points and it looks like a really good summary (apologies for skimming - I'll read it in more detail when I have time).

Just a few minor points:

  • The P(FOOM) calculation appears to be entirely independent of the P(CHARITY) calculation. Should these be made into separate documents? Or should it be made clearer which factors are common to FOOM and CHARITY? (e.g. P5 would appear to be correlated with P9).
  • In P6, I'm taking "SIAI" to mean a kind of generalized SIAI (i.e. it doesn't have to be this specific team of people
... (read more)
I wanted to show that even if you assign a high probability to the possibility of risks from AI due to recursive self-improvement, it is still questionable whether SIAI is the right choice or whether now is the time to act. As I wrote at the top, it was a rather quick write-up and I plan to improve it. I can't get myself to work on something like this for very long. It's stupid, I know. But I can try to improve things incrementally. Thanks for your feedback.

That's a good point: SIAI as an organisation that makes people aware of the risk. But from my interview series it seemed that a lot of AI researchers are aware of it, to the point of being bothered.

It isn't optimal. It is kind of hard to talk about premises that appear to be the same from a superficial point of view. But from a probabilistic point of view it is important to separate them into distinct parts, to make clear that there are things that need to be true in conjunction.

That problem is incredibly mathy, and given my current level of education I am happy that people like Holden Karnofsky tackle it. The problem being that we get into the realm of Pascal's mugging here, where vast utilities outweigh tiny probabilities. Large error bars may render such choices moot. For more, see my post here.

If I understand you correctly, you are saying this: "Don't bother with this superintelligence risk, for it is incredibly tiny."

A bold statement. Too bold for a potentially disastrous chain of events which you assure us is just impossible.

No, not really. I am not saying that because GiveWell says that the Against Malaria Foundation is the #1 charity, treating children for parasite infections in sub-Saharan Africa should be ignored. This is a delicate problem, and if it were up to me to allocate the resources of the world, then existential risk researchers, including SIAI, would receive their share of funding. But if I could only choose one cause, either SIAI or something else, then I wouldn't choose SIAI. My opinion on the topic is highly volatile, though. There have been moments when I thought that SIAI was the best choice when it comes to charitable giving. There has been a time when I was completely sure that a technological singularity would happen soon. Maybe I will change my mind again. I suggest that everyone research the topic themselves.
With this I agree. I won't buy the whole package from SIAI, and I won't even donate to them under the current conditions. But I see some of their points as extremely important, and I am happy that they exist and do what they do.

P(FOOM) = P(P1∧P2∧P3∧P4∧P5)

This can't be expanded as a simple product of the marginal probabilities, because the premises aren't independent. It's actually this:

P(FOOM) = P(P1) · P(P2|P1) · P(P3|P2∧P1) · P(P4|P3∧P2∧P1) · P(P5|P4∧P3∧P2∧P1)
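A toy computation of that product, with conditional probabilities invented purely for illustration (not anyone's actual estimates), shows how quickly a conjunction shrinks even when each factor is fairly high:

```python
# P(P1), P(P2|P1), P(P3|P2∧P1), P(P4|P3∧P2∧P1), P(P5|P4∧P3∧P2∧P1)
# -- made-up numbers, for illustration only.
conditionals = [0.8, 0.7, 0.6, 0.5, 0.4]

p_foom = 1.0
for p in conditionals:
    p_foom *= p

print(p_foom)  # about 0.067: five moderately likely steps, jointly unlikely
```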

[This comment is no longer endorsed by its author]

Not only does there have to be UFAI risk; FAI development must also reduce that risk, which to me looks like the shakiest of the propositions. A buggy FAI that doesn't break itself somehow is for certain unfriendly (e.g. it can want to euthanize you to end your suffering, or to cut your brain apart into two hemispheres to satisfy each hemisphere's different desires, or something much more bizarre), while some random AI out of the AI design space may e.g. typically wirehead everything except curiosity, and then it'd just keep us in a sort of wildlife preserve.

Note: tr... (read more)

Fantastic post. This sets a new standard for "SIAI skepticism". Dialectically it should be very useful as people try to rebut it at the same level of detail. I think you shouldn't mess with it too much now, as it may become a reference point.