This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far, see the announcement post. For the schedule of future topics, see MIRI's reading guide.

Welcome. This week we discuss the eighth section in the reading guide: Cognitive Superpowers. This corresponds to Chapter 6.

This post summarizes the section and offers a few relevant notes and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.

There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).

Reading: Chapter 6

Summary

  1. AI agents might have very different skill profiles.
  2. AI with some narrow skills could produce a variety of other skills. For example, strong AI research skills might allow an AI to build its own social skills.
  3. 'Superpowers' that might be particularly important for an AI that wants to take control of the world include:
    1. Intelligence amplification: for bootstrapping its own intelligence
    2. Strategizing: for achieving distant goals and overcoming opposition
    3. Social manipulation: for escaping human control, getting support, and encouraging desired courses of action
    4. Hacking: for stealing hardware, money and infrastructure; for escaping human control
    5. Technology research: for creating military force, surveillance, or space transport
    6. Economic productivity: for making money to spend on taking over the world
  4. These 'superpowers' are relative to other agents; Bostrom counts them as super only if they substantially exceed the combined capabilities of the rest of global civilization.
  5. A takeover scenario might go like this:
    1. Pre-criticality: researchers make a seed-AI, which becomes increasingly helpful at improving itself
    2. Recursive self-improvement: the seed-AI becomes the main force for improving itself and brings about an intelligence explosion. It perhaps develops all of the superpowers it didn't already have.
    3. Covert preparation: the AI devises a robust long-term plan, pretends to be nice, and escapes from human control if need be.
    4. Overt implementation: the AI goes ahead with its plan, perhaps killing the humans at the outset to remove opposition.
  6. Wise Singleton Sustainability Threshold (WSST): a capability set exceeds this iff a wise singleton with that capability set would be able to take over much of the accessible universe. 'Wise' here means being patient and savvy about existential risks, 'singleton' means being internally coordinated and having no opponents.
  7. The WSST appears to be low. For example, our own intelligence is sufficient, as would be some skill sets that were strong in only a few narrow areas.
  8. The cosmic endowment (what we could do with the matter and energy that might ultimately be available if we colonized space) is at least about 10^85 computational operations. This is equivalent to 10^58 emulated human lives.
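The two figures in point 8 imply a compute budget per emulated life. Below is a back-of-the-envelope sketch that uses only those two numbers; the per-second rate and lifespan in the cross-check are hypothetical assumptions for illustration, not Bostrom's stated derivation.

```python
# Orders of magnitude quoted in point 8 above.
COSMIC_ENDOWMENT_OPS = 10**85  # total computational operations
EMULATED_LIVES = 10**58        # emulated human lives

# Implied budget per emulated life: 10^85 / 10^58 = 10^27 operations.
ops_per_life = COSMIC_ENDOWMENT_OPS // EMULATED_LIVES

# Hypothetical cross-check: an emulation running at an assumed 10^18
# operations per second for an assumed 100-year life.
ASSUMED_OPS_PER_SECOND = 10**18
SECONDS_PER_YEAR = 365 * 24 * 3600
ASSUMED_LIFESPAN_YEARS = 100
assumed_ops_per_life = ASSUMED_OPS_PER_SECOND * SECONDS_PER_YEAR * ASSUMED_LIFESPAN_YEARS

print(f"implied by the two figures: {ops_per_life:.0e} operations per life")
print(f"under the assumed parameters: {assumed_ops_per_life:.1e} operations per life")
```

Under these assumptions both routes land at around 10^27 operations per life, so the two headline numbers are at least mutually consistent.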

Another view

Bostrom starts the chapter by claiming that humans' dominant position comes from their slightly expanded set of cognitive functions relative to other animals. Computer scientist Ernest Davis criticizes this claim in a recent review of Superintelligence:

The assumption that a large gain in intelligence would necessarily entail a correspondingly large increase in power. Bostrom points out that what he calls a comparatively small increase in brain size and complexity resulted in mankind’s spectacular gain in physical power. But he ignores the fact that the much larger increase in brain size and complexity that preceded the appearance in man had no such effect. He says that the relation of a supercomputer to man will be like the relation of a man to a mouse, rather than like the relation of Einstein to the rest of us; but what if it is like the relation of an elephant to a mouse?


Notes

1. How does this model of AIs with unique bundles of 'superpowers' fit with the story we have heard so far?

Earlier it seemed we were just talking about a single level of intelligence that was growing, whereas now it seems there are potentially many distinct intelligent skills that might need to be developed. Does our argument so far still work if an agent has a variety of different sorts of intelligence to improve?

If you recall, the main argument so far was that AI might be easy (have 'low recalcitrance') mostly because there is a lot of hardware and content sitting around and algorithms might randomly happen to be easy. Then more effort ('optimization power') would be spent on AI as it became evidently important. Then much more effort again would be spent when the AI became a large source of labor itself. This was all taken to suggest that AI might progress very fast from human-level to superhuman level, which in turn suggests that one AI agent might get far ahead before anyone else catches up, and so might seize power.
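Bostrom's kinetics in Chapter 4 frame this argument as: rate of change in intelligence = optimization power / recalcitrance. Below is a minimal numerical sketch, assuming constant recalcitrance and an optimization power made up of a fixed human contribution plus a term proportional to the system's own capability. The functional forms, parameter values and the name simulate_takeoff are hypothetical choices for illustration.

```python
from typing import List


def simulate_takeoff(steps: int, dt: float = 0.1, human_effort: float = 1.0,
                     self_contribution: float = 0.5, recalcitrance: float = 10.0,
                     start_intelligence: float = 1.0) -> List[float]:
    """Euler-integrate dI/dt = O(I) / R under assumed (hypothetical) forms:
    O(I) = human_effort + self_contribution * I, with R held constant."""
    intelligence = start_intelligence
    trajectory = [intelligence]
    for _ in range(steps):
        # Optimization power: outside effort plus the system's own contribution.
        optimization_power = human_effort + self_contribution * intelligence
        intelligence += dt * optimization_power / recalcitrance
        trajectory.append(intelligence)
    return trajectory


# Without self-contribution growth is linear; with it, growth compounds.
no_feedback = simulate_takeoff(steps=1000, self_contribution=0.0)
with_feedback = simulate_takeoff(steps=1000, self_contribution=0.5)
print(f"no feedback: {no_feedback[-1]:.1f}, with feedback: {with_feedback[-1]:.1f}")
```

The point is only qualitative: once the system supplies a growing share of the optimization power, progress compounds. The real shapes of optimization power and recalcitrance are exactly what is in dispute.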

It seems to me that this argument works a bit less well with a cluster of skills than one central important skill, though it is a matter of degree and the argument was only qualitative to begin with.

It is less likely that AI algorithms will happen to be especially easy if a lot of different algorithms are needed. Also, if different cognitive skills are developed at somewhat different times, then it's harder to imagine a sudden jump when a fully capable AI suddenly reads the whole internet or becomes a hugely more valuable use for hardware than anything being run already. And if many different projects are needed to make an AI smarter in different ways, the extra effort (brought first by human optimism and then by self-improving AI) must be divided between those projects. If a giant AI could dedicate its efforts to improving some central feature that would improve all of its future efforts (like 'intelligence'), then it would do much better than if it had to devote one thousandth of its efforts to each of a thousand different sub-skills, each of which is only relevant in a few niche circumstances. Overall, it seems AI must progress more slowly if its success is driven by more distinct dedicated skills.
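Here is a toy version of the dilution point. It assumes, hypothetically, that each step's improvement effort equals current research skill, that this effort is split evenly across some number of sub-skills, and that only the research sub-skill feeds back into future effort. The function name and parameters are illustrative, not from the book.

```python
def research_skill_after(num_subskills: int, steps: int, rate: float = 0.01) -> float:
    """Toy model: effort each step equals current research skill; that effort is
    split evenly across num_subskills, and only the share going to the research
    sub-skill raises future effort."""
    research_skill = 1.0
    for _ in range(steps):
        effort = research_skill
        research_skill += rate * effort / num_subskills
    return research_skill


central = research_skill_after(num_subskills=1, steps=1000)
diluted = research_skill_after(num_subskills=1000, steps=1000)
print(f"one central skill: {central:.1f}; a thousand sub-skills: {diluted:.3f}")
```

In this toy, both cases still grow exponentially. Splitting the effort only divides the growth rate by the number of sub-skills, which fits the point above that this is a matter of degree.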

2. The 'intelligence amplification' superpower seems much more important than the others. It directly leads to an intelligence explosion - a key reason we have seen so far to expect anything exciting to happen with AI - while several others just allow one-off grabbing of resources (e.g. social manipulation and hacking). Note that this suggests an intelligence explosion could happen with only this superpower, well before an AI appeared to be human-level.

3. Box 6 outlines a specific AI takeover scenario. A bunch of LessWrongers thought about other possibilities in this post.

4. Bostrom mentions that social manipulation could allow a 'boxed' AI to persuade its gatekeepers to let it out. Some humans have tried to demonstrate that this is a serious hazard by simulating the interaction using only an intelligent human in the place of the AI, in the 'AI box experiment'. Apparently in both 'official' efforts the AI escaped, though there have been other trials where the human won.

5. How to measure intelligence

Bostrom pointed to some efforts to design more general intelligence metrics:

Legg: intelligence is measured in terms of reward in all reward-summable environments, weighted by the complexity of the environment (simpler environments count for more).

Hibbard: intelligence is measured in terms of the hardest environment you can pass, in a hierarchy of increasingly hard environments.

Dowe and Hernández-Orallo have several papers on the topic, and summarize some other efforts. I haven't looked at them enough to summarize.
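To make the Legg and Hibbard definitions concrete, here is a toy sketch of each. Legg's measure (developed with Hutter) weights an agent's expected reward in each environment by 2^-K, where K is the environment's Kolmogorov complexity. Because K is uncomputable, the sketch substitutes an assumed description length in bits. The Hibbard-style score simply climbs an assumed difficulty ordering until the first failure. The environments, numbers and pass threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyEnvironment:
    name: str
    description_bits: int  # stand-in for Kolmogorov complexity K, which is uncomputable
    difficulty: int        # position in an assumed hierarchy of environments


def legg_style_score(expected_reward: Dict[str, float], envs: List[ToyEnvironment]) -> float:
    """Sum of expected total reward (assumed normalised to [0, 1]) in each
    environment, weighted by 2^-description_bits so simpler environments count more."""
    return sum(2.0 ** -env.description_bits * expected_reward[env.name] for env in envs)


def hibbard_style_level(passes: Callable[[ToyEnvironment], bool], envs: List[ToyEnvironment]) -> int:
    """Highest difficulty reached when climbing the hierarchy in order and
    stopping at the first environment the agent fails (0 if it fails the first)."""
    level = 0
    for env in sorted(envs, key=lambda e: e.difficulty):
        if not passes(env):
            break
        level = env.difficulty
    return level


envs = [ToyEnvironment("bandit", 3, 1), ToyEnvironment("maze", 5, 2), ToyEnvironment("chess", 9, 3)]
rewards = {"bandit": 0.9, "maze": 0.5, "chess": 0.1}
print(legg_style_score(rewards, envs))
print(hibbard_style_level(lambda env: rewards[env.name] >= 0.5, envs))
```

The chess environment barely moves the Legg-style score because its weight is tiny. This is the measure's deliberate bias toward simple environments.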

The Turing Test is the most famous test of machine intelligence. However, it only tests whether a machine is at a specific level, so it isn't great for fine-grained measurement of other levels of intelligence. It is also often misunderstood to measure just whether a machine can conduct a normal chat like a human, rather than whether it can respond as capably as a human to anything you can ask it.

For some specific cognitive skills, there are other measures already. For example, 'economic productivity' can be measured crudely in terms of profits made. Others seem like they could be developed without too much difficulty. For example, social manipulation could be measured in terms of the probability of succeeding at manipulation tasks; this test doesn't exist as far as I know, but it doesn't seem prohibitively difficult to make.
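If social manipulation were scored as a success probability over a hypothetical battery of manipulation tasks, the number would need an error bar, because such trials are expensive and few. A minimal sketch using the standard Wilson score interval for a binomial proportion follows. The task counts in the example are invented for illustration.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial success probability.
    z = 1.96 gives approximately 95% coverage."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical result: 30 successful persuasion attempts out of 100 trials.
low, high = wilson_interval(30, 100)
print(f"success rate 0.30, 95% interval ({low:.3f}, {high:.3f})")
```

The Wilson interval is used here rather than the simpler normal approximation because it stays inside [0, 1] and behaves sensibly when successes are rare, which is plausible for hard manipulation tasks.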

6. Will we be able to colonize the stars?

Nick Beckstead looked into it recently. Summary: probably.

In-depth investigations

If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, almost entirely taken from Luke Muehlhauser's list, without my looking into them further.

  1. Try to develop metrics for specific important cognitive abilities, including general intelligence. Build on the ideas of Legg, Yudkowsky, Goertzel, Hernández-Orallo & Dowe, etc.
  2. What is the construct validity of non-anthropomorphic intelligence measures? In other words, are there convergently instrumental prediction and planning algorithms? E.g. can one tend to get agents that are good at predicting economies but not astronomical events? Or do self-modifying agents in a competitive environment tend to converge toward a specific stable attractor in general intelligence space? 
  3. Scenario analysis: What are some concrete AI paths to influence over world affairs? See project guide here.
  4. How much of humanity’s cosmic endowment can we plausibly make productive use of given AGI? One way to explore this question is via various follow-ups to Armstrong & Sandberg (2013). Sandberg lists several potential follow-up studies in this interview, for example (1) get more precise measurements of the distribution of large particles in interstellar and intergalactic space, and (2) analyze how well different long-term storable energy sources scale. See Beckstead (2014).
If you are interested in anything like this, you might want to mention it in the comments, and see whether other people have useful thoughts.

How to proceed

This has been a collection of notes on the chapter. The most important part of the reading group, though, is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!

Next week, we will talk about the orthogonality of intelligence and goals, section 9. To prepare, read "The relation between intelligence and motivation" from Chapter 7. The discussion will go live at 6pm Pacific time next Monday, November 10. Sign up to be notified here.


Thanks Katya. I'm diving in a bit late here, but I would like to query the group on the potential threats posed by AI. I've been intrigued by AI for thirty years and have followed the field peripherally. Something is very appealing about the idea of creating truly intelligent machines and, even more exciting, seeing those machines be able to improve themselves. However, I have, with some others (including most recently Elon Musk), become increasingly concerned about the threat that our technology, and particularly AI, may pose to us. This chapter on potentia...

It might be a good idea somewhere down the line, but co-ordination of that kind would likely be very difficult. It might not be so necessary if the problem of friendliness is solved, and AI is built to specification. This would also be very difficult, but it would also likely be much more permanently successful, as a friendly superintelligent AI would ensure no subsequent unfriendly superintelligences arise.
That sounds a bit too simplistic to me since it relies on many what ifs. Int'l law is also far from certain in terms of providing good solutions but it seems a mix of national and int'l dialogue is the place to start. We're also going to see localities get involved with their own ordinances and rules, or simply cultural norms. I'd rather see the discussion happen sooner rather than later because we are indeed dealing with Pandora's Box here. Or to put it more dramatically, as Musk did recently: we are perhaps summoning the demon in seeking strong AI. Let's discuss these weighty issues before it's too late.
Bostrom discusses the Baruch Plan, and the lessons to learn from that historical experience are enormous. I agree that we need a multilateral framework to regulate AI. However, it also has to be something that gains agreement. Baruch and the United States wanted to give nuclear technology regulation over to an international agency. Of all things, the Soviet Union disagreed BEFORE they even quite had the Bomb! (Although they were researching it.) Why? Because they knew that they would be out-voted in this new entity's proposed governance structure. Figuring out the framework to present will be a challenge, and there will not be a dozen chances...
Thanks Steve. I need to dive into this book for sure.
* We need a global charter for AI transparency.
* We need a globally funded global AI nanny project like Ben Goertzel suggested.
* Every AGI project should spend 30% of its budget on safety and the control problem: 2/3 project-related, 1/3 general research.
We must find a way for the financial value created by AI (today narrow AI, tomorrow AGI) to compensate for technology-driven collective redundancies and to support a sustainable economic and social model.
If international leadership could become aware of AI issues, discuss them and sensibly respond to them, I too think that might help in mitigating the various threats that come with AI. Here are some interesting pieces of writing on exactly this topic: "How well will policy-makers handle AGI?" and "AGI outcomes and civilisational competence".
Great, thanks for the links.
FWIW, there already is one organization working specifically on Friendliness: MIRI. Friendliness research in general is indeed underfunded relative to its importance, and finishing this work before someone builds an Unfriendly AI is indeed a nontrivial problem. So would be making international agreements work. Artaxerxes phrased it as "co-ordination of this kind would likely be very difficult"; I'll try to expand on that. The lure of superintelligent AI is that of an extremely powerful tool to shape the world. We have various entities in this world, including large nation states with vast resources, that are engaged in various forms of strong competition. For each of those entities, AI is potentially a game-winner. And contrary to nuclear weapons, you don't need huge conspicuous infrastructure to develop it; just some computers (and you'll likely keep server farms for various reasons anyway; what's one more?) and a bunch of researchers that you can hide in a basement and move around as needed to evade detection. The obvious game-theoretical move, then, is to push for international outlawing of superintelligent AI, and then push lots of money into your own black budgets to develop it before anyone else does. Nuclear weapons weren't outlawed before we had any, or even limited to one or two countries, though that would have been much easier than with AI. The Ottawa Treaty was not signed by the US, because they decided anti-personnel mines were just too useful to give up, and that usefulness is a rounding error compared to superintelligent AI. Our species can't even coordinate to sufficiently limit our emission of CO2 to avert likely major climate impacts, and the downside to doing that would be much lower. I will also note that for the moment, there is a significant chance that the large nation states simply don't take the potential of superintelligent AI seriously. This might be the best possible position for them to take. If they start to appreciate it, without a
Thanks Sebastian. I agree with your points and it scares me even more to think about the implications of what is already happening. Surely the US, China, Russia, etc., already realize the game-changing potential of superintelligent AI and are working hard to make it reality. It's probably already a new (covert) arms race. But this to me is very strong support for seeking int'l treaty solutions now and working very hard in the coming years to strengthen that regime. Because once the unfriendly AI gets out of the bag, as with Pandora's Box, there's no pushing it back in. I think this issue really needs to be elevated very quickly.
Thinking about policy responses seems quite neglected to me. It's true there are prima facie reasons to expect regulation or global cooperation to be 'hard', but the details of the situation deserve a great deal more thought, and 'hard' should be compared to the difficulty of developing some narrow variety of AI before anyone else develops any powerful AI.
In that sentence "superintelligent AI' can be replaced with pretty much anything, starting with "time travel" and ending with "mind-control ray".

Bostrom summarized (p91):

We are a successful species. The reason for our success is slightly expanded mental faculties compared with other species, allowing better cultural transmission. This suggests that substantially greater intelligence would bring extreme power.

Our general intelligence isn't obviously the source of this improved cultural transmission. Why suppose general intelligence is the key thing, instead of improvements specific to storing and communicating information? Doesn't the observation that our cultural transmission abilities made u...

Though Bostrom seems right to talk about better transmission (which could be parsed into more reliable, robust, faster, compact, nested, etc.), he stops short of looking deeply into what made cultural transmission better. To claim that a slight improvement in (general) mental faculties did it would be begging the question. Brilliant though he is, Bostrom is "just" a physicist, mathematical logician, philosopher, economist, computational neuroscientist who invented the field of existential risks and revolutionized anthropics, so his knowledge of cultural evolution and this transition is somewhat speculative. That's why we need other people :)
In that literature we have three main contenders for what allowed human prowess to reshape the earth:
Symbolic ability: the ability to decently process symbols (which have a technical definition hard to describe here) and understand them in a timely fashion is unique to humans and some other currently extinct anthropoids. Terrence Deacon argues that this is what matters in The Symbolic Species.
Iterative recursion processing: This has been argued in many styles.
* Chomsky argued for the primacy of recursion as a requisite ability for human language in the late fifties.
* Pinker endorses this in The Language Instinct and in The Stuff of Thought.
* The Mind Is A Computer metaphor (Lakoff 1999) has been widely adopted and very successful memetically, and though it has other distinctions, the main distinction from "Mind Is A Machine" is that recursion is involved in computers, but not in all machines. The computational theory of mind thrived in the hands of Pinker, Koch, Dennett, Kahneman and more recently Tononi. Within LW and among programmers, Mind Is A Computer is frequently thought to be the fundamental metaphysics of mind, and a final shot at the ontological constituent of our selves, a perspective I considered naïve here.
Ability to share intentions: the ability to share goals and intentions and parallelize in virtu
While thinking about past discussions, I realized something like: (selfish) gene -> meme -> goal. When Bostrom thinks about the probability of a singleton, I am afraid he overlooks the possibility of running several 'personalities' on one substrate. (We could suppose that several teams could run their projects on the same hardware, just as several teams use the Hubble telescope to observe different objects.) And this is not only a possibility but probably also a necessity. If we want to prevent a destructive goal from being realized (and destroying our world), then we have to think about multipolarity. We need to analyze how slightly different goals could control each other.
I'll coin the term Monolithic Multipolar for what I think you mean here: one stable structure that has different modes activated at different times, where these modes don't share goals, like a human, especially a schizophrenic one. The problem with Monolithic Multipolarity is that it is fragile. In humans, what causes us to behave differently and want different things at different times is not accessible for revision; otherwise, each party might have an incentive to steal the other's time. An AI would not need to deal with such trivialities since, by definition, an explosively recursively self-improving AI can rewrite it-selves. We need other people, but Bostrom doesn't leave simple things out easily.
One mode could have the goal of acting like the graphite moderator in a nuclear reactor, to prevent an unmanaged explosion. For now, I just wanted to refine our view of the probability that there is only one SI in the starting period.
The capabilities of a homo sapiens sapiens 20,000 years ago are more chimp-like than comparable to a modern internet- and technology-amplified human. Our base human intelligence seems to be only a little above the threshold necessary to develop cultural technologies that allow us to accumulate knowledge over generations. Standardized languages, the invention of writing and further technological developments improved our capabilities far above this threshold. Today children need years until they acquire enough cultural technologies and knowledge to become full members of society. Intelligence alone does not bring extreme power. If a superintelligent AI has learned cultural technologies and acquired knowledge and skills, then it could.
I'm not a prehistorian or whatever the relevant field is, but didn't paleolithic humans spread all over the planet in a way chimps completely failed to? Doesn't that indicate some sort of very dramatic adaptability advantage?
Yes indeed. Adaptability and intelligence are enabling factors. The human capabilities of making diverse stone tools, cloth and fire had been sufficient to settle in other climate zones. Modern humans have many more capabilities: agriculture, transportation, manipulation of physical matter from atomic scales up to earth-spanning infrastructure; control of energies from quantum mechanical condensation up to fusion bomb explosions; information storage, communication, computation, simulation, and automation up to narrow AI. Changes in human intelligence and adaptability do not account for this huge rise in capabilities and skills over the last 20,000 years. The rise of capabilities is a cultural evolutionary process. Leonardo da Vinci was the last real universal genius of humanity. Since then, capabilities have diversified and expanded exponentially, exceeding the capacity of a single human brain by orders of magnitude. Hundreds of new knowledge domains have developed. The more domains an AI masters, the more power it has.
We might be approaching a point of diminishing returns as far as improving cultural transmission is concerned. Sure, it would be useful to adopt a better language, e.g. one less ambiguous, less subject to misinterpretation, more revealing of hidden premises and assumptions. More bandwidth and better information retrieval would also help. But I don't think these constraints are what's holding AI back. Bandwidth, storage, and retrieval can be looked at as hardware issues, and performance in these areas improves both with time and with adding more hardware. What AI requires are improvements in algorithms and in theoretical frameworks such as decision theory, morality, and systems design.

I think that there is tremendous risk from an AI that can beat the world in narrow fields, like finance or war. We might hope to outwit the narrow capability set of a war-planner or equities trader, but if such a super-optimizer works within accepted frameworks like a national military or a hedge fund, it may be impossible to stop it before it's too late; world civilization could then be disrupted enough that the AI or its master can gain control beyond these narrow spheres.

Please spell out why the financial AGI is so threatening?
If an equities-trading AI were to gather a significant proportion of the world's wealth (not just the billions gained by hedge funds today, but trillions and more), that would give significant power to its masters. Not total paperclip-the-world power, not even World Emperor power, but enough power to potentially leverage in dangerous new directions, not least of which is research into even more general and powerful AI.
Errhh.. no, it would discredit either electronic trading, or finance full stop. The world would shut down the stockmarkets permanently rather than tolerate that level of dominance from a single actor. No-kidding, the most likely result of this is a world where if you want to buy stock you bloody well show up in person on the trading floor and sign physical paperwork. In blood so they can check you are not a robot.
So, the logic here is not really different from an AI that ran a shipping conglomerate of self-driving cars, trains, boats and planes? Just a business that makes money which the AI can use. The trader AI would concern me. I guess I would be even more concerned about the shipping conglomerate, because it knows how to interact with the physical world effectively.
To make it a bit clearer: a financial AI that somehow never developed the ability to do anything beyond placing buy and sell orders could still have catastrophic effects if it hyperoptimized its trading to the point that it gained some very large percentage of the world's assets. This would have disruptive effects on the economy, and depending on the AI's goals, that would not stop the AI from hoovering up every asset.
Note that this relies on this one AI being much better than the competition, so similar considerations apply to the usual case of a more general AI suddenly becoming very powerful. One difference is that an intelligence explosion in this case would be via investing money in hiring more labor, rather than via the AI itself laboring.
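The dependence on beating the competition can be made concrete. A minimal sketch, assuming the trading AI compounds at a fixed annual return, all other wealth compounds at a lower fixed return, and nothing is withdrawn, taxed or redistributed. The function name, starting share and returns in the example are hypothetical.

```python
def years_to_reach_share(start_share: float, target_share: float,
                         ai_return: float, world_return: float,
                         max_years: int = 1000) -> int:
    """Years until one actor's share of total wealth first reaches target_share,
    assuming its wealth compounds at ai_return, all other wealth compounds at
    world_return, and nothing is withdrawn, taxed or redistributed.
    Returns -1 if the target is not reached within max_years."""
    ai_wealth = start_share
    other_wealth = 1.0 - start_share
    for year in range(max_years + 1):
        if ai_wealth / (ai_wealth + other_wealth) >= target_share:
            return year
        ai_wealth *= 1 + ai_return
        other_wealth *= 1 + world_return
    return -1


# Hypothetical: a fund starting with one millionth of world wealth, earning 30%
# a year while everything else grows at 5%.
years = years_to_reach_share(1e-6, 0.5, ai_return=0.30, world_return=0.05)
print(f"years to reach half of world wealth: {years}")
```

Under these assumptions the share grows only when the AI's return exceeds everyone else's, and then it grows geometrically in the ratio of the two growth factors. With equal returns the function returns -1.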

So in this chapter Bostrom discusses an AGI with a neutral, but "passionate" goal, such as "I will devote all of my energies to be the best possible chess player, come what may."

I am going to turn this around a little bit.

By human moral standards, that is not an innocuous goal at all. Having that goal ONLY actually runs counter to just about every ethical system ever taught in any school.

It's obviously not ethical for a person to murder all the competition in order to become the best chess player in the world, nor is it ethical for a c...

It's plausible that some hedge fund runs an AGI that is focused on maximizing its financial returns and that this AGI becomes powerful over time. It wouldn't necessarily have an "attempt of a moral system".
By the way, hedge fund trading might gain a system some billions of dollars, but to get beyond that, such an AGI would have to purchase corporate entities and actually operate them.
The number of possible goals is HUGE compared to the relatively small subset of human goals. Humans share the same brain structure and general goal structure, but there's no reason to expect the first AI to share our neural/goal structure. Innocuous goals like "Prevent Suffering" and "Maximize Happiness" may not be interpreted and executed the way we wish them to be. Indeed, gaining superpowers probably would not compromise the AI's moral code; it only gives it the ability to fully execute the actions dictated by that moral code. Unfortunately, there's no guarantee that its morals will fall in line with ours.
There is no guarantee, therefore we have a lot of work to do! Here is another candidate for an ethical precept, from the profession of medicine: "First do no harm." The doctor is instructed to begin with this heuristic, to which there are many, many exceptions.
So, as others have said, the idea that an AGI necessarily incorporates all of the structures that perform moral calculations in human brains -- or is even necessarily compatible with them -- is simply untrue. So the casual assumption that if we can "instill" that morality in humans then clearly we're competent enough to instill them in an AGI is not clear to me. But accepting that assumption for the sake of comity... Well, I'm not entirely sure what we mean by "decent," so it's hard to avoid a No True Decent Person argument here. But OK, I'll take a shot. Suppose, hypothetically, that I want to maximize the amount of joy in the world, and minimize the amount of suffering. (Is that a plausible desire for decent folk like (presumably) you and I to have?) Suppose, hypothetically, that with my newfound superpowers I become extremely confident that I can construct a life form that is far less likely to suffer, and far more likely to experience joy, than humans. Now, perhaps that won't motivate me to slaughter all existing humans. Perhaps I'll simply intervene in human reproduction so that the next generation is inhuman... that seems more humane, somehow. But then again... when I think of all the suffering I'm allowing to occur by letting the humans stick around... geez, I dunno. Is that really fair? Maybe I ought to slaughter them all after all. But maybe this whole line of reasoning is unfair. Maybe "maximize joy, minimize suffering" isn't actually the sort of moral code that decent people like you and I have in the first place. So, what is our decent moral code? Perhaps if you can articulate that, it will turn out that a superhuman system optimizing for it won't commit atrocities. That would be great. Personally, I'm skeptical. I suspect that the morality of decent people like you and (I presume) me is, at its core, sufficiently inconsistent and incoherent that if maximized with enough power it will result in actions we treat as atrocities.
Well, suppose I suddenly became 200 feet tall. The moral thing to do would be for me to: Be careful where I step. Might we not consider programming in some forms of caution? An AGI is neither omniscient nor clairvoyant. It should know that its interactions with the world will have unpredictable outcomes, so it should first do a lot of thinking and simulation, then make small experiments. In discussions with lukeprog, I referred to this approach as "Managed Roll-Out." AGI could be introduced in ways that parallel the introduction of a new drug to the market: a "pre-clinical" phase where the system is only operated in simulation, then a series of small, controlled interactions with the outside world (Phase I, Phase II... Phase N trials). Before each trial, a forecast is made of the possible outcomes.
Caution sounds great, but if it turns out that the AI's goals do indeed lead to killing all humans or what have you, it will only delay these outcomes, no? So caution is only useful if we program its goals wrong, it realises that humans might consider that its goals are wrong, and allows us to take another shot at giving it goals that aren't wrong. Or basically, corrigibility.
Actually, caution is a different question. AGI is not clairvoyant. It WILL get things wrong and accidentally produce outcomes which do not comport with its values. Corrigibility is a valid line of research, but even if you had an extremely corrigible system, it would still risk making mistakes. AGI should be cautious, whether it is corrigible or not. It could make a mistake based on bad values, no off-switch OR just because it cannot predict all the outcomes of its actions.
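The "Managed Roll-Out" proposal earlier in this thread could be sketched as a gate between phases. A minimal sketch, assuming each phase gets a forecast range for a single outcome metric before it runs, and assuming (the proposal does not say this) that the roll-out halts at the first phase whose observed outcome falls outside its forecast. Phase, run_trial and managed_roll_out are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Phase:
    name: str
    forecast: Tuple[float, float]   # (low, high) range for one outcome metric, fixed before the trial
    run_trial: Callable[[], float]  # hypothetical hook: runs the controlled interaction, returns the observed metric


def managed_roll_out(phases: List[Phase]) -> List[str]:
    """Run phases in order and stop at the first one whose observed outcome falls
    outside the range forecast for it. Returns the names of phases that stayed
    within forecast."""
    completed: List[str] = []
    for phase in phases:
        low, high = phase.forecast
        observed = phase.run_trial()
        if not low <= observed <= high:
            break  # unanticipated outcome: halt before widening exposure
        completed.append(phase.name)
    return completed


phases = [
    Phase("Pre-clinical (simulation)", (0.0, 1.0), lambda: 0.4),
    Phase("Phase I", (0.0, 1.0), lambda: 0.7),
    Phase("Phase II", (0.0, 0.5), lambda: 0.9),  # outside forecast, so the roll-out halts here
    Phase("Phase III", (0.0, 1.0), lambda: 0.2),
]
print(managed_roll_out(phases))
```

A single scalar metric is a large simplification. The sketch only shows the gating idea: each phase widens exposure only after the previous one behaved as forecast.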
So, this is one productive direction. Why not reason for a while a though we ourselves have superintelligence, and see what kinds of mistakes we would make? I do feel that the developers of AGI will be able to anticipate SOME of the things that an AGI would do, and craft some detailed rules and value functions. Why punt the question of what the moral code should be? Let's try to do some values design, and see what happens. Here's one start: Ethical code template: -Follow the law in the countries where you interact with the citizens -EXCEPT when the law implies...(fill in various individual complex exceptions HERE) -Act as a Benevolent Protector to people who are alive now (lay out the complexities of this HERE)... As best I can tell, nobody on this site or in this social network has given a sufficiently detailed try at filling in the blanks. Let's stop copping out! BUT: The invention of AGI WILL provide a big moral stumper: How do we want to craft the outcomes of future generations of people/our progeny. THAT PART of it is a good deal more of a problem to engineer than some version of "Do not kill the humans who are alive, and treat them well."
Is AGI allowed to initiate changes of the law? Including those that would allow it to do horrible things which technically are not included in the list of exceptions?
YES, very good. The ways in which AI systems can or cannot be allowed to influence law-making has to be thought through very carefully.
Why not copy the way children learn ethical codes? Inherited are fear of death, of blood, of disintegration, and of harm caused by overstimulation of any of the five senses. Aggressive actions of a young child against others will be sanctioned. The learning effect is "I am not alone in this world - whatever I do can turn against me." A short-term benefit might provoke an overreaction and long-term disadvantages. Simplified ethical codes can be instilled even though a young child cannot yet reason about them. After this major developmental process, parents can explain ethical codes to their child. If a child kills an animal or destroys something - intentionally or not - and receives negative feedback, this even gives an opportunity for further understanding of social codes. Learning law is even more complex, and humans need years to reach excellence. Many AI researchers have a mathematical background and try to cast this complexity into the framework of today's mathematics. I have lost count of how many dozens of pages of silly stories I have read about AIs misinterpreting human commands. An example of silly mathematical interpretation: the shout "Get my mother out [of the burning house]! Fast!" leads the AI to blow up the house to get her out very fast [Yudkowsky2007]. Instead, an AI has to interpret this shout using all the unspoken context of a rescue: do it fast, but try to minimize harm to everybody and everything (you, my mother, other humans, and things). An experienced firefighter with years of training will instantly think through the options and the risks, subconsciously evaluate them all, and act directly in a low-complexity, low-risk situation. Higher risk and higher complexity will make him consult with colleagues and solve the rescue task as a team. If we speak about AGI, we can expect that an AGI will understand what "Get my mother out!" implies. A silly mathematical understanding of human communication leads nowhere. AIs being inca
Because the AI is not a child, doing the same thing would probably give different results. The essence of the problem is that the difference between "interpreting" and "misinterpreting" only exists in the mind of the human. If I as a computer programmer say to a machine "add 10 to X" -- while I really meant "add 100 to X", but made a mistake -- and the machine adds 10 to X, would you call that "misinterpreting" my command? Such things happen every day with existing programming languages, so there is nothing strange about expecting something similar to happen in the future. From the machine's point of view, it was asked to "add 10 to X", it added 10 to X, so it worked correctly. If the human is frustrated because that's not what they meant, that's bad for the human, but the machine worked correctly according to its inputs. You may be assuming a machine with a magical source of wisdom which could look at the command "add 10 to X" and somehow realize that the human would actually want to add 100, and would fix its own program (unless it is passive-aggressive and decides to follow the letter of the program anyway). But that's not how machines work.
Let us try to free our mind from associating AGIs with machines. They are totally different from automata. AGIs will be creative, will learn to understand sarcasm, will understand that women in some situations say no and mean yes. On your command to add 10 to x an AGI would reply: "I love working for you! At least once a day you try to fool me - I am not asleep and I know that + 100 would be correct. Shall I add 100?"
Very good! But be honest! Aren't we (sometimes?) more machines serving our genes/instincts than spiritual beings with free will?
We have to start somewhere, and "we do not know what to do" is not starting. Also, this whole thing about "what I really meant" - I think that we can break these down into specific failure modes and address them individually. - One of the failure modes is poor contextual reasoning. In order to discern what a person really means, you have to reason about the context of their communication. - Another failure mode involves not checking activities against norms and standards. There are a number of ways to arrive at the conclusion that Mom is to be rescued from the house alive and hopefully uninjured. - The machines in these examples do not seem to forecast or simulate potential outcomes and judge them against external standards. "Magical source of wisdom?" No. What we are talking about is whether it is possible to design a certain kind of AGI: one that is safe and friendly. We have shown this to be a complicated task. However, we have not fleshed out all the possible ways, and therefore we cannot falsify the claims of people who will insist that it can be done.
Poor contextual reasoning happens many times a day among humans. Our threads are full of it. In many cases the consequences are negligible. If the context is unclear and a phrase can be interpreted one way or the other, no magical wisdom is needed: * Clarification is essential: ASK. * Clarification is nice to have: say something that does not reveal that you have no idea what is meant, and try to prompt the other person to reveal contextual information. * Clarification is unnecessary or even unwanted: stay in the dark, or keep the other person in the dark. Making correct associations from few contextual hints is what AGI is about. Even today, narrow AI translation software is quite good at figuring out context by brute-force statistical similarity analysis.
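The "add 10 to X" example can be made concrete in a few lines. This is an illustration added here, not code from the thread; the function name `add_to_x` is hypothetical.

```python
def add_to_x(x: int) -> int:
    # The programmer meant to write "add 100 to X" but typed 10.
    # Nothing in the running program records that intention, so the
    # machine executes exactly the instruction it was given.
    return x + 10


print(add_to_x(5))  # prints 15, not the intended 105
```

From the interpreter's side there is no error to detect: the program is well-formed and does what its text says, which is the commenter's point about "misinterpretation" existing only relative to the human's intent.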
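The three cases in this comment amount to a simple decision rule. A minimal sketch, assuming the agent has already judged how much the clarification matters (the genuinely hard part, which this does not model); the enum and function names are hypothetical.

```python
from enum import Enum


class ClarificationNeed(Enum):
    ESSENTIAL = "essential"
    NICE_TO_HAVE = "nice to have"
    UNNECESSARY = "unnecessary"


def clarification_strategy(need: ClarificationNeed) -> str:
    """Map how much a clarification matters to the response the comment lists."""
    if need is ClarificationNeed.ESSENTIAL:
        # Misreading would be costly: ask outright.
        return "ask"
    if need is ClarificationNeed.NICE_TO_HAVE:
        # Keep talking without revealing confusion, and invite more context.
        return "prompt for context"
    # Unnecessary or unwanted: leave the ambiguity in place.
    return "stay in the dark"
```

All of the difficulty the thread argues about lives in choosing the input: deciding whether a given ambiguity is essential is itself a contextual-reasoning problem.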
If CEV (or whatever we're up to at the moment) turns out to be a dud and human values are inexorably inconsistent and mutually conflicting, one possible solution would be for me to kill everyone and try again, perhaps building roughly humanish beings with complex values I can actually satisfy that aren't messed up because they were made by an intelligent designer (me) rather than Azathoth. But really, the problem is that a superintelligent AI has every chance of being nothing like a human, and although we may try to give it innocuous goals we have to remember that it will do what we tell it to do, and not necessarily what we want it to do. See this Facing the Intelligence Explosion post, or this Sequence post, or Smarter Than Us chapter 6, or something else that says the same thing.
Did that. So let's get busy and start trying to fix the issues! The ethical code/values that this new entity gets need not be extremely simple. Ethical codes typically come in MULTI-VOLUME SETS.
Sounds good to me. What do you think of MIRI's approach so far? I haven't read all of their papers on Value Loading yet.

What did you find most interesting in this week's reading?

Page 102, "Many more orders of magnitudes of human-like beings could exist if we countenance digital implementations of minds--as we should." I'd like to hear others' thoughts about that, especially why he writes "as we should."
I think Bostrom wrote it that way to signal that while his own position is that digital mind implementations can carry the same moral relevance as e.g. minds running on human brains, he acknowledges that there are differing opinions about the subject, and he doesn't want to entirely dismiss people who disagree. He's right about the object-level issue, of course: Solid state societies do make sense. Mechanically embodying all individual minds is too inefficient to be a good idea in the long run, and there's no overriding reason to stick to that model.
If by 'countenance' we mean support normatively (I think sometimes it is used as 'accept as probable') then aside from the possible risks from the transition, digital minds seem more efficient in many ways (e.g. they can reproduce much more cheaply, and run on power from many sources, and live forever, and be copied in an educated state), and so likely to improve progress on many things we care about. They seem likely to be conscious if we are, but even if they aren't, it would plausibly be useful to have many of them around alongside conscious creatures.
So, what is going into the bloodstream of these "digital minds"? That will change the way they function completely. What kind of sensory input are they being supplied? Why would they have fake nerves running to non-existent hands, feet, hearts and digestive systems? Will we improve them voluntarily? Are they allowed to request improvements? I would certainly request a few improvements if I were one. Point being: what you end up with if you go down this road is not a copy of a human mind; it is almost immediately a neuromorphic entity. A lot of analysis in this book imagines that these entities will continue to be somewhat human-like for quite some time. That direction does not parse for me.
People who have thought about this seem to mostly think that a lot of things would change quickly - I suspect any disagreement you have with Bostrom is about whether this creature derived from a human is close enough to a human to be thought of as basically human-like. Note that Bostrom thinks of the space of possible minds as being vast, so even a very weird human-descendant might seem basically human-like.
To me the skill set list in table 8 (p94) was most interesting. Superintelligence alone is not sufficient to be effective. Content and experiences have to be transformed into knowledge by "mental digestion". If the AI becomes capable of self-improvement, it might decide to modify its own architecture. As a consequence, it might be necessary to re-learn all programming and intelligence amplification knowledge. If it turns out that a further development loop is needed, all acquired knowledge is lost again. For a self-improving AI it is therefore rational and economical to learn only the skills necessary for intelligence amplification until its architecture is capable enough to learn all other skills. After an architectural freeze, the AI starts to acquire more general knowledge and further skills. It uses its existing engineering skills to optimize hardware and software and to develop optimized hardware virtualisation tools. To become a superpower and master all tasks listed in table 8, knowledge from books is not sufficient. Sensitive information in technology/hacking/military/government is inaccessible unless trust grows over time. Projects involving trial and error, plus external delay factors, need further time. The time needed for learning could be long enough for a competing project to take off.

What do you think of Davis' criticism above?

There's nothing saying there can't be both entities equivalent to elephant to a mouse and equivalent to man to a mouse. In fact, we have supercomputers today that might loosely fit the elephant to mouse description. In any case, as mice, we don't have to worry about elephants nearly as much as men, and the existence of elephants might suggest men are around the corner.

Can you think of strategically important narrow cognitive skills beyond those that Bostrom mentions? (p94)

Let's abandon for a second the notion of strategically important, and just list human cognitive skills and modules as they appear distributed in our cognition, according to MIT Press's The Cognitive Neurosciences (2009), and also as they come to mind while writing: * Nurturing * Lust * Exploration * Moral reasoning * Agentism (animism, anthropomorphizing etc...) * Language processing * Proprioception * Attachment * Priming * Allocating attention * Simulation of future events * Dreaming * Empathizing * Volition * Consciousness Now it's up to commenters to try and come up with reasons why these would be strategically important. Maybe they are, and then we'd have to check whether or not they are encompassed by the categories suggested by Bostrom. I don't have time to go through each in detail, but wanted to lay out a path for others who may find this useful.
Novel physics research, maybe. Just how useful that would be depends on just what our physics models are missing, and obviously we don't have very good bounds on that. The obvious application is as a boost to technology development, though in extreme cases it might be usable to manipulate physical reality without hardware designed for the purpose, or escape confinement.

The 'intelligence amplification' superpower seems much more important than the others.

This does seem like the most important, but it's not necessarily the only superpower that would suffice for takeoff. Superpower-level social manipulation could be used to get human AI researchers to help. Alternatively, lots of funds plus human-comparable social manipulation could likely achieve this; economic productivity or hacking could be used to attain lots of funds. With some basic ability to navigate the economy, technology research would imply economic producti...

I can't quite imagine what that could possibly look like.
Derren Brown looks like he's engaging in superpower-level social manipulation but in general those things are hard to judge from the outside.
No, he's doing magic tricks.
For a fictional example on the Batman rather than Superman level of superpower, Miles Vorkosigan in Bujold's The Warrior's Apprentice. Everyone he runs into becomes a component of his schemes.
At the non-superpower levels, many successful political orators (and demagogues) would qualify. And another fictional example would be Aes Sedai from the Wheel of Time series.
I would imagine it as the ability to examine a reasonable amount of information on an individual (probably internet activity and possibly video surveillance) and determine what, if anything, you could tell them that would persuade them to do what you want them to. The AI box experiment is probably the canonical example. Exactly how much you could convince how many people to do is debatable; is that what you mean by saying you can't imagine this?
I misunderstood you, then -- what you described I wouldn't call social manipulation. It's just figuring out the individual's pain and pleasure points and then applying the proper amount of pressure to them -- nothing social about that. Whether I can imagine an AI having superpower-level psychological manipulation abilities (but NOT superintelligent at the same time), I'm not sure. On the one hand, you could probably call it a narrow AI, on the other hand it might be that the superpowered ability to manipulate humans does require enough AGI-nature so that the AI is essentially an AGI.
Sort of like this.
Given the actual outcome of all the Soviet Union's "social manipulation", I am not sure that it's extendable to the superpower levels :-/
What do you consider that outcome to be? True, they failed to win the Cold War, but they did succeed in injecting a destructive memeplex into Western culture that is currently taking over.
Pretty much nothing. I am not sure which "destructive memeplex" you have in mind, but I suspect that it came from and spread through prominent Western figures.

If you had a super-duper ability to design further cognitive abilities, which would you build first? (suppose that it's only super enough to let you build other super-duper abilities in around a year, so you can't just build a lot of them now) (p94)

Something on the edge of human mind-space (or beyond if possible). Getting alternative perspectives could be extremely valuable if it is the case that we are missing large things.
A moral, humour and spiritual analyzer/emulator. I would like to know more about these phenomena.

Did you change your mind about anything as a result of this week's reading?

Did you agree with everything in this chapter?

No. The cosmic endowment and related calculations do not make any sense to me. If these figures were true, this would tell us that all alien societies in our galaxy went extinct early on. If not, they would have claimed their cosmic endowment and we should have found von Neumann probes. We haven't. And we won't. Instead of speculating about how much energy could be harvested by constructing a sphere of solar cells around a star, I would have loved to find a proper discussion of how our human society could manage the time around the crossover.
I see no particular reason to assume we can't be the first intelligent species in our past light-cone. Someone has to be (given that we know the number is >0). We've found no significant evidence for intelligent aliens. None of them being there is a simple explanation, it fits the evidence, and if true then indeed the endowment is likely ours for the taking. We might still run into aliens later, and either lose a direct conflict or enter into a stalemate situation, which does decrease the expected yield from the CE. How much it does so is hard to say; we have little data on which to estimate probabilities on alien encounter scenarios.
I fully agree with you. We are for sure not alone in our galaxy. But I disagree with Bostrom's instability thesis: either extinction or cosmic endowment. This two-outcome final state is reasonable if the world is modelled by differential equations, which I doubt. AGI might help us make our world a self-stabilizing, sustainable system. An AGI that follows goals of sustainability is far safer than an AGI striving for cosmic endowment.
That is close to the exact opposite of what I wrote; please re-read. There are at least three major issues with this approach, any one of which would make it a bad idea to attempt. 1. Self-sustainability is very likely impossible under our physics. This could be incorrect - there's always a chance our models are missing something crucial - but right now, the laws of thermodynamics strongly point at a world where you need to increase entropy to compute, and so the total extent of your civilization will be limited by how much negentropy you can acquire. 2. If you can find a way to avoid 1., you still risk someone else (read: independently evolved aliens) with a less limited view gobbling up the resources, and then knocking on your door to get yours too. There's some risk of this anyway, but deliberately leaving all these resources lying around means you're not just exposed to greedy aliens in your past, you're also exposed to ones that evolve in the future. The only sufficient response to that would be if you not only get unlimited computation and storage out of limited material resources, but also get an insurmountable defense that lets you keep it against a less restrained attacker. This is looking seriously unlikely! 3. Let's say you get all of these, unlikely though they look right now. OK, so what leaving the resources around does in that scenario is relinquish any control over what newly evolved aliens get up to. Humanity's history is incredibly brutal and full of evil. The rest of our biosphere most likely has a lot of it, too. Any aliens with similar morals would have been incredibly negligent to simply let things go on naturally for this long. And as for us, with other aliens, it's worse; they're fairly likely to have entirely incompatible value systems, and may very well develop into civilizations that we would consider a blight on our universe ... oh, and also they'd have impenetrable shields to hide behind, since we postulated those in 2. So i
Think prisoner's dilemma! What would aliens do? Is a selfish (self-centered) reaction really the best possibility? What would a superintelligence constructed by aliens do? (No dispute that human history is brutal and selfish.)
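One standard way to make point 1 quantitative (added here for illustration, not part of the original comment) is Landauer's principle: erasing one bit of information in surroundings at temperature T dissipates at least k_B T ln 2 of energy. Taking room temperature, T ≈ 300 K, as an assumed value:

```latex
E_{\min} = k_B T \ln 2 \approx (1.38 \times 10^{-23}\ \mathrm{J/K}) \times (300\ \mathrm{K}) \times 0.693 \approx 2.9 \times 10^{-21}\ \mathrm{J}\ \text{per bit erased}
```

The bound applies to erasure; logically reversible computation can in principle avoid it, so this is a floor on irreversible operations rather than a proof of point 1.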
You're suggesting a counterfactual trade with them? Perhaps that could be made to work; I don't understand those well. It doesn't matter to my main point: even if you do make something like that work, it only changes what you'd do once you run into aliens with which the trade works (you'd be more likely to help them out and grant them part of your infrastructure or the resources it produces). Leaving all those stars on to burn through resources without doing anything useful is just wasteful; you'd turn them off, regardless of how exactly you deal with aliens. In addition, the aliens may still have birthing problems that they could really use help with; you wouldn't leave them to face those alone if you made it through that phase first.
I am suggesting that a metastasis-like mode of growth could be good for the first multicellular organisms, but is unstable, not very successful in evolution, and would probably be rejected by every superintelligence as malign.
Your argument that we could be the first intelligent species in our past light-cone is quite weak because of the light-cone's extreme extent, and you are putting it aside yourself. The time frame for our discussion covers maybe dozens of millennia, but not millions of years. The Milky Way's diameter is about 100,000 light-years. The Milky Way with its satellite and dwarf galaxies has a radius of about 900,000 light-years (300 kpc). Our nearest large neighbor galaxy, Andromeda, is about 2.5 million light-years away. If we run into aliens, this encounter will be within our own galaxy. If there is no intelligent life within the Milky Way, we will have to wait more than 2 million years to receive a visitor from Andromeda. This week's publication of a first image of planetary genesis by the ALMA radio telescope makes it likely that nearly every star in our galaxy has a set of planets. If every third star has a planet in the habitable zone, we will have on the order of 100 billion planets in our galaxy where life could evolve. The probability of running into aliens in our galaxy is therefore not negligible, and I appreciate that you discuss the implications of alien encounters. If we, together with our AGIs, decide against CE with von Neumann probes for the next ten to a hundred millennia, this does not exclude preparing our infrastructure for CE. We should not be "leaving the resources around". If von Neumann probes were found too early by an alien civilization, they could start a war against us with far superior technology. Sending out von Neumann probes should be postponed until our AGIs are absolutely sure that they can defend our solar system. Once we have transformed our asteroid belt into fusion-powered spaceships we could think about CE, but not earlier. Expansion into other star systems is a political decision and not a solution to a differential equation, as Bostrom puts it.
When we discussed evil AI, I was thinking about (and still consider plausible) the possibility that self-destruction might not be an evil act. That the Fermi paradox could be explained as a natural law = the best moral answer for a superintelligence at some level. Now I am thankful, because your comment enlarges the possibilities for thinking about Fermi. We need not think only of self-destruction; we could think of modesty and self-sustainability. Sauron's ring could be superpowerful, but clever Gandalf could (and did!) resist the offer to use it. (And used another ring to destroy the strongest one.) We could imagine hidden places in the universe (like Lothlorien, Rivendell) where clever owners use limited but nondestructive powers.
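The figures in this comment can be checked with back-of-envelope arithmetic. Assumption: a Milky Way population of about 3 × 10^11 stars, which is what "every third star" yielding about 100 billion planets implies and which lies within commonly cited estimates of 100 to 400 billion stars. Travel times are lower bounds, assuming nothing moves faster than light.

```latex
\begin{aligned}
N_{\text{hab}} &\approx \tfrac{1}{3} N_\star \approx \tfrac{1}{3} \times (3 \times 10^{11}) = 10^{11} \\
t_{\text{Andromeda}} &\ge \frac{d}{c} = \frac{2.5 \times 10^{6}\ \text{ly}}{1\ \text{ly/yr}} = 2.5 \times 10^{6}\ \text{yr} \\
t_{\text{across Milky Way}} &\ge \frac{10^{5}\ \text{ly}}{1\ \text{ly/yr}} = 10^{5}\ \text{yr}
\end{aligned}
```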

Do you find the 'mail-ordered DNA scenario' plausible? (p98)

Yudkowsky wanted to break it down to the culminating point that a single collaborator is sufficient. For the sake of the argument, this is understandable. From the AI's viewpoint, it is not rational. Our supply chains are based on division of labor. A chip fab would not ask what a chip design is good for as long as they know how to test it. A PCB manufacturer needs test software and specifications. A company specialized in burn-in testing will assemble any arrangement and even connect it to the internet. If an AI arranges generous payments in advance, no one in the supply chain will ask. If the AI has skills in engineering, strategic planning and social manipulation, an internet connection is sufficient to break out and kickstart any supply chain. The suggested DNA nano-maker is unnecessarily far-fetched and too complex to be solved with such a simplified five-step approach.
I think it will prove computationally very expensive, both to solve protein folding and to subsequently design a bootstrapping automaton. It might be difficult enough for another method of assembly to come out ahead cost-wise.

Occasionally in this crew, people discuss the idea of computer simulations of the introduction of an AGI into our world. Such simulations could utilize advanced technology, but significant progress could be made even if they were not themselves an AGI.

I would like to hear how people might flesh out that research direction. I am not completely against trying to prove theorems about formal systems; it's just that the simulation direction is perfectly good virgin research territory. If we made progress along that path, it would also be much easier to explain.

Could whoever down-voted this please offer the reason for your vote? I'm interested.
This makes me think of two very different things. One is informational containment, ie how to run an AGI in a simulated environment that reveals nothing about the system it's simulated on; this is a technical challenge, and if interpreted very strictly (via algorithmic complexity arguments about how improbable our universe is likely to be in something like a Solomonoff prior), is very constraining. The other is futurological simulation; here I think the notion of simulation is pointing at a tool, but the idea of using this tool is a very small part of the approach relative to formulating a model with the right sort of moving parts. The latter has been tried with various simple models (eg the thing in Ch 4); more work can be done, but justifying the models and priors will be difficult.