AI Risk and Opportunity: A Strategic Analysis

Suppose you buy the argument that humanity faces both the risk of AI-caused extinction and the opportunity to shape an AI-built utopia. What should we do about that? As Wei Dai asks, "In what direction should we nudge the future, to maximize the chances and impact of a positive intelligence explosion?"

This post serves as a table of contents and an introduction for an ongoing strategic analysis of AI risk and opportunity.

Contents:

  1. Introduction (this post)
  2. Humanity's Efforts So Far
  3. A Timeline of Early Ideas and Arguments
  4. Questions We Want Answered
  5. Strategic Analysis Via Probability Tree
  6. Intelligence Amplification and Friendly AI
  7. ...


Why discuss AI safety strategy?

The main reason to discuss AI safety strategy is, of course, to draw on a wide spectrum of human expertise and processing power to clarify our understanding of the factors at play and the expected value of particular interventions we could invest in: raising awareness of safety concerns, forming a Friendly AI team, differential technological development, investigating AGI confinement methods, and others.

Discussing AI safety strategy is also a challenging exercise in applied rationality. The relevant issues are complex and uncertain, but we need to take advantage of the fact that rationality is faster than science: we can't "try" a bunch of intelligence explosions and see which one works best. We'll have to predict in advance how the future will develop and what we can do about it.


Core readings

Before engaging with this series, I recommend you read at least the following articles:


Example questions

Which strategic questions would we like to answer? Muehlhauser (2011) elaborates on the following questions:

  • What methods can we use to predict technological development?
  • Which kinds of differential technological development should we encourage, and how?
  • Which open problems are safe to discuss, and which are potentially dangerous?
  • What can we do to reduce the risk of an AI arms race?
  • What can we do to raise the "sanity waterline," and how much will this help?
  • What can we do to attract more funding, support, and research to x-risk reduction and to specific sub-problems of successful Singularity navigation?
  • Which interventions should we prioritize?
  • How should x-risk reducers and AI safety researchers interact with governments and corporations?
  • How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?
  • How does AI risk compare to other existential risks?
  • Which problems do we need to solve, and which ones can we have an AI solve?
  • How can we develop microeconomic models of WBEs and self-improving systems?
  • How can we be sure a Friendly AI development team will be altruistic?

Salamon & Muehlhauser (2013) list several other questions gathered from the participants of a workshop following Singularity Summit 2011, including:

  • How hard is it to create Friendly AI?
  • What is the strength of feedback from neuroscience to AI rather than brain emulation?
  • Is there a safe way to do uploads, where they don't turn into neuromorphic AI?
  • How possible is it to do FAI research on a seastead?
  • How much must we spend on security when developing a Friendly AI team?
  • What's the best way to recruit talent toward working on AI risks?
  • How difficult is stabilizing the world so we can work on Friendly AI slowly?
  • How hard will a takeoff be?
  • What is the value of strategy vs. object-level progress toward a positive Singularity?
  • How feasible is Oracle AI?
  • Can we convert environmentalists into people concerned with existential risk?
  • Is there no such thing as bad publicity [for AI risk reduction] purposes?

These are the kinds of questions we will be tackling in this series of posts for Less Wrong Discussion, in order to improve our predictions about which direction we can nudge the future to maximize the chances of a positive intelligence explosion.

161 comments

I suggest adding some more meta questions to the list.

  • What improvements can we make to the way we go about answering strategy questions? For example, should we differentiate between "strategic insights" (such as Carl Shulman's insight that WBE-based Singletons may be feasible) and "keeping track of the big picture" (forming the overall strategy and updating it based on new insights and evidence), and aim to have people specialize in each, so that people deciding strategy won't be tempted to overweight their own insights? Another example: is there a better way to combine probability estimates from multiple people?
  • How do people in other fields answer strategy questions? Is there such a thing as a science or art of strategy that we can copy from (and perhaps improve upon with ideas from x-rationality)?
  • Should the subject be called "AI safety strategies" or "Singularity strategies"? (I prefer the latter.)
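On the question of combining probability estimates from multiple people, here is a minimal sketch of two common pooling rules: the arithmetic mean of the probabilities and the geometric mean of the odds. The function names and the three example estimates are assumptions made up for illustration; nothing here claims that either rule is the "better way" the question asks about.

```python
import math

def linear_pool(probs):
    """Arithmetic mean of the individual probability estimates."""
    return sum(probs) / len(probs)

def geometric_mean_of_odds(probs):
    """Average the estimates in log-odds space, then convert back to a probability."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    mean_log_odds = sum(log_odds) / len(log_odds)
    return 1 / (1 + math.exp(-mean_log_odds))

# Hypothetical estimates from three forecasters (illustrative only).
estimates = [0.1, 0.2, 0.6]

pooled_linear = linear_pool(estimates)
pooled_geo = geometric_mean_of_odds(estimates)
print(f"linear pool: {pooled_linear:.3f}")
print(f"geometric mean of odds: {pooled_geo:.3f}")
```

The two rules can disagree noticeably when the estimates are spread out, which is one reason the choice of pooling rule is itself a strategy question.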

Selective opinion and answers (for longer discussions, respond to specific points and I'll furnish more details):

Which kinds of differential technological development should we encourage, and how?

I recommend pushing for whole brain emulations, with scanning-first and emphasis on fully uploading actual humans. Also, military development of AI should be prioritised over commercial and academic development, if possible.

Which open problems are safe to discuss, and which are potentially dangerous?

Seeing what has already been published, I see little advantage to restricting discussion of most open problems.

What can we do to reduce the risk of an AI arms race?

Any methods that would reduce traditional arms races. Cross ownership of stocks in commercial companies. Investment funds with specific AI disclosure requirements. Rewards for publishing interim results.

What can we do to raise the "sanity waterline," and how much will this help?

Raising the sanity waterline of individual researchers is useful, but generally we want to raise the sanity waterline of institutions, which is harder but more important (and may have nothing to do with improving individuals).

Which interventions should we prioritize?

We need a solid push to see if reduced impact or Oracle AIs can work, and we need to get the academic and business worlds to take the risks more seriously. Interventions to stop the construction of dangerous AIs are unlikely to succeed, but "working with your company to make your AIs safer (and offering useful advice along the way)" could work. We need to develop useful tools we can offer others, not solely nag them all the time.

How should x-risk reducers and AI safety researchers interact with governments and corporations?

Beggars can't be choosers. For the moment, we need to make them take it seriously, convince them, and give away any safety-increasing info we might have. Later we may have to pursue different courses.

How can optimal philanthropists get the most x-risk reduction for their philanthropic buck?

Funding SIAI and FHI and similar, getting us in contact with policy makers, raising the respectability of xrisks.

How does AI risk compare to other existential risks?

Very different; no other xrisk has such uncertain probabilities and timelines, and such huge risks and rewards and various scenarios that can play out.

Which problems do we need to solve, and which ones can we have an AI solve?

We need to survive till AI, and survive AI. If we survive, most trends are positive, so we don't need to worry about much else.

How can we develop microeconomic models of WBEs and self-improving systems?

With thought and research :-)

How can we be sure a Friendly AI development team will be altruistic?

Do it ourselves, normalise altruistic behaviour in the field, or make it in their self-interest to be altruistic.

How hard is it to create Friendly AI?

Probably extraordinarily hard if the FAI is as intelligent as we fear. More work needs to be done to explore partial solutions (limited impact, Oracle, etc.).

Is there a safe way to do uploads, where they don't turn into neuromorphic AI?

Keep them as human (in their interactions, in their virtual realities, in their identities etc...) as possible.

How possible is it to do FAI research on a seastead?

How is this relevant? If governments were so concerned about AI potential that the location of the research became important, then we would have made tremendous progress in getting people to take it seriously, and AI will most likely not be developed by a small seasteading independent group.

How much must we spend on security when developing a Friendly AI team?

We'll see at the time.

What's the best way to recruit talent toward working on AI risks?

General: get people involved as a problem to be worked on, socialise them into our world, get them to care. AI researchers: conferences and publications and getting more respectable publicity.

How difficult is stabilizing the world so we can work on Friendly AI slowly?

Very.

How hard will a takeoff be?

Little useful data. Use scenario planning rather than probability estimates.

What is the value of strategy vs. object-level progress toward a positive Singularity?

Both needed, both need to be closely connected, easy shifts from one to the other. Possibly should be more strategy at the current time.

How feasible is Oracle AI?

As yet unknown. Research is progressing; based on past performance, I expect new insights to arrive.

Can we convert environmentalists into people concerned with existential risk?

With difficulty for AI risks, with ease for some others (extreme global warming). Would this be useful? Smaller, more tightly focused pressure groups would perform much better, even with less influence.

Is there no such thing as bad publicity [for AI risk reduction] purposes?

Anything that makes it seem more like an area for cranks is bad publicity.

What are your most important disagreements with other FHI/SIAI people? How do you account for these disagreements?

You say:

I recommend pushing for whole brain emulations

but also:

We need a solid push to see if reduced impact or Oracle AIs can work

which makes me a bit confused. Are you saying we should push them simultaneously, or what? Also, what path do you see from a successful Oracle AI to a positive Singularity? For example, use Oracle AI to develop WBE technology, then use WBEs to create FAI? Or something else?

What are your most important disagreements with other FHI/SIAI people? How do you account for these disagreements?

Main disagreement with FHI people is that I'm more worried about AI than they are (I'm probably up with the SIAI folks on this). I suspect an anchoring effect here - I was drawn to the FHI's work through AI risk, others were drawn in through other angles (also I spend much more time on Less Wrong, making AI risks very salient). Not sure what this means for accuracy, so my considered opinion is that AI is less risky than I individually believe.

Are you saying we should push them simultaneously, or what?

My main disagreement with SIAI is that I think FAI is unlikely to be implementable on time. So I want to explore alternative avenues, several ones ideally. Oracle to FAI would be one route; Oracle to people taking AI seriously to FAI might be another. WBE opens up many other avenues (including "no AI"), so is also worth looking into.

I haven't bothered to try and close the gap between me and SIAI on this, because even if they are correct, I think it's valuable for the group to have someone looking into non-FAI avenues.

Thanks for the answers. The main problem I have with Oracle AI is that it seems a short step from OAI to UFAI, but a long path to FAI (since you still need to solve ethics and it's hard to see how OAI helps with that), so it seems dangerous to push for it, unless you do it in secret and can keep it secret. Do you agree? If so, I'm not sure how "Oracle to people taking AI seriously to FAI" is supposed to work.

My main "pressure point" is pushing UFAI development towards OAI, i.e., I don't advocate building OAI, but making sure that the first AGIs will be OAIs. And I'm using far too many acronyms.

What does it matter that the first AGIs will be OAIs, if UFAIs follow immediately after? I mean, once knowledge of how to build OAIs starts to spread, how are you going to make sure that nobody fails to properly contain their Oracles, or intentionally modifies them into AGIs that act on their own initiative? (This recent post of mine might better explain where I'm coming from, if you haven't already read it.)

We can already think productively about how to win if oracle AIs come first. Paul Christiano is working on this right now, see the "formal instructions" posts on his blog. Things are still vague but I think we have a viable attack here.

Wot cousin_it said.

Of course the model "OAIs are extremely dangerous if not properly contained; let's let everyone have one!" isn't going to work. But there are many things we can try with an OAI (building a FAI, for instance), and most importantly, some of these things will be experimental (the FAI approach relies on getting the theory right, with no opportunity to test it). And there is a window that doesn't exist with a genie - a window where people realise superintelligence is possible and where we might be able to get them to take safety seriously (and they're not all dead). We might also be able to get exotica like a limited impact AI or something like that, if we can find safe ways of experimenting with OAIs.

And there seems no drawback to pushing an UFAI project into becoming an OAI project.

Cousin_it's link is interesting, but it doesn't seem to have anything to do with OAI, and instead looks like a possible method of directly building an FAI.

Of course the model "OAIs are extremely dangerous if not properly contained; let's let everyone have one!" isn't going to work.

Hmm, maybe I'm underestimating the amount of time it would take for OAI knowledge to spread, especially if the first OAI project is a military one (on the other hand, the military and their contractors don't seem to be having better luck with network security than anyone else). How long do you expect the window of opportunity (i.e., the time from the first successful OAI to the first UFAI, assuming no FAI gets built in the meantime) to be?

some of these things will be experimental

I'd like to have FAI researchers determine what kind of experiments they want to do (if any, after doing appropriate benefit/risk analysis), which probably depends on the specific FAI approach they intend to use, and then build limited AIs (or non-AI constructs) to do the experiments. Building general Oracles that can answer arbitrary (or a wide range of) questions seems unnecessarily dangerous for this purpose, and may not help anyway depending on the FAI approach.

And there seems no drawback to pushing an UFAI project into becoming an OAI project.

There may be, if the right thing to do is to instead push them to not build an AGI at all.

One important fact I haven't been mentioning: OAIs help tremendously with medium speed takeoffs (fast takeoffs are dangerous for the usual reasons, slow takeoffs mean that we will have moved beyond OAIs by the time the intelligence level hits dangerous), because we can then use them to experiment.

There may be, if the right thing to do is to instead push them to not build an AGI at all.

Interacting with AGI people at the moment (organising a jointish conference), will have a clearer idea of how they react to these ideas at a later stage.

slow takeoffs mean that we will have moved beyond OAIs by the time the intelligence level hits dangerous

Moved where/how? Slow takeoff means we have more time, but I don't see how it changes the nature of the problem. Low time to WBE makes (not particularly plausible) slow takeoff similar to the (moderately likely) failure to develop AGI before WBE.

Together with Wei's point that OAI doesn't seem to help much, there is the downside that existence of OAI safety guidelines might make it harder to argue against pushing AGI in general. So on net it's plausible that this might be a bad idea, which argues for weighing this tradeoff more carefully.

there is the downside that existence of OAI safety guidelines might make it harder to argue against pushing AGI in general.

Possibly. But in my experience even getting the AGI people to admit that there might be safety issues is over 90% of the battle.

It's useful for AGI researchers to notice that there are safety issues, but not useful for them to notice that there are "safety issues" which can be dealt with by following OAI guidelines. The latter kind of understanding might be worse than none at all, as it seemingly resolves the problem. So it's not clear to me that getting people to "admit that there might be safety issues" is in itself a worthwhile milestone.

My main disagreement with SIAI is that I think FAI is unlikely to be implementable on time.

Why do you say this is a disagreement? Who at SIAI thinks FAI is likely to be implementable on time (and why)?

So I want to explore alternative avenues, several ones ideally.

Right, assuming we can find any alternative avenues of comparable probability of success. I think it's unlikely for FAI to be implementable both "on time" (i.e. by humans in current society), and via alternative avenues (of which fast WBE humans seems the most plausible one, which argues for late WBE that's not hardware-limited, not pushing it now). This makes current research as valuable as alternative routes despite improbability of current research's success.

Why do you say this is a disagreement? Who at SIAI thinks FAI is likely to be implementable on time (and why)?

Let me rephrase: I think the expected gain from pursuing FAI is less than that from pursuing other methods. Other methods are less likely to work, but more likely to be implementable. I think SIAI disagrees with this assessment.

I think the expected gain from pursuing FAI is less than that from pursuing other methods. Other methods are less likely to work, but more likely to be implementable.

I assume that by "implementable" you mean that it's an actionable project, that might fail to "work", i.e. deliver the intended result. I don't see how "implementability" is a relevant characteristic. What matters is whether something works, i.e. succeeds. If you think that other methods are less likely to work, how are they of greater expected value? I probably parsed some of your terms incorrectly.

Whether the project reached the desired goal, versus whether that goal will actually work. If Nick and Eliezer both agreed about some design that "this is how you build a FAI", then I expect it will work. However, I don't think it's likely that would happen. It's more likely they will say "this is how you build a proper Oracle AI", but less likely the Oracle will end up being safe.

Whether the project reached the desired goal, versus whether that goal will actually work.

Okay, but I still don't understand how a project with lower probability of "actually working" can be of higher expected value. I'm referring to this statement:

I think the expected gain from pursuing FAI is less than that from pursuing other methods. Other methods are less likely to work...

The argument you seem to be giving in support of higher expected value of other methods is that they are "more likely to be implementable" (a project reaching its stated goal, even if that goal turns out to be no good), but I don't see how that is an interesting property.

He didn't say other architectures would be no good, he said they're less likely to be safe.

He thinks the distribution P(outcome | do(complete Oracle AI project)) isn't as highly peaked at Weirdtopia as P(outcome | do(complete FAI)); Oracle AI puts more weight on regions like "Lifeless universe", "Eternal Torture", "Rainbows and Slow Death", and "Failed Utopia".

However, "Complete FAI" isn't an actionable procedure, so he examines the chance of completion conditional on different actions he can take. "Not worth pursuing because non-implementable" means that available FAI supporting actions don't have a reasonable chance of producing friendly AI, which discounts the peak in the conditional outcome distribution at valuable futures relative to do(complete FAI). And supposedly he has some other available oracle AI supporting strategy which fares better.

Eating a sandwich isn't as cool as building an interstellar society with wormholes for transportation, but I'm still going to make a sandwich for lunch, because it's going to work and maybe be okay-ish.
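To make the structure of this argument concrete, here is a rough sketch in which the chance of a good outcome from pursuing a strategy is the chance the project completes times the chance that a completed project turns out well. The `Strategy` class and every number in it are hypothetical, invented only to show how a less safe but more completable project can come out ahead; they are not anyone's actual estimates.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    p_complete: float  # P(project reaches its stated goal | we pursue it)
    p_good: float      # P(good outcome | project completed)

    def p_success(self) -> float:
        """Overall chance of a good outcome from pursuing this strategy."""
        return self.p_complete * self.p_good

# Hypothetical numbers chosen purely for illustration.
fai = Strategy("FAI", p_complete=0.05, p_good=0.9)
oracle = Strategy("Oracle AI", p_complete=0.3, p_good=0.4)

for s in (fai, oracle):
    print(f"{s.name}: P(good outcome | pursue) = {s.p_success():.3f}")
```

With these made-up inputs the Oracle route scores higher despite being less safe once completed. Different inputs reverse the ranking, which is where the actual disagreement lies.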

Main disagreement with FHI people is that I'm more worried about AI than they are (I'm probably up with the SIAI folks on this).

Where can we read FHI's analysis of AI risk? Why are they not as worried as you and SIAI people? Has there ever been a debate between FHI and SIAI on this? What threats are they most worried about? What technologies do they want to push or slow down?

What threats are they most worried about?

AI is high on the list - one of the top risks, even if their objective assessment is lower than SIAI's. Nuclear war, synthetic biology, nanotech, pandemics, social collapse: these are the other ones we're looking at.

Basically they don't buy the "AI inevitably goes foom and inevitably takes over" argument. They see definite probabilities of these happening, but their estimates are closer to 50% than to 100%.

They estimate it at 50%???
And there are other things they are more concerned about?
What are those other things?

They estimate a variety of conditional statements ("AI possible this century", "if AI then FOOM", "if FOOM then DOOM", etc...) with magnitudes between 20% and 80% (I had the figures somewhere, but can't find them). I think when it was all multiplied out it was in the 10-20% range.

And I didn't say they thought other things were more worrying; just that AI wasn't the single overwhelming risk/reward factor that SIAI (and I) believe it to be.
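The "multiplied out" step can be sketched as follows. The three probabilities are placeholders picked from inside the 20%-80% range mentioned above; they are not FHI's actual figures, which the comment says could not be found.

```python
from math import prod

# Placeholder values, NOT FHI's actual estimates (those are not given above).
steps = {
    "AI possible this century": 0.6,
    "if AI then FOOM": 0.5,
    "if FOOM then DOOM": 0.5,
}

def chain_probability(conditionals):
    """Multiply a chain of conditional probabilities into one joint probability."""
    return prod(conditionals.values())

joint = chain_probability(steps)
print(f"joint probability: {joint:.2f}")
```

Three mid-range conditionals like these land inside the 10-20% band, which shows how moderate individual estimates can compound into a much smaller joint probability.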

A wild guess. FHI believes that the best that can reasonably be done about existential risks at this point in time is to do research into existential risks, including possible unknown unknowns, and into strategies to reduce current existential risks. This somewhat agrees with their FAQ:

Research into existential risk and analysis of potential countermeasures is a very strong candidate for being the currently most cost-effective way to reduce existential risk. This includes research into some methodological problems and into certain strategic questions that pertain to existential risk. Similarly, actions that contribute indirectly to producing more high-quality analysis on existential risk and a capacity later to act on the result of such analysis could also be extremely cost-effective. This includes, for example, donating money to existential risk research, supporting organizations and networks that engage in fundraising for existential risks work, and promoting wider awareness of the topic and its importance.

In other words, FHI seems to focus on meta issues, existential risks in general, rather than associated specifics.

Link: In this ongoing thread, Wei Dai and I discuss the merits of pre-WBE vs. post-WBE decision theory/FAI research.

"In what direction should we nudge the future, to maximize the chances and impact of a positive Singularity?"

Friendly AI is incredibly hard to get right and a friendly AI that is not quite friendly could create a living hell for the rest of time, increasing negative utility dramatically.

I vote for antinatalism. It should be seriously considered to create a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering. Friendly AI is simply too risky.

I think that humans are not psychologically equal. Not only are there many outliers, but most humans would turn into abhorrent creatures given their own pocket universe, unlimited power and a genie. And even given our current world, if we were to remove the huge memeplex of western civilization, most people would act like stone age hunter-gatherers. And that would be bad enough. After all, violence is the major cause of death within stone age societies.

Even proposals like CEV (Coherent Extrapolated Volition) can turn out to be a living hell for a percentage of all beings. I don't expect any amount of knowledge, or intelligence, to cause humans to abandon their horrible preferences.

Eliezer Yudkowsky says that intelligence does not imply benevolence. That an artificial general intelligence won't turn out to be friendly. That we have to make it friendly. Yet his best proposal is that humanity will do what is right if we only knew more, thought faster, were more the people we wished we were and had grown up farther together. The idea is that knowledge and intelligence imply benevolence for people. I don't think so.

The problem is that if you extrapolate chaotic systems, e.g. human preferences given real world influence, small differences in initial conditions are going to yield widely diverging outcomes. That our extrapolated volition converges rather than diverges seems to be a bold prediction.
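As a toy illustration of sensitivity to initial conditions, here is a sketch using the logistic map, a standard textbook chaotic system. It is not a model of human preferences or of CEV; the function name, starting point, and perturbation size are assumptions chosen only to show how a tiny initial difference can grow.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Two starting points that differ by one part in a billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-9)

# Absolute gap between the two trajectories at each step.
gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"initial gap: {gaps[0]:.1e}")
print(f"largest gap over {len(gaps) - 1} steps: {max(gaps):.3f}")
```

Whether extrapolated preferences behave like this map is exactly the open question; the sketch only shows what divergence would look like if they did.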

I just don't see that a paperclip maximizer burning the cosmic commons is as bad as it is currently portrayed. Sure, it is "bad". But everything else might be much worse.

Here is a question for those who think that antinatalism is just stupid. Would you be willing to rerun the history of the universe to obtain the current state? Would you be willing to create another Genghis Khan, a new holocaust, allowing intelligent life to evolve?

As Greg Egan wrote: "To get from micro-organisms to intelligent life this way would involve an immense amount of suffering, with billions of sentient creatures living, struggling and dying along the way."

If you are not willing to do that, then why are you willing to do the same now, just for much longer, by trying to colonize the universe? Are you so sure that the time to come will be much better? How sure are you?

ETA

I expect any friendly AI outcome that fails to be friendly in a certain way to increase negative utility and only a perfectly "friendly" (whatever that means, it is still questionable if the whole idea makes sense) AI to yield a positive utility outcome.

That is because the closer any given AGI design is to friendliness the more likely it is that humans will be kept alive but might suffer. Whereas an unfriendly AI in complete ignorance of human values will more likely just see humans as a material resource without having any particular incentive to keep humans around.

Just imagine a friendly AI which fails to "understand" or care about human boredom.

There are several possibilities by which SIAI could actually cause a direct increase in negative utility.

1) Friendly AI is incredibly hard and complex. Complex systems can fail in complex ways. Agents that are an effect of evolution have complex values. To satisfy complex values you need to meet complex circumstances. Therefore any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways. A half-baked, not quite friendly, AI might create a living hell for the rest of time, increasing negative utility dramatically.

2) Humans are not provably friendly. Given the power to shape the universe the SIAI might fail to act altruistically and deliberately implement an AI with selfish motives or horrible strategies.

a friendly AI that is not quite friendly could create a living hell for the rest of time, increasing negative utility dramatically

"Ladies and gentlemen, I believe this machine could create a living hell for the rest of time..."

(audience yawns, people look at their watches)

"...increasing negative utility dramatically!"

(shocked gasps, audience riots)

Do you actually disagree with anything or are you just trying to ridicule it? Do you think that the possibility that FAI research might increase negative utility is not to be taken seriously? Do you think that world states where faulty FAI designs are implemented have on average higher utility than world states where nobody is alive? If so, what research could I possibly do to come to the same conclusion? What arguments do I miss? Do I just have to think about it longer?

Consider the way Eliezer Yudkowsky argues in favor of FAI research:

Two hundred million years from now, the children’s children’s children of humanity in their galaxy-civilizations, are unlikely to look back and say, “You know, in retrospect, it really would have been worth not colonizing the Herculus supercluster if only we could have saved 80% of species instead of 20%”. I don’t think they’ll spend much time fretting about it at all, really. It is really incredibly hard to make the consequentialist utilitarian case here, as opposed to the warm-fuzzies case.

or

This is crunch time. This is crunch time for the entire human species. … and it’s crunch time not just for us, it’s crunch time for the intergalactic civilization whose existence depends on us. I think that if you’re actually just going to sort of confront it, rationally, full-on, then you can’t really justify trading off any part of that intergalactic civilization for any intrinsic thing that you could get nowadays …

Is his style of argumentation any different from mine except that he promises lots of positive utility?

I was just amused by the anticlimacticness of the quoted sentence (or maybe by how it would be anticlimactic anywhere else but here), the way it explains why a living hell for the rest of time is a bad thing by associating it with something so abstract as a dramatic increase in negative utility. That's all I meant by that.

It should be seriously considered to create a true paperclip maximizer that transforms the universe into an inanimate state devoid of suffering.

Have you considered the many ways something like that could go wrong?

  • The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it
  • PM decides there's a chance that it's in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them
  • PM is itself capable of suffering
  • PM decides to create lots of descendant AIs in order to maximize paperclip production and they happen to be capable of suffering. (Our genes made us to maximize copies of them and we happen to be capable of suffering.)
  • somebody steals PM's source code before it's launched, and makes a sadistic AI

From your perspective, wouldn't it be better to just build a really big bomb and blow up Earth? Or alternatively, if you want to minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you'd have to solve a lot of the same problems as FAI.

Have you considered the many ways something like that could go wrong? [...] From your perspective, wouldn't it be better to [...] minimize suffering throughout the universe and maybe throughout the multiverse (e.g., by acausal negotiation with superintelligences in other universes), instead of just our corner of the world, you'd have to solve a lot of the same problems as FAI.

The reason why I think that working towards FAI might be a bad idea is that it increases the chance of something going horribly wrong.

If I were to accept the framework of beliefs held by SI, then I would assign a low probability to the possibility that the default scenario in which an AI undergoes recursive self-improvement will include a lot of blackmailing that leads to a lot of suffering. Where the default is that nobody tries to make AI friendly.

I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmailing 2) keep humans alive 3) fail in horrible ways:

I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state.

ETA Not sure why the graph looks so messed up. Does anyone know of a better graphing tool?

I think that working towards friendly AI will in most cases lead to negative utility scenarios that vastly outweigh the negative utility of an attempt at creating a simple transformer that turns the universe into an inanimate state.

I think it's too early to decide this. There are many questions whose answers will become clearer before we have to make a choice one way or another. If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that's like SIAI but isn't too attached to the idea of FAI and just wants to do whatever is best.

Now you can either try to build an organization like that from scratch, or try to push SIAI in that direction (i.e., make it more strategic and less attached to a specific plan). Of course, being lazy, I'm more tempted to do the latter, but your mileage may vary. :)

If eventually it becomes clear that building an antinatalist AI is the right thing to do, I think the best way to accomplish it would be through an organization that's like SIAI but isn't too attached to the idea of FAI and just wants to do whatever is best.

Yes.

I, for one, am ultimately concerned with doing whatever's best. I'm not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.

I, for one, am ultimately concerned with doing whatever's best. I'm not wedded to doing FAI, and am certainly not wedded to doing 9-researchers-in-a-basement FAI.

Well, that's great. Still, there are quite a few problems.

How do I know

  • ... that SI does not increase existential risk by solving problems that can be used to build AGI earlier?
  • ... that you won't launch a half-baked friendly AI that will turn the world into a hell?
  • ... that you don't implement some strategies that will do really bad things to some people, e.g. myself?

Every time I see a video of one of you people I think, "Wow, those seem like really nice people. I am probably wrong. They are going to do the right thing."

But seriously, is that enough? Can I trust a few people with the power to shape the whole universe? Can I trust them enough to actually give them money? Can I trust them enough with my life until the end of the universe?

You can't even tell me what "best" or "right" or "winning" stands for. How do I know that it can be or will be defined in a way that those labels will apply to me as well?

I have no idea what your plans are for the day when time runs out. I just hope that you are not going to hope for the best and run some not quite friendly AI that does really crappy things. I hope you consider the possibility of rather blowing everything up than risking even worse outcomes.

Can I trust a few people with the power to shape the whole universe?

Hell no.

This is an open problem. See "How can we be sure a Friendly AI development team will be altruistic?" on my list of open problems.

I hope you consider the possibility of rather blowing everything up than risking even worse outcomes.

Blowing everything up would be pretty bad. Bad enough to not encourage the possibility.

"Would you murder a child, if it's the right thing to do?"

an organization that's like SIAI but isn't too attached to the idea of FAI and just wants to do whatever is best.

If FAI is by definition a machine that does whatever is best, this distinction doesn't seem meaningful.

Ok, let me rephrase that to be clearer.

an organization that's like SIAI but isn't too attached to a specific kind of FAI design (that may be too complex and prone to fail in particularly horrible ways), and just wants to do whatever is best.

Do you think SingInst is too attached to a specific kind of FAI design? This isn't my impression. (Also, at this point, it might be useful to unpack "SingInst" into particular people constituting it.)

Do you think SingInst is too attached to a specific kind of FAI design?

XiXiDu seems to think so. I guess I'm less certain but I didn't want to question that particular premise in my response to him.

It does confuse me that Eliezer set his focus so early on CEV. I think "it's too early to decide this" applies to CEV just as well as XiXiDu's anti-natalist AI. Why not explore and keep all the plausible options open until the many strategically important questions become clearer? Why did it fall to someone outside SIAI (me, in particular) to write about the normative and meta-philosophical approaches to FAI? (Note that the former covers XiXiDu's idea as a special case.) Also concerning is that many criticisms have been directed at CEV but Eliezer seems to ignore most of them.

Also, at this point, it might be useful to unpack "SingInst" into particular people constituting it.

I'd be surprised if there weren't people within SingInst who disagree with the focus on CEV, but if so, they seem reluctant to disagree in public so it's hard to tell who exactly, or how much say they have in what SingInst actually does.

I guess this could all be due to PR considerations. Maybe Eliezer just wanted to focus public attention on CEV because it's the politically least objectionable FAI approach, and isn't really terribly attached to the idea when it comes to actually building an FAI. But you can see how an outsider might get that impression...

I always thought CEV was half-baked as a technical solution, but as a PR tactic it is...genius.

Yeah, I thought it was explicitly intended more as a political manifesto than a philosophical treatise. I have no idea why so many smart people, like lukeprog, seem to be interpreting it not only as a philosophical basis but as outlining a technical solution.

Why do you think an unknown maximizer would be worse than a not quite friendly AI? Failed Utopia #4-2 sounds much better than a bunch of paperclips. Orgasmium sounds at least as good as paper clips.

Graphs make your case more convincing - even when they are drawn wrong and don't make sense!

...but seriously: where are you getting the figures in the first graph from?

Are you one of these "negative utilitarians" - who think that any form of suffering is terrible?

I believe that any failed attempt at friendly AI is much more likely to 1) engage in blackmailing 2) keep humans alive 3) fail in horrible ways:

You sound a bit fixated on doom :-(

What do you make of the idea that the world has been consistently getting better for most of the last 3 billion years (give or take the occasional asteroid strike) - and that the progress is likely to continue?

I have previously mentioned my antipathy regarding the FAI concept. I think FAI is a very dangerous concept; it should be dropped. See this article of mine for more info on my views: http://hplusmagazine.com/2012/01/16/my-hostility-towards-the-concept-of-friendly-ai/

The paperclip maximizer (PM) encounters an alien civilization and causes lots of suffering warring with it

I don't think it is likely that it will encounter anything that has equal resources, or, if it does, that suffering would occur (see below).

PM decides there's a chance that it's in a simulation run by a sadistic being who will punish it (prevent it from making paperclips) unless it creates trillions of conscious beings and tortures them

That seems like one of the problems that have to be solved in order to build an AI that transforms the universe into an inanimate state. But I think it is much easier to make an AI not simulate any other agents than to create a friendly AI. Much more can go wrong by creating a friendly AI, including the possibility that it tortures trillions of beings. In the case of a transformer you just have to make sure that it values a universe that is as close as possible to a state where no computation takes place and that does not engage in any kind of trade, acausal or otherwise.

PM is itself capable of suffering

I believe that any sort of morally significant suffering is an effect of (natural) evolution, and may in fact be dependent on that. I think that the kind of maximizer that SI has in mind is more akin to a transformation process that isn't conscious, does not have emotions and cannot suffer. If those qualities were necessary requirements, then I don't think that we would build an artificial general intelligence any time soon, and if we did, it would happen slowly and not be able to undergo dangerous recursive self-improvement.

somebody steals PM's source code before it's launched, and makes a sadistic AI

I think that this is more likely to be the case with friendly AI research because it takes longer.

Currently you suspect that there are people, such as yourself, who have some chance of correctly judging whether arguments such as yours are correct, and of attempting to implement the implications if those arguments are correct, and of not implementing the implications if those arguments are not correct.

Do you think it would be possible to design an intelligence which could do this more reliably?

I don't get it. Design a Friendly AI that can better judge whether it's worth the risk of botching the design of a Friendly AI?

ETA: I suppose your point applies to some of XiXiDu's concerns but not others?

I don't understand. Is the claim here that you can build a "decide whether the risk of botched Friendly AI is worth taking machine", and the risk of botching such a machine is much less than the risk of botching a Friendly AI?

An FAI that includes such a "Should I run?" heuristic could pose a lesser risk than an FAI without such a heuristic. If this heuristic works better than human judgment about running an FAI, it should be used instead of human judgment.

This is the same principle as for AI's decisions themselves, where we don't ask AI's designers for object-level moral judgments, or encode specific object-level moral judgments into AI. Not running an AI would then be equivalent to hardcoding the decision "Should the AI run?" resolved by designers to "No." into the AI, instead of coding the question and letting the AI itself answer it (assuming we can expect it to answer the question more reliably than the programmers can).

If we botched the FAI, wouldn't we also probably have botched its ability to decide whether it should run?

Yes, and if it tosses a coin, it has a 50% chance of being right. The question is calibration: how much trust should such measures buy compared to their absence, given what is known about a given design.

The machine Steve proposes might not bear as much risk of creating "living hell" by attempting to get the human utility function right, but missing in such a way that humans are still alive, just living very unpleasantly. To me, this seems by far the biggest of XiXiDu's concerns.

Here are a few premises:

  • Complex systems can fail in complex ways.
  • Destruction is easier than creation.
  • Expected utility maximization is rational and feasible.
  • We should be extremely conservative about not implementing a half-baked friendly AI.

If you believe that self-improving AI is inevitable and that creating friendly AI is more difficult than creating unfriendly AI then to launch an AI that simply destroys everything as quickly as possible has a higher expected utility than doing nothing or trying to implement an AI that is not completely friendly.

The negative utility of an AI that transforms the universe into an inanimate state is simply the positive utility of a perfectly friendly AI. But most outcomes are expected to yield unfriendly AI, or not quite friendly AI, which will actively increase negative utility by possibly keeping alive living beings indefinitely given abhorrent circumstances. To prevent those scenarios one can try to solve friendly AI, which will most likely fail (or even increase the chances of a negative singularity), or try to launch a destructive singleton with simple goals to prevent further suffering and the evolution of life elsewhere in the universe. Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities with a high probability of success.
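One way to make the comparison above concrete is a small expected-utility calculation. The sketch below is only illustrative: the function names and every number in it are hypothetical placeholders, not estimates taken from this thread. It treats the inanimate outcome as a baseline of zero, a perfect FAI as a positive payoff, and a near-miss as a negative payoff, and asks how likely success has to be before attempting FAI beats the baseline.

```python
def expected_utility(p_success: float, u_success: float, u_failure: float) -> float:
    """Expected utility of a gamble that succeeds with probability p_success."""
    return p_success * u_success + (1.0 - p_success) * u_failure


def break_even_probability(u_success: float, u_failure: float,
                           u_baseline: float = 0.0) -> float:
    """Success probability at which the gamble is exactly as good as the baseline.

    Solves p * u_success + (1 - p) * u_failure = u_baseline for p.
    """
    return (u_baseline - u_failure) / (u_success - u_failure)


# Hypothetical payoffs, chosen only to show the shape of the argument:
# a perfect outcome is worth +1, a "not quite friendly" outcome is worth -3,
# and the inanimate universe is the zero baseline.
U_SUCCESS = 1.0
U_FAILURE = -3.0

threshold = break_even_probability(U_SUCCESS, U_FAILURE)
print(f"Attempt beats the baseline only if P(success) > {threshold:.2f}")
print(f"Expected utility at that threshold: "
      f"{expected_utility(threshold, U_SUCCESS, U_FAILURE):.2f}")
```

The point of the sketch is that the conclusion is driven entirely by the assumed ratio between the failure payoff and the success payoff. Someone who thinks near-misses are mild, or that most failures end in the zero baseline rather than below it, will get a very different threshold from the same formula.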

Assuming your argument is correct, wouldn't it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.

I'm skeptical that friendly AI is as difficult as all that because, to take an example, humans are generally considered pretty "wicked" by traditional writers and armchair philosophers, but lately we haven't been murdering each other or deliberately going out of our way to make each other's lives miserable very often. For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I'm just human. Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that? There were a lot of things in history and science that seemed mystically complex but turned out to be formalizable in compressed ways, such as the mathematics of Darwinian population genetics. Who would have imagined that the "Secrets of Life and Creation" would be revealed like that? But they were. Could "sufficient goodness that we can be convinced the agent won't put us through hell" also have a compact description that was clearly tractable in retrospect?

Assuming your argument is correct, wouldn't it make more sense to blow ourselves up with nukes rather than pollute the universe with UFAI? There may be other intelligent civilizations out there leading worthwhile lives that we threaten unfairly by unleashing UFAI.

There might be countless planets that are about to undergo an evolutionary arms race for the next few billion years resulting in a lot of suffering. It is very unlikely that there is a single source of life that is exactly at the right stage of evolution with exactly the right mind design to not only lead worthwhile lives but also get their AI technology exactly right to not turn everything into a living hell.

In case you assign negative utility to suffering, which is likely to be universally accepted to have negative utility, then given that you are an expected utility maximizer it should be a serious consideration to end all life. Because 1) agents that are an effect of evolution have complex values 2) to satisfy complex values you need to meet complex circumstances 3) complex systems can fail in complex ways 4) any attempt at friendly AI, which is incredibly complex, is likely to fail in unforeseeable ways.

For instance, say I were invincible. I could theoretically stab everyone I meet without any consequences, but I doubt I would do that. And I'm just human.

To name just one example where things could go horribly wrong. Humans are by their very nature interested in domination and sex. Our aversion to sexual exploitation is largely dependent on the memeplex of our cultural and societal circumstances. If you knew more, were smarter and could think faster you might very well realize that such an aversion is an unnecessary remnant that you can easily extinguish to open up new pathways to gain utility. That Gandhi would not agree to have his brain modified into a baby-eater is incredibly naive. Given the technology people will alter their preferences and personality. Many people actually perceive their moral reservations to be limiting. It only takes some amount of insight to just overcome such limitations.

You simply can't be sure that the future won't hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to be barely acceptable.

Goodness may seem mystical and amazingly complex from our current viewpoint, but is it really as complex as all that?

Maybe not, but betting on the possibility that goodness can be easily achieved is like pulling a random AI from mind design space hoping that it turns out to be friendly.

You simply can't be sure that the future won't hold vast amounts of negative utility. It is much easier for things to go horribly wrong than to be barely acceptable.

Similarly, it is easier to make piles of rubble than skyscrapers. Yet - amazingly - there are plenty of skyscrapers out there. Obviously something funny is going on...

The negative utility of an AI that transforms the universe into an inanimate state is simply the positive utility of a perfectly friendly AI. But most outcomes are expected to yield unfriendly AI, or not quite friendly AI, which will actively increase negative utility by possibly keeping alive living beings indefinitely given abhorrent circumstances.

Hang on, though. That's still normally better than not existing at all! Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as "below zero". Most plausible futures just aren't likely to be that bad for the creatures in them.

Hell has to be at least bad enough for the folk in it to want to commit suicide for utility to count as "below zero". Most plausible futures just aren't likely to be that bad for the creatures in them.

The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy. That's the case for most people. We just tend to remember the good moments more than the rest of our life.

It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you're going to lose the fight against the general decay. Any temporary success is just a statistical fluke.

The present is already bad enough. There is more evil than good. You are more often worried than optimistic. You are more often hurt than happy.

No, I'm not!

That's the case for most people. We just tend to remember the good moments more than the rest of our life.

Yet most creatures would rather live than die - and they show that by choosing to live. Dying is an option - they choose not to take it.

It is generally easier to arrive at bad world states than good world states. Because to satisfy complex values you need to meet complex circumstances. And even given simple values and goals, the laws of physics are grim and remorseless. In the end you're going to lose the fight against the general decay. Any temporary success is just a statistical fluke.

It sounds as though by now there should be nothing left but dust and decay! Evidently something is wrong with this reasoning. Evolution produces marvellous wonders - as well as entropy. Your existence is an enormous statistical fluke - but you still exist. There's no need to be "down" about it.

You are more often hurt than happy.

For some people, this is a solved problem.

Creating a destructive self-improving AI that kills all life should be the easiest of all possibilities with a high probability of success.

Where "success" refers to obliterating yourself and all your descendants. That's not how most Darwinian creatures usually define success. Natural selection does build creatures that want to die - but only rarely and by mistake.

As pessimistic as this sounds, I'm not sure if I actually disagree with any of it.

Earlier, you wrote

Personally I don't want to contribute anything to an organisation which admits to exploring strategies that are unacceptable to most people. And I wouldn't suggest that anyone else do so.

Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I'm confused. Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don't?

Surely building an anti-natalist AI that turns the universe into inert matter would be considered unacceptable by most people. So I'm confused.

Yet I am not secretive about it and I believe that it is one of the less horrible strategies. Given that SI is strongly attached to decision theoretic ideas, which I believe are not the default outcome due to practically intractable problems, I fear that their strategies might turn out to be much worse than the default case.

I think that it is naive to simply trust SI because they seem like nice people. Although I don't doubt that they are nice people. But I think that any niceness is easily drowned by their eagerness to take rationality to its logical extreme without noticing that they have reached a point where the consequences constitute a reductio ad absurdum. If game and decision theoretic conjectures show that you can maximize expected utility by torturing lots of people, or by voluntarily walking into death camps, then that's the right thing to do. I don't think that they are psychopathic personalities per se though. Those people are simply held captive by their idea of rationality. And that is what makes them extremely dangerous.

Do you intend to denounce SIAI if they do seriously consider this strategy, and also if they don't?

I would denounce myself if I seriously considered that strategy. But I would also admire them for doing so because I believe that it is the right thing to do given their own framework of beliefs. What they are doing right now just seems hypocritical. Researching FAI will almost certainly lead to worse outcomes than researching how to create an anti-natalist AI as soon as possible.

What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity and dangerous recursive self-improvement in particular and that it would be stupid to act on any conclusion that one could possibly come up with at this point.

I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought experiments. The only danger I see is that some people associated with SI/lesswrong might run rampant once someone demonstrates certain AI capabilities.

All in all I think they are just fooling themselves. They collected massive amounts of speculative math and logic and combined it into a framework of beliefs that can be used to squash any common sense. They have seduced themselves with formulas and lost any ability to discern scribbles on paper from real world rationality. They managed to give a whole new meaning to the idea of model uncertainty by making it reach new dramatic heights.

Bayes’ Theorem, the expected utility formula, and Solomonoff induction are unusable in all but a few limited situations where you have a well-defined testable and falsifiable hypothesis or empirical data. In most situations those heuristics are computationally intractable, one more than the other.

There is simply no way to assign utility to world states without deluding yourself into believing that your decisions are more rational than just trusting your intuition. There is no definition of "utility" that's precise enough to figure out what a being that maximizes it would do. There can't be, not without unlimited resources. Any finite list of actions maximizes infinitely many different quantities. Utility only becomes well-defined if we add limitations on what sort of quantities we consider. And even then...
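The claim that any finite list of actions maximizes infinitely many different quantities can be illustrated with a minimal sketch. Everything here is a hypothetical toy: the option names, the `make_utility` helper and the scale values are made up for illustration. The idea is that for any observed choice you can write down a whole family of utility functions, one per positive scale factor, each of which is maximized by exactly that choice.

```python
from typing import Callable, Hashable, Iterable


def make_utility(chosen: Hashable, scale: float) -> Callable[[Hashable], float]:
    """Build a utility function that rewards `chosen` with `scale` and everything else with 0."""
    def utility(action: Hashable) -> float:
        return scale if action == chosen else 0.0
    return utility


def is_maximized_by(utility: Callable[[Hashable], float],
                    options: Iterable[Hashable],
                    chosen: Hashable) -> bool:
    """True if no available option scores strictly higher than the chosen one."""
    return all(utility(chosen) >= utility(option) for option in options)


# Hypothetical observation: the agent picked "tea" out of three options.
options = ["tea", "coffee", "water"]
chosen = "tea"

# Any positive scale gives a distinct utility function consistent with the choice,
# so there is no unique "quantity being maximized" recoverable from behavior alone.
family = [make_utility(chosen, scale) for scale in (1.0, 2.5, 100.0)]
all_consistent = all(is_maximized_by(u, options, chosen) for u in family)
print(all_consistent)
```

Since the scale can be any positive real number, the family is infinite, which is why extra restrictions on the admissible quantities are needed before "the utility function" of an observed agent means anything.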

Preferences are a nice concept. But they are just as elusive as the idea of a "self". Preferences are not just malleable but they keep changing as we make more observations, and so does the definition of utility. Which makes it impossible to act in a time-consistent way.

What I really believe is that there is not enough data to come to any definitive conclusion about the whole idea of a technological singularity and dangerous recursive self-improvement in particular and that it would be stupid to act on any conclusion that one could possibly come up with at this point.

I agree with the "not enough data to come to any definitive conclusion" part, but think we could prepare for the Singularity by building an organization that is not attached to any particular plan but is ready to act when there is enough data to come to definitive conclusions (and tries to gather more data in the meantime). Do you agree with this, or do you think we should literally do nothing?

I believe that SI/lesswrong mainly produces science fiction and interesting, although practically completely useless, thought experiments.

I guess I have a higher opinion of SIAI than that. Just a few months ago you were saying:

I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.

What made you change your mind since then?

I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.

What made you change your mind since then?

I did not change my mind. All I am saying is that I wouldn't suggest that anyone who fully believes what they believe contribute money to SI. Because that would be counterproductive. If I accepted all of their ideas then I would make the same suggestion as you, to build "an organization that is not attached to any particular plan".

But I do not share all of their beliefs. Particularly I do not currently believe that there is a strong case that uncontrollable recursive self-improvement is possible. And if it is possible I do not think that it is feasible. And even if it is feasible I believe that it won't happen any time soon. And if it will happen soon I do not think that SI will have anything to do with it.

I believe that SI is an important organisation that deserves money. Although if I shared their idea of rationality and their technological optimism, then the risks would outweigh the benefit.

Why I believe SI deserves money:

  • It makes people think by confronting them with the logical consequences of state of the art ideas from the field of rationality.
  • It explores topics and fringe theories that are neglected or worthy of consideration.
  • It challenges the conventional foundations of charitable giving, causing organisations like GiveWell to reassess and possibly improve their position.
  • It creates a lot of exciting and fun content and discussions.

All in all I believe that SI will have a valuable influence. I believe that the world needs people and organisations that explore crazy ideas, that try to treat rare diseases in cute kittens and challenge conventional wisdom. And SI is such an organisation. Just like Roger Penrose and Stuart Hameroff. Just like all the creationists who caused evolutionary biologists to hone their arguments. SI will influence lots of fields and make people contemplate their beliefs.

To fully understand why my criticism of SI and my willingness to donate do not contradict each other, you also have to realize that I do not accept the usual idea of charitable giving that is being voiced here. I think that the reasons why people like me contribute money to charities and causes are complex and can't be reduced to something as simple as wanting to do the most good. It is not just about wanting to do good, signaling or warm fuzzies. It is all of it and much more. I also believe that it is practically impossible to figure out how to maximize good deeds. And even if you were to do it for selfish reasons, you'd have to figure out what you want in the first place. An idea which is probably "not even wrong".

I also fear that, at some point, I might need the money. Otherwise I would have already donated a lot more to the Singularity Institute years ago.

What made you change your mind since then?

Before you throw more of what I wrote in the past at me:

  • I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
  • I don't have a firm opinion on many issues.
  • There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
  • Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
  • I find it very easy to argue for whatever position.
  • I don't really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
  • I am sometimes not completely honest to exploit the karma system. Although I don't do that deliberately.
  • If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.

The above is just some quick and dirty introspection that might hint at the reason for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven't thought about that either :-)

I just don't have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.

Before you throw more of what I wrote in the past at me:

I have quoted everything XiXiDu said here so that it is not lost in any future edits.

Many of XiXiDu's contributions consist of persuasive denunciations. As he points out in the parent (and quoted below), these are often based on little research, written without much contemplation, and done to provoke reactions rather than because they are correct. Since XiXiDu is rather experienced at this mode of communication, and the arguments he uses have been selected for persuasiveness through trial and error, there is a risk that he will be taken more seriously than is warranted.

The parent should be used to keep things in perspective when XiXiDu is rabble rousing.

  • I sometimes take different positions just to explore an argument, because it is fun to discuss and because I am curious what reactions I might provoke.
  • I don't have a firm opinion on many issues.
  • There are a lot of issues for which there are as many arguments that oppose a certain position as there are arguments that support it.
  • Most of what I write is not thought-out. I most often do not consciously contemplate what I write.
  • I find it very easy to argue for whatever position.
  • I don't really care too much about most issues but write as if I do, to evoke feedback. I just do it for fun.
  • I am sometimes not completely honest to exploit the karma system. Although I don't do that deliberately.
  • If I believe that SI/lesswrong could benefit from criticism I voice it if nobody else does.

The above is just some quick and dirty introspection that might hint at the reason for some seemingly contradictory statements. The real reasons are much more complex of course, but I haven't thought about that either :-)

I just don't have the time right now to think hard about all the issues discussed here. I am still busy improving my education. At some point I will try to tackle the issues with due respect and in all seriousness.

That said, I think his fear of culpability (for being potentially passively involved in an existential catastrophe) is very real. I suspect he is continually driven, at a level beneath what anyone's remonstrations could easily affect, to try anything that might somehow succeed in removing all the culpability from him. This would be a double negative form of "something to protect": "something to not be culpable for failure to protect".

If this is true, then if you try to make him feel culpability for his communication acts as usual, this will only make his fear stronger and make him more desperate to find a way out, and make him even more willing to break normal conversational rules.

I don't think he has full introspective access to his decision calculus for how he should let his drive affect his communication practices or the resulting level of discourse. So his above explanations for why he argues the way he does are probably partly confabulated, to match an underlying constraining intuition of "whatever I did, it was less indefensible than the alternative".

(I feel like there has to be some kind of third alternative I'm missing here, that would derail the ongoing damage from this sort of desperate effort by him to compel someone or something to magically generate a way out for him. I think the underlying phenomenon is worth developing some insight into. Alex wouldn't be the only person with some amount of this kind of psychology going on -- just the most visible.)