My housemate Kelsey "theunitofcaring" has begun hosting an AI reading group in our house. Our first meeting was yesterday evening, and over a first draft attempt at chocolate macarons, we discussed this article about AI safety and efficiency by Paul Christiano, and various ideas prompted thereby at greater or lesser remove.

One idea that came up is what we decided to call "tipping point AI" (because apparently there are a lot of competing definitions for "transformative" AI). The definition we were using for tipping point AI was "something such that it, or its controller, is capable of preventing others from building AIs". The exact type and level of capability here could vary - for instance, if it's built after we've colonized Mars (that is, colonized it to an extent such that Martians could undertake projects like building AIs), then a tipping point AI has to be able to project power to Mars in some form, even if the only required level of finesse is lethality. But if it's built before we've colonized Mars, it doesn't need that reach; it just needs to be able to prevent colonization projects in addition to AI projects.

One hypothesis that has been floated, in a context such that we are pretty sure it is not anyone's real plan, is that an AI could just destroy all the GPUs on the planet and prevent the manufacture of new ones. This would be bad for Bitcoin, video games, and AI projects, but otherwise relatively low-impact. An AI might be able to accomplish this task by coercion, or even by proxy - the complete system of "the AI and its controller" needs to be able to prevent AI creation by other agents, so the AI itself might only need to identify targets for a controller (perhaps the US government) who already wields enough power to fire missiles or confiscate hardware and chooses to do so in service of this goal.

The idea behind creating tipping point AI isn't that this is where we stop forever. The tipping point AI only has to prevent other agents from building their own in their basements. It eliminates competition. Some features of a situation in which a tipping point AI exists include:

  • The agent controlling the AI can work on more sophisticated second drafts without worrying about someone else rushing to production unsafely.
  • The controlling agent can publish insights and seek feedback without worrying about plagiarism, code forks, etc.
  • They can apply the AI's other abilities, if any (there will presumably be some, since "prevent AI creation" is not a primitive action - some surveillance capability seems like a minimum to me), to their other problems, perhaps including creating a better AI. Even if this application has economic or other benefits that might attract others to similar solutions by default, the AI will prevent that, so no one will be (productively) startled or inspired into working on AI faster by seeing the results.

However, if you're an agent controlling a tipping point AI, you have a problem: the bus number* of the human race has suddenly dropped to "you and your cohort". If anything happens to you - and an AI being of the tipping point variety doesn't imply it can help you with all of the things that might happen to you - then the AI is leaderless. This, depending on its construction, might mean that it goes rogue and does something weird, that it goes dormant and there's no protection against a poorly built new AI project, or that it keeps doing whatever its last directive was (in the example under discussion, "prevent anyone from building another AI"). None of these are good states to have obtain permanently.

So you might want to define, and then architect into your AI the definition of, organizational continuity, robustly enough that none of those things will happen.

This isn't trivial - it's almost certainly easier than defining human value in general, but that doesn't mean it's simple. Your definition has to handle internal schisms, both overt and subtle, ranging from "the IT guy we fired is working for would-be rivals" to "there's serious disagreement among our researchers about whether to go ahead with Project Turaco, and Frances and Harold are working on a Turaco fork in their garage". If you don't want the wrong bus accident (or assassination) to mean that humanity ends, encounters a hard stop in its technological progress, or has its panopticonic meddling intelligence inherited by a random person who chose the same name for their uber-for-spirulina business? Then you need to have a way to pass on the mandate of heaven.

One idea that popped into my head while I was turning over this problem was a code of organizational conduct. This allows the organization to resume after a discontinuity, without granting random people a first-mover advantage at picking up the dropped mantle unless they take it up whole. It's still a simpler problem than human value in general, but it's intermediate between that and "define members of a conventional continuous group of humans". The code has to be something that includes its own decision-making process - if six people across the globe adopt a code simultaneously, they'll need to resolve conflicts between them just as much as the original organization did. You presumably want to incorporate security features that protect both against garage forks of Project Turaco and against ill-intentioned or not-too-bright inheritors of your code.
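To gesture at what "includes its own decision-making process" could mean, here's a toy sketch (every name, role, and threshold below is made up for illustration, not a proposal for how a real system would verify anything): the code designates reviewers, and a group claiming the mandate only counts if it takes the code up whole and clears a quorum of endorsements.

```python
# Toy sketch of a "code of conduct that includes its own decision procedure".
# All names, roles, and thresholds here are hypothetical illustrations.

from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    """A group claiming continuity with the original organization."""
    members: frozenset        # people in the claiming group
    endorsements: frozenset   # designated reviewers vouching for the claim
    adopts_full_code: bool    # did they take the code up whole?

def recognizes(claim: Claim, reviewers: frozenset, quorum: int) -> bool:
    """The code's own decision rule: a claim counts only if it adopts the
    whole code and is endorsed by at least `quorum` designated reviewers."""
    return claim.adopts_full_code and len(claim.endorsements & reviewers) >= quorum

# Two garage forks claim the mandate simultaneously; the rule itself has to
# say what happens (here: no quorum, no recognition).
reviewers = frozenset({"A", "B", "C", "D", "E"})
fork_1 = Claim(frozenset({"Frances", "Harold"}), frozenset({"B"}), True)
fork_2 = Claim(frozenset({"X", "Y"}), frozenset({"C", "D", "E"}), True)
print(recognizes(fork_1, reviewers, quorum=3))  # False
print(recognizes(fork_2, reviewers, quorum=3))  # True
```

The interesting part is exactly the conflict case: whatever the real rule turns out to be, it has to say something when two forks show up at once.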

Other options include:

  • Conventional organizational continuity. You have, perhaps, a board of directors who never share a vehicle, and they have some sort of input into the executives of the organization, and you hope nobody brings the plague to work, and there is some sort of process according to which decisions are made and some sort of process for defaulting if decisions fail to be made.
  • Designated organizational heirs: if your conventional organization fails, then your sister project, who are laying theoretical groundwork but not building anything yet because you have a tipping point AI and you said so, get the mandate of heaven and can proceed. This assumes that you think their chances of achieving value alignment are worse than yours but better than any other option. This has obvious incentive problems with respect to the other organization's interest in yours suddenly ceasing to exist.
  • Non-organization based strategies (a line of succession of individuals). People being changeable, this list would need to be carefully curated and carefully maintained by whoever was ascendant, and it would be at substantial risk of unobserved deception, errors in judgment, or evolution over time of heirs' interests and capabilities after their predecessors can no longer edit the line of succession. These would all be capable of affecting the long term future of humanity once the AI changed hands. (See the sketch after this list.)
  • I'm sure there are things I haven't thought of.
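For contrast with the code-of-conduct sketch above, here is an equally toy sketch of the line-of-succession option (again, every name and detail is made up for illustration). The structural weakness described in the bullet falls straight out of the shape of the thing: only the current ascendant can curate the list, so any drift in an heir after the ascendant is gone gets inherited as-is.

```python
# Hypothetical sketch of an individual line of succession.

class LineOfSuccession:
    def __init__(self, ascendant: str, heirs: list[str]):
        self.ascendant = ascendant
        self.heirs = list(heirs)

    def edit(self, editor: str, new_heirs: list[str]) -> None:
        """Only the current ascendant may curate the list."""
        if editor != self.ascendant:
            raise PermissionError("only the current ascendant may edit")
        self.heirs = list(new_heirs)

    def succeed(self) -> str:
        """On the ascendant's death, control passes to the next heir,
        however much that heir has changed since being listed."""
        if not self.heirs:
            raise RuntimeError("leaderless: no heir defined")
        self.ascendant = self.heirs.pop(0)
        return self.ascendant
```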

I don't have a conclusion; I just wrote up thoughts I had in response to the meeting, to let other people who can't attend still be in on some of what we're talking and thinking about.

*The number of people who can be hit by a bus before the organization ceases to function

18 comments

Good luck with the in-person AI Safety reading group. It sounds productive and fun.

For the past 2 years, I have been running the Skype-based AISafety.com Reading Group. You can see the material we have covered at https://aisafety.com/reading-group/ . Yesterday, Vadim Kosoy from MIRI gave a great presentation of his Learning-Theoretic Agenda: https://youtu.be/6MkmeADXcZg

Usually, I try to post a summary of the discussion to our Facebook group, but I've been unable to get a follow-on discussion going. Your summary/idea above is higher quality than what I post.

Please tell me if you have any ideas for collaboration between our reading groups, or if I can do anything else to help you :).

My opinion: The capacity to forcibly halt all rival AI projects is only to be expected in an AI project that has already produced a singularity. It is not a viable tactic if you are aiming to create a friendly singularity. In that case, there is no alternative to solving the problems of friendly values and value stability, and either reaching singularity first, or influencing those who will get there before you.

The capacity to forcibly halt all rival AI projects is only to be expected in an AI project that has already produced a singularity.

What do you mean by singularity, and why would this be true? It seems somewhat likely that forcibly halting rival AI projects could be done in the near term with a combination of surveillance technology (including machine learning to flag interesting data), drones, and conventional military/policing tech. (Of course, permitting solutions involving more cooperation and less force makes the problem easier). Doing this with a smaller group requires more advanced technology than doing it with a larger group, but why would it ever require a singularity?

Can someone in China halt AI research at Google and Amazon? Can someone in America halt AI research at Tencent and Baidu? Could the NSA halt unapproved AI research just throughout America?

By a singularity I mean creation of superhuman intelligence that nothing in the world can resist.

Clarifying what "AI research" means is hard (is cognitive science AI research?), but in the examples you gave, halting large-scale AI experiments seems doable with a military advantage. The NSA case seems especially easy (if we upgrade the NSA's technology and social organization moderately); that's just enforcing regulations under a surveillance state. Obviously, non-military solutions such as hiring away researchers are also possible.

In a tautological sense, any group of humans and technological systems that attains global strategic dominance is "a superhuman intelligence that nothing in the world can resist", so I would agree with your point there. But the usual connotations of "singularity" (e.g. near-hyperbolic rate of technological progress) need not apply.

Given that the system is composed of both humans and technological systems, it is not necessary that "friendly values and value stability" are solved, it just has to be the case that the people in charge have decent values and are capable of thinking and steering with technological assistance.

How are you going to stop a rival nuclear-armed state from doing whatever it wants on its own territory?

Obviously, cooperative solutions are to be pursued first; to the extent that it isn't in anyone's interest to destroy the world, there is already significant alignment of interests. But there is the question of what to do about regimes that are uncooperative to the point of omnicidality.

In general, it would be somewhat strange if a superintelligent AI were able to stop dangerous research but a somewhat-superintelligent group of humans and technology systems with a large technical advantage (including a large cognitive advantage) were unable to do the same. For any solution a superintelligent AI could use, you could ask what is the nearest solution that could be pursued with foreseeable technology.

Here are some ideas (which are by no means complete):

  • Propaganda directed at AI researchers, to convince them that destroying the world is actually bad, and also low-status or something
  • Offering AI researchers better jobs, which the targeted state must accept due to trade agreements
  • Working with insiders to sabotage specific projects
  • Computer hacking
  • Economic sanctions
  • General economic sabotage
  • Military coup or revolution
  • Targeted assassinations and destruction of capital
  • Winning the MAD signalling game (i.e. changing the Schelling point such that the enemy must do what you want them to do, and it is not in their interest to bomb you)
  • Making it clear that absorbing the losses due to nuclear attacks is an acceptable cost, and then pursuing conventional military measures, such that they have no reason not to surrender (the desirability of this method depends on how many nukes they have)

Some of these options can be covert, making retaliation difficult.

A great power can think about doing such things against an opponent. But I thought we were talking about a scenario in which some AI clique has halted *all* rival AI projects throughout the entire world, effectively functioning like a totalitarian world government, but without having actually crossed the threshold of superintelligence. That is what I am calling a fantasy.

The world has more than one great power, great powers are sovereign within their own territory, and you are not going to overcome that independence by force, short of a singularity. The rest of the world will never be made to stop, just so that one AI team can concentrate on solving the problems of alignment without having to look over its shoulder at the competition.

OK, I might have initially misinterpreted your wording as saying that the only group capable of forcibly halting rival AI projects is an AI project capable of producing aligned AGI, whereas you were only claiming that the only AI project capable of forcibly halting rival AI projects is an AI project capable of producing aligned AGI.

Still, it is definitely possible to imagine arrangements such as an AI project working closely with one or more governments, or a general project that develops narrow AI in addition to other technology such as cognitive enhancement, that would have a shot at pulling this off. One of the main relevant questions is whether doing this is easier or harder than solving AGI alignment. In any case we probably have some disagreement about the difficulty of a group of less than 100,000 people (i.e. not larger than current big tech companies) developing a large technological advantage over the rest of the world without developing AGI.

"Organization working on AI" vs "any other kind of organization" is not the important point. The important point is ALL. We are talking about a hypothetical organization capable of shutting down ALL artificial intelligence projects that it does not like, no matter where on earth they are. Alicorn kindly gives us an example of what she's talking about: "destroy all the GPUs on the planet and prevent the manufacture of new ones".

Just consider China, Russia, and America. China and America lead everyone else in machine learning; Russia has plenty of human capital and has carefully preserved its ability to not be pushed around by America. What do you envisage - the three of them agree to establish a single research entity, that shall be the only one in the world working on AI near a singularity threshold, and they agree not to have any domestic projects independent of this joint research group, and they agree to work to suppress rival groups throughout the world?

Despite your remarks about how the NSA could easily become the hub of a surveillance state tailored to this purpose, I greatly doubt the ability of NSA++ to successfully suppress all rival AI work even within America and throughout the American sphere of influence. They could try, they could have limited success - or they could run up against the limits of their power. Tech companies, rival agencies, coalitions of university researchers, other governments, they can all join forces to interfere.

In my opinion, the most constructive approach to the fact that there are necessarily multiple contenders in the race towards superhuman intelligence, is to seek intellectual consensus on important points. The technicians who maintain the world's nuclear arsenals agree on the basics of nuclear physics. The programmers who maintain the world's search engines agree on numerous aspects of the theory of algorithms. My objective here would be that the people who are working in proximity to the creation of superhuman intelligence, develop some shared technical understandings about the potential consequences of what they are doing, and about the initial conditions likely to produce a desirable rather than an undesirable outcome.

Let's zoom in to the NSA++ case, since that continues to be a point of disagreement. Do you think that, if the US government banned GPUs within US borders that have above a certain level of performance (outside a few high-security government projects), and most relevant people agreed that this ban was a good idea, that it would not be possible for NSA++ to enforce this ban? The number of GPU manufacturers in the US is pretty low.

Banning high-end GPUs so that only the government can have AI? They could do it, they might feel compelled to do something like it, but there would be serious resistance and moments of sheer pandemonium. They can say it's to protect humanity, but to millions of people it will look like the final step in the enslavement of humanity.

Leaving aside the question of shutting down all rival AI projects, if indeed NSA++ can regulate high-end GPUs in the US, international regulation of GPUs such that only a handful of projects can run large experiments seems doable through international agreements, soft power, and covert warfare. This seems similar to international regulation of nuclear weapons or CFCs. (I am not claiming this is not hard, just that it is possible and not a fantasy)

At this point, similar to what you suggested, the better solution would be to have the people working at all of these projects know about what things are likely to be dangerous, and avoid those things (of course, this means there have to be few enough projects that it's unlikely for a single bad actor to destroy the world). The question of shutting down all other projects is moot at this point, given that it's unnecessary and it's not clear where the will to do this would come from. And if the projects coordinate successfully, it's similar to there being only one project. (I do think it is possible to shut down the other projects by force with a sufficient technical advantage, but it would have a substantial chance of triggering World War 3; realistically, this is also the case for applications of task-based tipping point AI, for exactly the same reasons)

I've started saying that we are already firmly in the era of transformative AI, and I don't think transformative is the right word for singularity or singleton sorts of speculative AI. That rate of transformation has plenty of space to accelerate, but I like this framing as a counter to the trend of defining AI as whatever doesn't exist yet. Plenty of real AI has remade the world and our lives already; it seems like an improper redefinition of the word transformative to say we aren't there yet.

Is what you're describing as "tipping point AI" also something that could be classified as singleton AI? The ability for an AI to destroy other AIs in progress and computer hardware is most of the way towards creating a political singleton.

The problem with this approach to limiting an arms race is that no one knows what the system requirements of an AGI are. Depending on the processing power needed, stopping anyone else creating an AGI could be just a question of limiting research into better computers, or you might need to destroy a few supercomputers and stop anyone building a huge botnet. On the other extreme, if AGI takes very little processing, you might need to destroy almost everything containing multiple transistors, impose strict controls on relays and vacuum tubes, and stop anyone spending all weekend doing unknown mental arithmetic. This is beyond even what a totalitarian world government could probably achieve.

Of course, we don't know how much processing power is needed for how much intelligence, and I suspect that proving a general theorem that all AGIs (for some definition of AGI that's actually meaningful) need at least X processing power (for some nontrivial X) would be harder than building an AGI. Proving stuff about computational complexity is hard in general - see P=NP?

If you have an AI, you have an upper bound on the system requirements.

Unknown mental arithmetic

This sounds like your AI doesn't distinguish between intelligence and artificial intelligence.

My point is that it's hard to get a lower bound.

I would also argue that it could be possible to take an algorithm for a paperclip maximizer and give it to a human to manually calculate, with the algorithm not having the same goals as the human, the human not knowing the algorithm's goals, and the algorithm being able to solve problems that the lone human can't. More so with many humans.

While I am familiar with the Chinese room argument, I don't see that scenario as realistic.

the algorithm being able to solve problems that the lone human can't.

I don't see how an AI could run efficiently enough to be a threat, and not be understood by the human running it.* While it's about writing, this mentions Vinge's Law: "if you know exactly what a very smart agent would do, you must be at least that smart yourself." This doesn't have to hold for an algorithm, but it seems hard enough to circumvent that I don't see how an AI could go FOOM in someone's brain/on paper/in sets of dominos.

*I could see this problem potentially existing with an algorithm for programming, or an elaborate mnemonic device, made by an AI or very smart person, which contains, say, a compressed source code for an AI, which the human who attempts to memorize it can remember but not understand the workings of - even if I memorized "uryybjbeyq" I might not recognize what that's rot13 for, or that it has any meaning at all.
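(For concreteness, the rot13 point checks out with nothing beyond the Python standard library; the memorized string decodes to an ordinary phrase the memorizer would have no reason to notice:)

```python
import codecs

# rot13 is a trivial transform whose output a person could memorize without
# ever knowing what it encodes, or that it encodes anything at all.
print(codecs.decode("uryybjbeyq", "rot_13"))  # -> helloworld
```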