Aug 10, 2018
My housemate Kelsey "theunitofcaring" has begun hosting an AI reading group in our house. Our first meeting was yesterday evening, and over a first draft attempt at chocolate macarons, we discussed this article about AI safety and efficiency by Paul Christiano, and various ideas prompted thereby at greater or lesser remove.
One idea that came up is what we decided to call "tipping point AI" (because apparently there are a lot of competing definitions for "transformative" AI). The definition we were using for tipping point AI was "something such that it, or its controller, is capable of preventing others from building AIs". The exact type and level of capability here could vary - for instance, if it's built after we've colonized Mars (that is, colonized it to an extent such that Martians could undertake projects like building AIs), then a tipping point AI has to be able to project power to Mars in some form, even if the only required level of finesse is lethality. But if it's before we've colonized Mars, it can be unable to do that, and just able to prevent colonization projects in addition to AI projects.
One hypothesis that has been floated, in a context such that we are pretty sure it is not anyone's real plan, is that an AI could just destroy all the GPUs on the planet and prevent the manufacture of new ones. This would be bad for Bitcoin, video games, and AI projects, but otherwise relatively low-impact. An AI might be able to accomplish this task by coercion, or even by proxy - the complete system of "the AI, and its controller" needs to be able to prevent AI creation by other agents, so the AI itself might only need to identify targets for a controller, perhaps the US government, who already wields enough power to fire missiles or confiscate hardware and chooses to do so in service of this goal.
The idea behind creating tipping point AI isn't that this is where we stop forever. The tipping point AI only has to prevent other agents from building AIs of their own in their basements; it eliminates competition. Some features of a situation in which a tipping point AI exists include:
However, if you're an agent controlling a tipping point AI, you have a problem: the bus number* of the human race has suddenly dropped to "you and your cohort". If anything happens to you - and an AI being of the tipping point variety doesn't imply it can help you with all of the things that might happen to you - then the AI is leaderless. This, depending on its construction, might mean that it goes rogue and does something weird, that it goes dormant and offers no protection against a poorly built new AI project, or that it keeps doing whatever its last directive was (in the example under discussion, "prevent anyone from building another AI"). None of these are good states to have obtain permanently.
So you might want to define, and then architect into your AI the definition of, organizational continuity, robustly enough that none of those things will happen.
This isn't trivial - it's almost certainly easier than defining human value in general, but that doesn't mean it's simple. Your definition has to handle internal schisms, both overt and subtle, ranging from "the IT guy we fired is working for would-be rivals" to "there's serious disagreement among our researchers about whether to go ahead with Project Turaco, and Frances and Harold are working on a Turaco fork in their garage". If you don't want the wrong bus accident (or assassination) to mean that humanity ends, encounters a hard stop in its technological progress, or has its panopticonic meddling intelligence inherited by a random person who chose the same name for their uber-for-spirulina business, then you need to have a way to pass on the mandate of heaven.
One idea that popped into my head while I was turning over this problem was a code of organizational conduct. This allows the organization to resume after a discontinuity, without granting random people a first-mover advantage at picking up the dropped mantle unless they take it up whole. It's still a simpler problem than human value in general, but it's intermediate between that and "define members of a conventional continuous group of humans". The code has to be something that includes its own decisionmaking process - if six people across the globe adopt a code simultaneously, they'll need to resolve conflicts among themselves just as much as the original organization did. You presumably want to incorporate security features that protect both against garage forks of Project Turaco and against ill-intentioned or not-too-bright inheritors of your code.
Other options include:
I don't have a conclusion, because this is just a writeup of thoughts I had in response to the meeting, meant to let people who can't attend still be in on some of what we're talking and thinking about.
*The number of people who can be hit by a bus before the organization ceases to function.