People who think that risks from AI are the category of dangers that is most likely to be the cause of a loss of all human value in the universe often argue that artificial general intelligence tends to undergo recursive self-improvement. The reason given is that intelligence is maximally instrumentally useful in the realization of almost any terminal goal an AI might be equipped with. They believe that intelligence is a universal instrumental value. This sounds convincing, so let's accept it as given.
What kind of instrumental value is general intelligence? What is it good for? Personally I try to see general intelligence purely as a potential. It allows an agent to achieve its goals.
The question that is not asked is why an artificial agent would tap the full potential of its general intelligence rather than only use the amount it is "told" to use. Where would the incentive to do more come from?
If you deprived a human infant of all its evolutionary drives (e.g. to avoid pain, seek nutrition, status and - later on - sex), would it just grow into an adult that might try to become rich or rule a country? No, it would have no incentive to do so. Even though such a "blank slate" would have the same potential for general intelligence, it wouldn't use it.
Say you came up with the most basic template for general intelligence that works given limited resources. If you wanted to apply this potential to improve your template, would this be a sufficient condition for it to take over the world? I don't think so. If you didn't explicitly tell it to do so, why would it?
The crux of the matter is that a goal isn't enough to enable the full potential of general intelligence; you also need to explicitly define how to achieve that goal. General intelligence does not imply recursive self-improvement, only the potential for it, not the incentive. The incentive has to be given; it is not implied by general intelligence.
For the same reasons that I don't think that an AGI will be automatically friendly, I don't think that it will automatically undergo recursive self-improvement. Maximizing expected utility is, just like friendliness, something that needs to be explicitly defined, otherwise there will be no incentive to do so.
For example, in what sense would it be wrong for a general intelligence to maximize paperclips in the universe by waiting for them to arise due to random fluctuations out of a state of chaos? It is not inherently stupid to desire that, there is no law of nature that prohibits certain goals.
Why would a generally intelligent artificial agent care about how to reach its goals if the preferred way is undefined? It is not intelligent to do something as quickly or effectively as possible if doing so is not desired. And an artificial agent doesn't desire anything that it isn't made to desire.
There exists an interesting idiom stating that the journey is the reward. Humans know that it takes a journey to reach a goal and that the journey can be a goal in and of itself. For an artificial agent there is no difference between a goal and how to reach it. If you told it to reach Africa but not how, it might as well wait until it reaches Africa by means of continental drift. Would that be stupid? Only for humans. The AI has infinite patience; it just doesn't care about any implicit connotations.
The relevant question is not "will an AGI automatically undergo recursive self-improvement", but "how likely is it that at least one of the early AGIs undergoes recursive self-improvement". If one AGI ends up FOOMing and taking over the world, the fact that there were 999 others which didn't is relatively uninteresting.
Presuming that the AGI has been built by humans to solve problems, then it has presumably also been built to care about the time it takes to reach its goals.
That's true, and we need an organisation like the SIAI to take care of that issue. But I still have a perception of harsh overconfidence around here when it comes to issues related to risks from AI. It is not clear to me that dangerous recursive self-improvement is easier to achieve than friendliness.
To destroy is easier than to create. But destroying human values by means of unbounded recursive self-improvement seems to me to be one of the most complex existential risks.
The usual difference that is being highlighted around here is how easy it is to create simple goals versus complex goals, e.g. creating paperclips versus the protection of human values. But recursive self-improvement is a goal in and of itself. An artificial agent does not discern between a destination and the route to reach it, it has to be defined in terms of the AI's optimization parameters. It doesn't just happen, it is something very complex that needs to be explicitly defined.
So how likely is it? You need an AGI that is, in my opinion, explicitly defined and capable of unbounded and uncontrollable recursive self-improvement. There need to be internal causes that prompt it to keep going in the face of countless undefined challenges.
Something that could take over the world seems to me to be the endpoint of a very long and slow route towards a thorough understanding of many different fields, nothing that one could stumble upon early on and by accident.
The conservative assumption is that AGI is easy, and FAI is hard.
I don't know if this is actually true. I think FAI is harder than AGI, but I'm very much not a specialist in the area - either area. However, I do know that I'd very much rather overshoot the required safety margin by a mile than undershoot by a meter.
"FAI" here generally means "Friendly AGI", which would make "FAI is harder than AGI" trivially true.
Perhaps you meant one of the following more interesting propositions:
(Assuming even the sub-problem of Friendliness still has part or all of AGI as a prerequisite, the latter proposition implies "Friendliness isn't so easy relative to AGI that progress on Friendliness will lag insignificantly behind progress on AGI.")
Why is writing on this kind of topic in the discussion section rather than in main, while we have relationship advice in the main section but not here?
I think the far view on those thought patterns is that they are indicative of raising the issue of possible false understanding rather than providing a coherent new understanding. That's for discussion.
Not really. If you don't specify how, it will just choose one of the available ways.
For example, a chess program just tries to mate you quickly. Programmers typically don't tell it how to do that, just what the goal is. Intelligent agents can figure out the details for themselves.
Your title argues against universal instrumental values - but IMO, you don't have much of a case.
There needs to be some metric by which it can measure the available ways, as long as you don't build it to choose one randomly. So if it doesn't act randomly, why exactly would the favorable option be to act by consuming the whole world to improve its intelligence? Recursive self-improvement is a resource that can be used, not a mandatory way of accomplishing goals. There is nothing fundamentally rational about achieving goals efficiently and quickly. An artificial agent simply doesn't care if you don't make it care.
My case is that any instrumental values are relative, even intelligence and goal-preservation. An artificial agent simply doesn't care not to die whatever it takes, to act as smart and fast as possible or to achieve any given goal economically.
If the AI is a maximizer rather than a satisficer, then it will likely have a method for measuring the quality of its paths to achieving optimization that can be derived from its utility function and its model of the world. So the question isn't whether it will be able to choose a path, but instead: Is it more likely to choose a path where it sits around risking its own destruction, or more likely to get started protecting things that share its goal (including itself) and achieving some of its subgoals?
Also, if the AI is a satisficer then maybe that would increase its odds of sitting around waiting for continents to drift, but maybe not.
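To make the maximizer/satisficer distinction concrete, here is a minimal Python sketch. The plan names and utility numbers are made up for illustration, and real agents would not enumerate plans this way. It only shows that a satisficer's choice depends on its threshold and on the order in which it happens to consider plans, which is why "maybe, maybe not" seems like the right answer.

```python
def maximizer(plans, utility):
    """Pick the plan with the highest utility."""
    return max(plans, key=utility)


def satisficer(plans, utility, threshold):
    """Pick the first plan considered whose utility is good enough."""
    for plan in plans:
        if utility(plan) >= threshold:
            return plan
    return None  # nothing met the threshold


# Hypothetical plans for "get to Africa" with invented utilities.
made_up_utility = {"wait for continental drift": 1, "walk": 60, "fly": 90}
plans = list(made_up_utility)  # considered in insertion order

best_plan = maximizer(plans, made_up_utility.get)
lazy_plan = satisficer(plans, made_up_utility.get, threshold=1)
picky_plan = satisficer(plans, made_up_utility.get, threshold=80)
```

With a very low threshold the satisficer settles for waiting; with a high one it ends up at the same plan as the maximizer.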
It doesn't need to be random. It can be merely arbitrary.
So, thinking about the chess program, if the program has two ways to mate in three (and they have the same utility internally) it doesn't normally bother to use a random number generator - it just chooses the first one it found, the last one it found, or something like that. The details might depend on the move generation algorithm, or the tree pruning algorithm.
The point is that it still picks one that works, without the original programmer wiring preferences relating to the issue into its utility function.
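A small Python sketch of that tie-breaking behaviour, assuming a hypothetical list of already-scored candidate moves rather than a real chess engine. The only difference between the two functions is `>` versus `>=`, which is the kind of incidental detail that decides between equally good options.

```python
def pick_first_best(scored_moves):
    """Keep the first move with the top score (strict > ignores later ties)."""
    best_move, best_score = None, float("-inf")
    for move, score in scored_moves:
        if score > best_score:
            best_move, best_score = move, score
    return best_move


def pick_last_best(scored_moves):
    """Keep the last move with the top score (>= lets later ties win)."""
    best_move, best_score = None, float("-inf")
    for move, score in scored_moves:
        if score >= best_score:
            best_move, best_score = move, score
    return best_move


# Hypothetical candidates: two different mates in three with equal internal utility.
candidates = [("Qh5", 100), ("Nf3", 10), ("Rd8", 100)]
first_choice = pick_first_best(candidates)
last_choice = pick_last_best(candidates)
```

Neither version uses a random number generator, and neither has any preference about the tie written into its scoring.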
Sure - though humans often want speed and efficiency - so this is one of the very first preferences they tell their machines about. This seems like a side issue.
Speed arises out of discounting, which is ubiquitous, if not universal. Economy is not really much of a universal instrumental value - just something many people care about. I suppose there are some instrumental reasons for caring about economy - but it is not a great example. Not dying is a universal instrumental value, though - if you die it diminishes your control over the future. A wide range of agents can be expected to strive to avoid dying.
Just to highlight my point, here is a question nobody can answer right now. At what level of general intelligence would a chess program start the unbounded optimization of its chess skills? I don't think that there is a point where a chess program would unexpectedly take over the world to refine its abilities. You will have to explicitly cause it to do so, it won't happen as an unexpected implication of a much simpler algorithm. At least not if it works given limited resources.
Yes, yet most of our machines are defined to work under certain spatio-temporal scope boundaries and resource limitations. I am not saying that humans won't try to make their machines as smart as possible, I am objecting to the idea that it is the implicit result of most AGI designs. I perceive dangerous recursive self-improvement as a natural implication of general intelligence to be as unlikely as an AGI that is automatically friendly.
Causing an artificial general intelligence to consume the world, in order to improve itself, seems to be as hard as making it care about humans. Both concepts seem very natural to agents like us, agents that are the effect of natural selection, that wouldn't exist if they didn't win a lot of fitness competitions in the past. But artificial agents lack all of that vast amount of causes that prompt us to do what we do.
This is a concept that needs to be made explicit in every detail. We know what it means to die, an artificial agent won't. Does it die if it stops computing? Does it die if it changes its substrate?
There are a huge number of concepts that we are still unable to describe mathematically. Recursive self-improvement might sound intuitively appealing, but it is nothing that will just happen. Just like friendliness, it takes explicit, mathematically precise definitions to cause an artificial agent to undergo recursive self-improvement.
So, death may have some subtleties, but essentially it involves permanent and drastic loss of function - so cars die, computer systems die, buildings die - etc. For software, we are talking about this.
You have mostly answered it yourself. Never. Or not until a motivation for doing so is provided by some external agent. Biological evolution filled our brains with intelligence AND a will to do such things as not only to win a chess game, but to use the whole Moon to get enough computing power to be a nearly optimal chess player.
Power without control is nothing. Intelligence without motives is also nothing, in that sense.
Machine autocatalysis is already happening. That's the point of my essay "The Intelligence Explosion Is Happening Now". Whatever tech is needed to result in self-improvement is already out there - and the ball is rolling. What happens next is that the man-machine civilisation becomes more machine. That's the well-known process of automation. The whole process is already self-improving, and it has been since the first living thing.
Self-improvement is not really something where we get to decide whether to build it in.
Well, technological progress is already acting in an autocatalytic fashion. Progress is fast, and numerous people are losing their jobs and suffering as a result. It seems likely that progress will get faster, and even more people will be affected by this kind of future shock.
We see autocatalytic improvements in technology taking place today - and they seem likely to be more common in the future.
Climbing the Tower of optimisation is not inevitable, but it looks as though it would take a totalitarian government to slow progress down.
Well, there's a sense in which "most" bridges collapse, "most" ships sink and "most" planes crash.
That sense is not very useful in practice - the actual behaviour of engineered structures depends on a whole bunch of sociological considerations. If you want to see whether engineering projects will kill people, you have to look into those issues - because a "counting" argument tells you practically nothing of interest.
Which considerations do you believe distinguish AI doing something because it's told to, from it doing something when not told to? In what way is "talking to AI" a special kind of event in the world that fully controls AI's effect?
I am comparing the arguments that are being made against implied friendliness to the arguments that are being made about implied doom. Friendliness clearly isn't something we should expect an artificial general intelligence to exhibit if it isn't explicitly designed to be friendly. But why then would that be the case for other complex behaviors like recursive self-improvement?
I don't doubt that it is possible to design an AGI that tends to undergo recursive self-improvement, I am just questioning that most AGI designs do so, that it is a basic AI drive.
I believe that very complex behaviors (e.g. taking over the world) are unlikely to be the result of unforeseen implications. If it is explicitly designed (told) to do so, then it will do so. If it isn't explicitly designed to undergo unbounded self-improvement, then it won't happen as a side-effect either.
That is not to say that there are no information theoretically simple algorithms exhibiting extremely complex behaviors, but none that work given limited resources.
We should of course take the possibility seriously and take preemptive measures, just as the SIAI is doing. But I don't see that the available evidence allows us to believe that the first simple AGIs will undergo unbounded and uncontrollable self-improvement. What led you to believe that?
Most AGI researchers don't seem to share the opinion that the first AGI will take over the world on its own. Ben Goertzel calls it the scary idea. Shane Legg writes that "it's more likely that it will be the team of humans who understand how it works that will scale it up to something significantly super human, rather than the machine itself."
I think there is some difficulty here with saying "an AI" or "an optimisation process"; these are both very large search spaces. I suggest that it is more fruitful to consider AIs as a subset of programs that humans will make. Coding is hard work; nobody does it without some purpose in mind. The problem is to compactly specify our real purpose so that the computer's output will embody it, noting that there are very many layers of indirection.
Trying to solve any given problem, say poverty, by building an AI is like trying to build a LEGO pattern using waldoes, which you manipulate through a command-line interface, which takes naturalish-language input in Russian, while looking at the actual LEGO blocks through several mirrors some of which are not flat. The connection between input and output is non-obvious.
Now this is also true of any other problem you might solve on a computer, for if the connection were obvious you would not use a computer. (In chess, for example, the connection between the first move and the final mate is anything but obvious to a human brain, but is presumably deterministic. Hence Deep Blue.) The difference is that poverty is a very large and not-easily-formalisable problem.
In trying to understand AI drives, then, I suggest that it is more fruitful to think about what sort of self-improvement algorithm a human might write, and what its failure modes are. An optimisation process picked at random from the space of all possible programs, even all working programs, might not have a drive to optimisation; but an AI built by a human may well have, at a minimum, a function labeled "check-for-optimisations", whose intention is to write machine code that has the same output in fewer operations. But between intention and execution are many steps; if the "prove-output-identical" method has a bug, you have a problem.
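As a toy illustration of that failure mode, here is a Python sketch. The function names are hypothetical stand-ins for the "check-for-optimisations" and "prove-output-identical" steps above, and the "proof" is deliberately weak: it only spot-checks sample inputs. The candidate rewrite agrees with the original on every sampled input, gets accepted, and still differs elsewhere.

```python
def slow_triangle(n):
    """Original code: sum 0..n by looping (returns 0 for negative n)."""
    return sum(range(n + 1))


def fast_triangle(n):
    """Candidate rewrite: closed form, fewer operations."""
    return n * (n + 1) // 2


def outputs_look_identical(original, candidate, samples):
    """Flawed stand-in for prove-output-identical: it tests, it does not prove."""
    return all(original(x) == candidate(x) for x in samples)


def check_for_optimisations(original, candidate, samples):
    """Swap in the candidate only if the (flawed) equivalence check passes."""
    if outputs_look_identical(original, candidate, samples):
        return candidate
    return original


# The check only ever looks at non-negative inputs.
chosen = check_for_optimisations(slow_triangle, fast_triangle, range(100))
```

Here `chosen` is the fast version, which is fine for every input the check looked at, but the two functions disagree for inputs such as `-2`. The intention was sound; the gap between intention and execution is where the problem comes from.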
The AI that waits to reach Africa by continental drift certainly exists somewhere in the space of possible programs to reach Africa - it is indistinguishable from the null program, in fact, which any fool can write and many do. But I suggest that it is not very relevant to a discussion of programs written by smart people with a particular goal in mind, who intend to put in a "look-for-better-options" function somewhere.
An Artificial Intelligence with no drives is a contradiction in terms. It wouldn't be intelligent. It would be a brick.
-- Creatures Wiki
I am not saying that the most likely result of AGI research is AIs with no drives. I am saying that taking over the world is a very complex business that doesn't just happen as a side-effect or unforeseen implication of some information theoretically simple AGI design. Pretty much nothing that actually works efficiently happens as an unforeseen side-effect.
Taking over the world is instrumentally useful, regardless of your goals. If it has drives (and sufficient intelligence), it will take over the world. If it doesn't, it's not AI.
This isn't something that just happens while it tries to accomplish its goal in another manner. It's a subgoal. It will accomplish its subgoals efficiently, and it's entirely possible for someone to miss one.
Only if there isn't some mutually exclusive subgoal that more efficiently achieves its goals. It may turn out that taking over the Earth isn't the most efficient way to tile interstellar space with paperclips, for example, in which case a good enough optimizer will forego taking over the Earth in favor of the more efficient way to achieve its goals.
Of course, that's not something to count on.
You normally have to at least arrest any nearby agents that might potentially thwart your own efforts.
It's possible that it won't take over Earth first. It would likely start with asteroids to save on shipping, and if FTL is cheap enough, it might use other stars' asteroids before it resorts to the planets. But it will resort to the planets eventually.
If you gave it an infinite planning horizon with negligible discounting and just told it to get to Africa, the first steps are likely to be making sure that both it and Africa exist, by setting up a sophisticated defense system - protecting itself and protecting Africa from threats like meteorite strikes. Going to Africa is likely to be a consequence of these initial steps.
Things like meteorite defense systems are consequences of instrumental goals. You didn't build it into the utility function - but it happened anyway.
When you create a program, it is not enough to say what it should achieve. You must specify how to do it.
You can't just create a program by saying "maximize f(x)", even if you give it a perfect definition of f(x). You must also provide a method, for example "try 1000 random values of x and remember the best result" or "keep trying and remembering the best result until I press Enter" or maybe something more complex, like "remember 10 best results, and try random values so that you more often choose numbers similar to these best known results". You must provide some strategy.
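Two of those strategies can be sketched in Python as follows. The objective `f`, the search interval, the step size and the 80/20 split between local and uniform sampling are all assumptions chosen for illustration. The point is only that the same "maximize f(x)" goal has to be paired with some concrete search procedure.

```python
import random


def random_search(f, low, high, tries=1000, seed=0):
    """Try `tries` random values of x and remember the best result."""
    rng = random.Random(seed)
    best_x = rng.uniform(low, high)
    best_value = f(best_x)
    for _ in range(tries - 1):
        x = rng.uniform(low, high)
        value = f(x)
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


def local_search(f, low, high, tries=1000, keep=10, step=0.5, seed=0):
    """Remember the `keep` best results and mostly sample near them."""
    rng = random.Random(seed)
    best = []  # (value, x) pairs, best first
    for _ in range(tries):
        if best and rng.random() < 0.8:
            # Perturb one of the best known x values, clamped to the interval.
            _, centre = rng.choice(best)
            x = min(high, max(low, centre + rng.gauss(0, step)))
        else:
            x = rng.uniform(low, high)
        best.append((f(x), x))
        best.sort(reverse=True)
        del best[keep:]
    value, x = best[0]
    return x, value


def f(x):
    # Hypothetical objective with a single maximum at x = 3.
    return -(x - 3.0) ** 2


x1, v1 = random_search(f, -10.0, 10.0)
x2, v2 = local_search(f, -10.0, 10.0)
```

Both calls end up near x = 3, but only because someone wrote down how to search; the definition of `f` alone does nothing.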
Perhaps in some environments you don't, because the strategy was already put there by the authors of the environment. But someone had to specify it. The strategy may remember some values and use them later so it kind of learns. But even the first version of this learning strategy was written by someone.
So what does it mean to have an "artificial agent that has a goal"? It is an incomplete description. The agent must also have a strategy, otherwise it won't move.
Therefore, a precise question would be more like: "what kinds of initial strategies lead (in favorable conditions) toward developing a general intelligence?" Then we should specify what counts as reasonably favorable conditions, and what is outright cheating. (An agent with the strategy "find the nearest data disk, remove your old program and read a new program from this disk" could develop a general intelligence if it finds a disk with a general intelligence program, but I guess that counts as cheating. Although, humans also learn from others, so where exactly is the line between "learning with help" and "just copying"?)
I think a certain sort of urgency comes naturally from the finite lifespan of all things. So let's say we have an AI with a 0.00001% chance of malfunctioning or being destroyed per year, that gets 100 utility if it is in Africa. Should it go now, or should it wait a year? Well, all things being equal, the expected utility of the "go now" strategy is veeeery slightly larger than the expected utility of waiting a year. So an expected utility maximizer with any sort of risk of death would choose to act sooner rather than later.
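The arithmetic can be checked with a few lines of Python. This is a sketch under simplifying assumptions that are mine, not part of the comment above: the trip itself is instantaneous and risk-free, the yearly failure chance is independent from year to year, and the 100 utility is not discounted in any other way.

```python
def expected_utility(wait_years, yearly_failure=1e-7, payoff=100.0):
    """Expected payoff if the agent waits `wait_years` before going to Africa.

    yearly_failure=1e-7 is the 0.00001% per-year chance of malfunction
    or destruction; the agent only collects the payoff if it survives the wait.
    """
    survival = (1.0 - yearly_failure) ** wait_years
    return survival * payoff


go_now = expected_utility(0)
wait_one_year = expected_utility(1)
advantage = go_now - wait_one_year
```

Going now is worth 100, waiting a year is worth 100 times (1 - 0.0000001), so the advantage of acting immediately is about 0.00001 utility: tiny, but strictly positive, and it grows with every additional year of waiting.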
Yes, a full-blown expected utility maximizer, with a utility-function enclosing goals with detailed enough optimization parameters to make useful utility calculations about real-world causal relationships relative to its own self-perception. I think that before something like that becomes possible, some other less sophisticated intelligence will have already been employed as a tool to do, or solve, something that destroys pretty much everything. An AI that can solve bio or nanotech problems should be much easier to design than one that can destroy the world as a side-effect of unbounded self-improvement. And only the latter category is subject to friendliness research.
I don't doubt that the kind of intelligence you have in mind has certain drives that cause it to take certain actions. But that's not what I would call "basic AI drives", rather a certain kind of sophisticated AGI design that is unlikely to come into existence as a result of ignorance or unpredictable implications of a design someone stumbled upon by luck.
Well, okay. Restricting the validity of the "AI drives" to general AIs seems reasonable. After all, the local traffic lights have not started trying to refine their goals, and PageRank has not yet started manipulating society to eliminate threats to its well-being.
So far as we know.
Hmm. Possibly some research avenues are more promising than others - but this sounds a bit broad.
Once again I strongly suggest that we taboo the word "goals" in this type of discussion. Or at least specify and stick to a particular technical definition for "goals," so that we can separate "goalish behaviors not explicitly specified by a utility function" from "explicit goals-as-defined." The salient issues in a discussion of hypothetical AI behaviors can and should be discussed without imbuing the AI with these very, very human abstract qualities.
As a further example of what I mean, this is the algorithm we are fond of executing. The following symbol indicates "approximate, lossy mapping:" =>
( "Physical human brain made of neurons" ) => ( "Conceptual messy web of biological drives and instrumental goals" ) => ( "Utility function containing discrete objects called 'goals' and 'values'" )
My point is that goals are not ontologically fundamental. This algorithm/mapping is not something that happens physically. It is just how we like to simplify things in order to model ourselves and each other. We automatically extend this mapping to all intelligent agents, even when it is not appropriate.
So try this:
( "Physical computer running a program" ) => ( "Software algorithm interacting in complex ways with its environment" ) => ( "Goal directed agent" )
When the system we are talking about is Microsoft Word, most people would say that applying this algorithm/mapping is inappropriate and confusing. And yet Microsoft Word can still do things that its user and even its programmer wouldn't necessarily expect it to in the process of carrying out simple instructions.
In a nutshell, this is the statement I most strongly disagree with. "Desire" does not exist. To imply that intelligent agents can only do things that are explicitly defined, while creativity is implied by intelligence, is to actually reverse the algorithm/mapping defined above. It is the misapplication of a lossy abstraction.
In the case of an artificial agent, intelligence is explicitly defined. The agent does whatever is either explicitly defined, or what is implied by it. In the case of a creative, goal-oriented and generally intelligent agent this can be anything from doing nothing at all up to unbounded recursive self-improvement. But does general intelligence, in and of itself, evoke certain kinds of behavior? I don't think so. You can be the smartest being around, but if there are no internal causes that prompt the use of the full capacity of your potential then you won't use it.
I would love to see more technical discussions on the likely behavior of hypothetical AIs.
The recursive improvement still could occur on its own. If an AI thought that the optimal path to accomplishing a goal was to have a greater level of intelligence, which seems like it would be a fairly common situation, then the AI would start improving itself.
Similarly, if an AI thinks it could accomplish a task better if it had more resources, and decided that taking over the world was the best way to have access to those resources, then it would do so.
Accomplish a task better? The best way to access those resources? How does it decide what is better or best if you don't tell it what it should do? What you want the AI to do could as well be to produce paperclips as slowly as possible and let them be consumed by humans. What would be better and best in this context and why would the AI decide that it means to take over the universe to figure that out? Why would it care to refine its goals, why would it care about efficiency or speed when those characteristics might or might not be part of its goals?
An artificial agent doesn't have drives of any sort, it wouldn't mind to be destroyed if you forgot to tell it what it means not to be destroyed and that it should care about it.
As slowly as possible? That is 0 paperclips per second.
That is a do-nothing agent, not an intelligent agent. So: it makes a dreadful example.
As slowly as possible to meet demand with some specific confidence?
Maybe. At the moment, it seems rather underspecified - so there doesn't seem to be much point in trying to predict its actions. If it just makes a few paperclips, in what respect is it a powerful superintelligence - rather than just a boring steel paperclip manufacturer?
Well, I would assume that if someone designed an AI with goals, a preference for those goals being accomplished faster would also be included. And for the difficult problems that we would build an AI to solve, there is a non-negligible probability that the AI will decide that it could solve a problem faster with more resources.
Change to "would be most likely to be the cause of a loss of all human value in the universe unless things are done to prevent it" or similar to avoid fatalism.
Because you told it to improve its general intelligence, assuming "If you wanted to apply this potential to improve your template," means that your desire impelled you to tell the machine to try and improve its template.