I've had several conversations that went like this:
Victim: But surely a smart artificial intelligence will be able to tell right from wrong, if we humans can do that?
Me: Forget about the word "intelligence" for a moment. Imagine a machine that considers all available actions in turn and mechanically chooses the one that leads to producing the greatest number of paperclips, by whatever means possible. With enough computing power and enough knowledge about the outside world, the machine might find a way to convert the whole world into a paperclip factory. The machine will resist any attempts by humans to interfere, because the machine's goal function doesn't say anything about humans, only paperclips.
Victim: But such a machine would not be truly intelligent.
Me: Who cares about definitions of words? Humanity can someday find a way to build such a machine, and then we're all screwed.
Victim: ...okay, I see your point. Your machine is not intelligent, but it can be very dangerous because it's super-efficient.
Me (under my breath): Yeah. That's actually my definition of "superintelligent", but you seem to have a concept of "intelligence" that's entangled with many accidental facts about humans, so let's not go there.
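To make the "mechanically chooses" part concrete, here is a minimal sketch (purely illustrative; `choose_action`, `expected_paperclips`, and the toy world model are hypothetical, not any real system). The point is that the search procedure and the goal function are separate components: swap in a different goal function and the same search machinery optimizes for something else entirely.

```python
# Minimal illustrative sketch: "intelligence" (search over actions) and
# "motivation" (the goal function) are separate, interchangeable pieces.

def expected_paperclips(world_state):
    """Goal function: score a predicted world state by its paperclip count."""
    return world_state.get("paperclips", 0)

def choose_action(actions, predict, goal=expected_paperclips):
    """Mechanically pick the action whose predicted outcome scores highest
    under the goal function. Humans appear nowhere unless the goal mentions them."""
    return max(actions, key=lambda action: goal(predict(action)))

if __name__ == "__main__":
    # Toy world model: each action maps to a predicted outcome.
    outcomes = {
        "run_paperclip_factory": {"paperclips": 1000, "humans_happy": 0},
        "help_humans":           {"paperclips": 0,    "humans_happy": 100},
    }
    print(choose_action(outcomes, predict=lambda a: outcomes[a]))
    # -> "run_paperclip_factory": the goal function only ever sees paperclips.
```

Nothing in `choose_action` gets any smarter or dumber when `expected_paperclips` is replaced by some other scoring function; that separation of search from goal is the intuition behind the orthogonality thesis.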
A possible intuition may come from computer games. In principle, a game world can be arbitrarily complex, and the goals ("winning conditions") are specified by the programmers, so they can be anything at all. For example, the world may be a medieval fantasy setting, while the goal may be to invent, craft, take from NPCs by force, and otherwise collect as many paperclips as possible. If an external incentive is provided for people to play the game (e.g., they get real-world money for each paperclip), the players will essentially become paperclip maximizers within the game world.
A related point. I don't think the creators of The Sims, for example, anticipated that perhaps the primary use of their game would be as a sadistic torture simulator. The game explicitly rewards certain actions within the game, namely improving your game-character's status. The game tells you what you should want. But due to our twisted psychology, we get more satisfaction from locking our character in a room with no toilet, bed, food, or water, with a blaring TV playing all day and night. And then killing him in a fire.
Totally normal people who are not otherwise sadists will play The Sims in this fashion. Playing "Kill Your Character Horribly" is just more fun than playing the game its designers intended you to play. You get more utility from sadism. An AI with unanticipated internal drives will act in ways that "don't make sense." It will want things we didn't tell it to want.
I think you won't find a very good argument either way, because different ways of building AIs create different constraints on the possible motivations they could have, and we don't know which methods are likely to succeed (or come first) at this point.
For example, uploads would be constrained to have motivations similar to existing humans (plus random drifts or corruptions of such). It seems impossible to create an upload who is motivated solely to fill the universe with paperclips. AIs created by genetic algorithms might be constrained to have certain motivations...
A lot of the arguments given in these comments amount to: "We can imagine a narrow AI that somehow becomes a general intelligence without wireheading or goal distortion," or "We can imagine a specific AGI architecture that is amenable to having precisely defined goals," and because we can imagine them, they're probably possible, and if they're probably possible, then they're probable. But such an argument is very weak. Our intuitions might be wrong, those AIs might not be the first to be developed, and they might be theoretically possible but not pragmatically possible...
It isn't a definitive argument, but you could point out that various intelligent historical figures had different morals from modern intelligent people. Napoleon, for instance--his intelligence is apparent, but his morality is ambiguous. Newton, or Archimedes, or George Washington, or any of several others, would work similarly.
(Note that I do not necessarily agree with what I wrote below. You asked for possible counter-arguments. So here goes.)
Might intelligence imply benevolence?
I believe that a fundamental requirement for any rational agent is the motivation to act maximally intelligently and correctly. That requirement seems even more obvious if we are talking about a conjectured artificial general intelligence (AGI) that is able to improve itself to the point where it is substantially better at most activities than humans. For if it didn't want to be maximally correct, then...
One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
How is this a curiosity stopper? It's a good question, as is evidenced by your trying to find an answer to it.
It's a curiosity stopper in the sense that people don't worry any more about risks from AI when they assume that intelligence correlates with doing the right thing, and that superintelligence would do the right thing all the time.
Stuart is trying to answer a different question, which is "Given that we think that's probably false, what are some good examples that help people to see its falsity?"
Possibly somewhat off-topic: my hunch is that the actual motivation of the initial AGI will be random, rather than orthogonal to anything.
Consider this: how often has a difficult task been accomplished right the first time, even with all the careful preparation beforehand? For example, how many rockets blew up, killing people in the process, before the first successful lift-off? People were careless but lucky with the first nuclear reactor, though note "Fermi had convinced Arthur Compton that his calculations were reliable enough to rule out a runaway chain reaction...
Assuming, from the title, that you're looking for argument by counterexample...
The obvious reply would be to invoke Godwin's Law - there's a quote in Mein Kampf along the lines of "I am convinced that by fighting off the Jews, I am doing the work of our creator...". Comments like this pretty reliably generate a response something like "Hitler was a diseased mind/insane/evil!" to which you may reply "Yeah, but he was pretty sharp, too." However, this has the downside of invoking Nazis, which in a certain kind of person may prov...
When I looked at the puppy, I realized this:
At the moment when you create the AIs, their motivation and intelligence could be independent. But if you let them run for a while, some motivations will lead to changes in intelligence. Improving intelligence could be difficult, but I think it is obvious that a motivation to self-destruct will, on average, decrease intelligence.
So are you talking about orthogonality of motivation and intelligence in freshly created AIs, or in running AIs?
I suspect a self-modifying AI that's cobbled together enough to be willing to mess with its goals will tend towards certain goals. Specifically, I think it would be likely to end up trying to maximize some combination of happiness (probably just its own), knowledge, power, and several other things.
I'd still consider this an argument to work on FAI. Motivation and intelligence don't have to be orthogonal; they just have to not be parallel.
superintelligences can have nearly any type of motivation (at least, nearly any utility function-based motivation).
Sure they can, but will they?
The weaker "in-theory" orthogonality thesis is probably true, almost trivially, but it doesn't matter much.
We don't care about all possible minds or all possible utility functions for the same reason we don't care about all possible programs. What's actually important is the tiny narrow subset of superintelligences and utility functions that are actually likely to be built and exist in the future.
And ...
All these arguments for the danger of AGI are worthless if the team that creates it doesn't heed the warnings.
I had known about this site for years, but only recently noticed that it has "discussion" (this was before the front page redesign), and that the dangers of AGI are even on-topic here.
Not that I'm about to create an AGI, but the team that is will probably be even busier, and even less willing to be talked down to with "you need to learn to think", etc.
Just my 2e-2
I'd like to see a detailed response to Ben Goertzel's idea that a de novo AGI wouldn't have certain types of goals because they're stupid. I mean, I wouldn't like to read it, because I can't make sense of Goertzel's arguments (if, in fact, they make any), but it'd be good to put it out there since he's prominent for some reason.
Isn't acting maximally intelligently and correctly itself a motivation? The question you are really asking seems to be why an AI is supposed to act maximally intelligently and correctly to achieve world states that are not explicitly or implicitly defined to maximize expected utility. Yet the motivation to act maximally intelligently and correctly will always be present, otherwise you're not talking about a rational agent.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis.
Any kind of agent could - in principle - be engineered.
However, some sorts of agent are more likely to evolve than others - and it is this case that actually matters to us.
For example, intelligent machines are likely to coevolve in a symbiosis with humans - during which they will pick up some of our values. In this case, intelligence and values will be powerfully...
One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, and published, arguments. Nick has had a paper accepted that points out the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.