I've had several conversations that went like this:
Victim: But surely a smart artificial intelligence will be able to tell right from wrong, if we humans can do that?
Me: Forget about the word "intelligence" for a moment. Imagine a machine that looks at all actions in turn, and mechanically chooses the action that leads to producing the greatest number of paperclips, in whichever way possible. With enough computing power and enough knowledge about the outside world, the machine might find a way to convert the whole world into a paperclip factory. The machine will resist any attempts by humans to interfere, because the machine's goal function doesn't say anything about humans, only paperclips.
Victim: But such a machine would not be truly intelligent.
Me: Who cares about definitions of words? Humanity can someday find a way to build such a machine, and then we're all screwed.
Victim: ...okay, I see your point. Your machine is not intelligent, but it can be very dangerous because it's super-efficient.
Me (under my breath): Yeah. That's actually my definition of "superintelligent", but you seem to have a concept of "intelligence" that's entangled with many accidental facts about humans, so let's not go there.
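To make the "mechanical" machine from the dialogue concrete, here is a minimal sketch. Everything in it is a made-up assumption for illustration: the action names, the toy world model in `simulate`, and the rule that humans hit an off switch at step 2. The point is only that the score counts paperclips and nothing else, yet the exhaustive search still ends up choosing to disable the switch.

```python
from itertools import product

# Hypothetical action set for a toy world; none of these names come from
# any real system.
ACTIONS = ("make_clip", "build_factory", "disable_off_switch", "idle")


def simulate(plan):
    """Toy world model: return how many paperclips a plan produces."""
    clips, rate, switch_works = 0, 1, True
    for step, action in enumerate(plan):
        # Assumption: humans press the off switch at step 2 if it still works,
        # which ends the run before that step's action is taken.
        if step == 2 and switch_works:
            break
        if action == "make_clip":
            clips += rate
        elif action == "build_factory":
            rate *= 2
        elif action == "disable_off_switch":
            switch_works = False
    return clips


def best_plan(horizon):
    """Look at every plan in turn and keep the one with the most paperclips."""
    return max(product(ACTIONS, repeat=horizon), key=simulate)


if __name__ == "__main__":
    plan = best_plan(4)
    print(plan, simulate(plan))
```

Nothing in `simulate` rewards or penalizes anything about humans. Resisting interference falls out of the search simply because plans that get switched off make fewer paperclips.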
How good at playing chess would a chess computer have to be before it started trying to feed the hungry?
A possible intuition may come from computer games. In principle, a game world can be arbitrarily complex, and the goals ("winning conditions") are specified by the programmers, so they can be anything at all. For example, the world may be a medieval fantasy setting, while the goal may be to invent, craft, take from NPCs by force, and otherwise collect as many paperclips as possible. If an external incentive is provided for people to play the game (e.g., they get real-world money for each paperclip), the players will essentially become paperclip maximizers within the game world.
A related point. I don't think the creators of The Sims, for example, anticipated that perhaps the primary use of their game would be as a sadistic torture simulator. The game explicitly rewards certain actions within the game, namely improving your game-character's status. The game tells you what you should want. But due to our twisted psychology, we get more satisfaction locking our character in a room with no toilet, bed, food or water, with a blaring TV playing all day and night. And then killing him in a fire.
Totally normal people who are not otherwise sadists will play The Sims in this fashion. Playing "Kill Your Character Horribly" is just more fun than playing the game they intended you to play. You get more utility from sadism. An AI with unanticipated internal drives will act in ways that "don't make sense." It will want things we didn't tell it to want.
I think you won't find a very good argument either way, because different ways of building AIs create different constraints on the possible motivations they could have, and we don't know which methods are likely to succeed (or come first) at this point.
For example, uploads would be constrained to have motivations similar to existing humans (plus random drifts or corruptions of such). It seems impossible to create an upload who is motivated solely to fill the universe with paperclips. AIs created by genetic algorithms might be constrained to have certain motivations, which would probably differ from the set of possible motivations of AIs created by simulated biological evolution, etc.
The Orthogonality Thesis (or its denial) must assume that certain types of AI, e.g., those based on generic optimization algorithms that can accept a wide range of objective functions, are feasible (or not) to build, but I don't think we can safely make such assumptions yet.
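For readers unsure what "generic optimization algorithms that can accept a wide range of objective functions" looks like in practice, here is a toy sketch. The hill climber, the integer search space, and both objectives (`paperclip_score`, `smile_score`) are hypothetical illustrations. It shows only that such optimizers are easy to write at toy scale, not that they are feasible at AGI scale, which is exactly the open question.

```python
def hill_climb(objective, start, neighbors, max_steps=1000):
    """Generic greedy search: it never inspects what the objective means."""
    current = start
    for _ in range(max_steps):
        candidate = max(neighbors(current), key=objective)
        if objective(candidate) <= objective(current):
            break  # no neighbor improves the score: local optimum reached
        current = candidate
    return current


def integer_neighbors(x):
    """Search space assumption: states are integers, moves are +/- 1."""
    return [x - 1, x + 1]


def paperclip_score(x):
    # Hypothetical objective that peaks at x = 40.
    return -(x - 40) ** 2


def smile_score(x):
    # A completely different hypothetical objective that peaks at x = -7.
    return -(x + 7) ** 2


if __name__ == "__main__":
    print(hill_climb(paperclip_score, 0, integer_neighbors))
    print(hill_climb(smile_score, 0, integer_neighbors))
```

The search procedure is identical in both calls; only the objective passed in changes where it ends up.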
ETA: Just noticed Will Newsome's comment, which makes similar points.
What distinguishes the "Orthogonality thesis" from "Hume's Guillotine"? If you're looking for standard published arguments, I'd think you could start with "A Treatise of Human Nature" and proceed through the history of the "is-ought problem" from there.
A lot of the arguments given in these comments amount to: We can imagine a narrow AI that somehow becomes a general intelligence without wireheading or goal distortion, or, We can imagine a specific AGI architecture that is amenable to having precisely defined goals, and because we can imagine them, they're probably possible, and if they're probably possible, then they're probable. But such an argument is very weak. Our intuitions might be wrong, those AIs might not be the first to be developed, they might be theoretically possible but not pragmatically po...
It isn't a definitive argument, but you could point out that various intelligent historical figures had different morals from modern intelligent people. Napoleon, for instance--his intelligence is apparent, but his morality is ambiguous. Newton, or Archimedes, or George Washington, or any of several others, would work similarly.
Accept that moral conceptual truths are possible, and instead argue that an AI would deliberately try to not learn them.
(Note that I do not necessarily agree with what I wrote below. You asked for possible counter-arguments. So here goes.)
Might intelligence imply benevolence?
I believe that a fundamental requirement for any rational agent is the motivation to act maximally intelligently and correctly. That requirement seems even more obvious if we are talking about a conjectured artificial general intelligence (AGI) that is able to improve itself to the point where it is substantially better at most activities than humans. Since if it wouldn't want to be maximally correct th...
One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
How is this a curiosity stopper? It's a good question, as is evidenced by your trying to find an answer to it.
"What should we have the AI's goals be?"
"Eh, just make it self-improve, once it's smart it can figure out the right goals."
It's a curiosity stopper in the sense that people don't worry any more about risks from AI when they assume that intelligence correlates with doing the right thing, and that superintelligence would do the right thing all the time.
Stuart is trying to answer a different question, which is "Given that we think that's probably false, what are some good examples that help people to see its falsity?"
As an example of a fairly powerful optimization process with very unhuman goals, you can cite evolution, which is superhuman in some ways, yet quite amoral.
Possibly somewhat off-topic: my hunch is that the actual motivation of the initial AGI will be random, rather than orthogonal to anything.
Consider this: how often has a difficult task been accomplished right the first time, even with all the careful preparation beforehand? For example, how many rockets blew up, killing people in the process, before the first successful lift-off? People were careless but lucky with the first nuclear reactor, though note "Fermi had convinced Arthur Compton that his calculations were reliable enough to rule out a runawa...
Assuming, from the title, that you're looking for argument by counterexample...
The obvious reply would be to invoke Godwin's Law - there's a quote in Mein Kampf along the lines of "I am convinced that by fighting off the Jews, I am doing the work of our creator...". Comments like this pretty reliably generate a response something like "Hitler was a diseased mind/insane/evil!" to which you may reply "Yeah, but he was pretty sharp, too." However, this has the downside of invoking Nazis, which in a certain kind of person may prov...
When I looked at the puppy, I realized this:
At the moment when you create the AIs, their motivation and intelligence could be independent. But if you let them run for a while, some motivations will lead to changes in intelligence. Improving intelligence could be difficult, but I think it is obvious that a motivation to self-destruct will on average decrease intelligence.
So are you talking about orthogonality of motivation and intelligence in freshly created AIs, or in running AIs?
Sure, utility and intelligence might be orthogonal. But very different utilities could still lead to very similar behaviors.
I suspect a self-modifying AI that's cobbled together enough to be willing to mess with its goals will tend towards certain goals. Specifically, I think it would be likely to end up trying to maximize some combination of happiness (probably just its own), knowledge, power, and several other things.
I'd still consider this an argument to work on FAI. Motivation and intelligence don't have to be orthogonal; they just have to not be parallel.
superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation).
Sure they can, but will they?
The weaker "in-theory" orthogonality thesis is probably true, almost trivially, but it doesn't matter much.
We don't care about all possible minds or all possible utility functions for the same reason we don't care about all possible programs. What's actually important is the tiny narrow subset of superintelligences and utility functions that are actually likely to be built and exist in the future.
And ...
Dancy (Real values in a Humean Context, p180) argues that Naturalism provides grounds independent of Humeanism to suspect that moral beliefs need not necessarily motivate.
If I were a strong moral realist, I'd also believe that an AI should be able to just "figure it out". I wonder instead whether exposure to the field of AI research, where cost functions and methods of solution are pretty orthogonal, would help alleviate the moral realism?
All these arguments for the danger of AGI are worthless if the team that creates it doesn't heed the warnings.
I knew about this site for years, but only recently noticed that it has "discussion" (this was before the front page redesign), and that the dangers of AGI are even on-topic here.
Not that I'm about to create an AGI: The team that is will probably be even busier and less willing to be talked down to as in "you need to learn to think", etc.
Just my 2e-2
The argument I tend to default to is, "if there were definitively no fundamental moral values, how would we expect the universe we observe to be different?" If we can't point to any way that moral objectivity constrains our expectations, then it becomes another invisible dragon.
I'd like to see a detailed response to Ben Goertzel's idea that a de novo AGI wouldn't have certain types of goals because they're stupid. I mean, I wouldn't like to read it, because I can't make sense of Goertzel's arguments (if, in fact, they make any), but it'd be good to put it out there since he's prominent for some reason.
Ask what is meant by "the right thing".
Also, (and this may be an additional reason for wanting Friendliness), protecting humanity may not be the right thing in a larger context.
Isn't acting maximally intelligently and correctly itself a motivation? The question you are really asking seems to be why an AI is supposed to act maximally intelligently and correctly to achieve world states that are not explicitly or implicitly defined to maximize expected utility. Yet the motivation to act maximally intelligently and correctly will always be given, otherwise you're not talking about a rational agent.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis.
Any kind of agent could - in principle - be engineered.
However, some sorts of agent are more likely to evolve than others - and it is this case that actually matters to us.
For example, intelligent machines are likely to coevolve in a symbiosis with humans - during which they will pick up some of our values. In this case, intelligence and values will be powerfully...
What it generally was:
AI Researcher: "Fascinating! You should definitely look into this. Fortunately, my own research has no chance of producing a super intelligent AGI, so I'll continue. Good luck son! The government should give you more money."
In other words, those researchers estimate the value of friendly AI research as a charitable cause to be the share of their taxes that the government would assign to it if they would even consider it in the first place, which they believe the government should.
It's hard to tell how seriously they really take risks from AI g...
One of the most annoying arguments when discussing AI is the perennial "But if the AI is so smart, why won't it figure out the right thing to do anyway?" It's often the ultimate curiosity stopper.
Nick Bostrom has defined the "Orthogonality thesis" as the principle that motivation and intelligence are essentially unrelated: superintelligences can have nearly any type of motivation (at least, nearly any utility-function-based motivation). We're trying to get some rigorous papers out so that when that question comes up, we can point people to standard, published arguments. Nick has had a paper accepted that points out that the orthogonality thesis is compatible with a lot of philosophical positions that would seem to contradict it.
I'm hoping to complement this with a paper laying out the positive arguments in favour of the thesis. So I'm asking you for your strongest arguments for (or against) the orthogonality thesis. Think of trying to convince a conservative philosopher who's caught a bad case of moral realism - what would you say to them?
Many thanks! Karma and acknowledgements will shower on the best suggestions, and many puppies will be happy.