So "good" creatures have a mechanism which simulates the thoughts and feelings of others, making them have similar thoughts and feelings, whether these are pleasant or unpleasant. (Well, we have a "but this is the Enemy" mode, some others could have a "but now it's time to begin making paperclips at last" mode...)
For me, feeling the same seems to be much more important. (See dogs, infants...) So thinking in AI terms, there must be a coupling between the creature's utility function and ours. It wants us to be happy in order to be happy itself. (Wireheading us is not sufficient, because the model of us in its head would feel bad about it, unchanged in the process... it's some weak form of CEV.)
So is an AI sympathetic if it has this coupling in its utility function? And with whose utilities? Humans? Sentient beings? Anything with a utility function? Chess machines? (Losing makes them really really sad...) Or what about rocks? Utility functions are just a way to predict some parts of the world, after all...
My point is that a definition of sympathy also needs a function to determine who or what to feel sympathy for. For us, this seems to be "everyone who looks like a living creature or acts like one", but it's complicated in the same way as our values. Accepting "sympathy" and "personlike" for the definition of "friendly" could easily be turtles all the way down.
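To make the "coupling" and the "who counts" function a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the names `Being`, `sympathy_weight` and `coupled_utility`, the weight table, and the idea that the coupling is a simple weighted sum. It is not a proposal for how a real AI would do it, just a toy showing that the weight function is a separate, arbitrary-looking ingredient.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Being:
    name: str
    kind: str         # e.g. "human", "chess_machine", "rock" (made-up categories)
    happiness: float  # the being's utility as estimated by the agent's model of it


def sympathy_weight(being: Being) -> float:
    """Hypothetical 'who do we feel sympathy for' function.

    The numbers are invented; the point is only that some such table
    (or something much messier) has to exist.
    """
    weights = {"human": 1.0, "dog": 0.8, "chess_machine": 0.1, "rock": 0.0}
    return weights.get(being.kind, 0.0)


def coupled_utility(
    own: float,
    others: List[Being],
    weight: Callable[[Being], float] = sympathy_weight,
) -> float:
    """Agent's utility = its own part + weighted modelled utilities of others."""
    return own + sum(weight(b) * b.happiness for b in others)


others = [
    Being("Alice", "human", 2.0),
    Being("DeepBlue", "chess_machine", -5.0),  # it just lost a game
    Being("pebble", "rock", -100.0),           # weight 0, so it never matters
]
total = coupled_utility(1.0, others)
```

Swapping in a different `weight` function changes who the agent "cares" about without touching anything else, which is exactly why the definition of sympathy can't do without it.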
What's the meaning of "consciousness", "sentient" and "person" at all? It seems to me that all these concepts (at least partially) refer to the Ultimate Power, the smaller, imperfect echo of the universe. We've given our computers all the Powers except this: they can see, hear, communicate, but still...
To understand my words, you must have a model of me, in addition to the model of our surroundings. Not just an abstract mathematical one but something which includes what I'm thinking right now. (Why should we call something a "superintelligence" if it doesn't even grasp what I'm telling it?)
Isn't "personhood" a mixture of godshatter (like morality) and power estimation? Isn't it like asking "do we have free will"? Not every messy spot on our map corresponds to some undiscovered territory. Maybe it's just like a blegg.
Doug S.: if it were 20 lines of Lisp... it isn't, see http://xkcd.com/224/ :)
Furthermore... it seems to me that a FAI which creates a nice world for us needs the whole human value system AND its coherent extrapolation. And knowing how complicated the human value system is, I'm not sure we can accomplish even the former task. So what about creating a "safety net" AI instead? Let's upload everyone who is dying or suffering too much, create advanced tools for us to use, but otherwise preserve everything until we come up with a better solution. This would fit into 20 lines, "be nice" wouldn't.
That looks so... dim. (But sadly, it sounds too true.) So I ask too: what to do next? Hack AI and... become "death, destroyer of worlds"? Or think about FAI without doing anything specific? And do that with more than just the "just for fun" curiosity which is needed (or so it seems) for every big scientific discovery? (Or is it just me who thinks of it that way?)
Anyway... Do we have any information about what the human brain is capable of without additional downloaded "software"? (Or has the co-evolution of the brain and the "software" played such an important role that certain parts of it need some "drivers" to be useful at all?)
Programmers are also supposed to search the space of Turing machines, which seems really hard. Programming in Brainfuck is hard. All the software written in higher-level languages consists of points in a mere subspace... If optimizing in this subspace has proven to be so effective, I don't think we have a reason to worry about incompressible subspaces containing the only working solution for our problems, namely more intelligent AI designs.
Analogy might work better for recognizing things already optimized in design space, especially if they are a product of evolution, with common ancestors (4 legs, looks like a lion, so run, even if it has stripes). And we only started designing complicated stuff a few thousand years ago at most...
"looking for reflective equilibria of your current inconsistent and unknowledgeable self; something along the lines of 'What would you ask me to do if you knew what I know and thought as fast as I do?'"
We're sufficiently more intelligent than monkeys to do that reasoning... so humanity's goal (as the advanced intelligence created by monkeys a few million years ago for getting to the Singularity) should be to use all the knowledge gained to tile the universe with bananas and forests etc.
We don't have the right to say, "if monkeys were more intelligent and consistent, they would think like us": we're also a random product of evolution, from the point of view of monkeys. (Tile the world with ugly concrete buildings? Uhhh...)
So I think that to preserve our humanity in the process we should be the ones who become gradually more and more intelligent (and decide what goals to follow next). Humans are complicated, so to simulate them in a Friendly AI, we'd need comparably complex systems... and they are probably chaotic, too. Isn't it... simply... impossible? (Not in the sense that "we can't make it", but "we can prove nobody can"...)
"I think therefore I am"... So there is a little billiard ball in some model which is me, and it has a relatively stable existence in time. Can't you imagine a world in which these concepts simply make no sense? (If you couldn't, just look around, QM, GR...)
Unknown, for the fourth: yes, even the highest-level desires change over time, but not because we want them to be changed. I think the third one is false instead: doing what you don't want to do is a flaw in the integrity of the cognitive system, a result of the fact that we can't reprogram our lower-level desires; but what desire could drive us to reprogram our highest-level ones?
There is a subsystem in our brains called "conscience". We learn what is right and what is wrong in our early years, perhaps with certain priors ("causing harm to others is bad"). These things can also change over time (slowly!) per person, for example if the context of the feelings dramatically changes (oops, there is no God).
So, agreeing with Subhan, I think we just do what we "want", maximizing the good feelings generated by our decisions. We ("we" = the optimization process trying to accomplish that) don't have access to the lower level (the on/off switch of conscience), so in many cases the best solution is to avoid doing "bad" things. (And it really feels different (a) to want something because we like it and (b) to want something to avoid the bad feelings generated by conscience.) What our thoughts can't control directly seems to be an objective, higher-level truth; that's how the algorithm feels from the inside.
Furthermore, see psychopaths. They don't seem to have the same mental machinery of conscience, so the utility of their harmful intentions doesn't get the same correction factor. And so immoral they become.
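A toy Python sketch of that "correction factor", just to pin the idea down. All of it is assumed for illustration: the function name `choose`, the constant `CONSCIENCE_WEIGHT`, the linear penalty, and the made-up numbers. The decision process maximizes good feelings, and conscience is a term it cannot switch off from the inside; the `conscience_on` flag stands in for the missing machinery.

```python
from typing import Dict, Tuple

# Hypothetical strength of the bad feeling per unit of harm caused.
CONSCIENCE_WEIGHT = 2.0


def choose(options: Dict[str, Tuple[float, float]], conscience_on: bool = True) -> str:
    """Pick the option with the best net feeling.

    options maps an action name to (liking, harm): how much we like the
    outcome and how much harm it causes to others. With conscience on,
    harm is subtracted as a penalty; with it off, harm is simply ignored.
    """
    def net_feeling(item: Tuple[str, Tuple[float, float]]) -> float:
        liking, harm = item[1]
        penalty = CONSCIENCE_WEIGHT * harm if conscience_on else 0.0
        return liking - penalty

    return max(options.items(), key=net_feeling)[0]


options = {
    "steal": (5.0, 3.0),  # liked more, but harmful
    "work": (3.0, 0.0),   # liked less, harmless
}
with_conscience = choose(options, conscience_on=True)
without_conscience = choose(options, conscience_on=False)
```

The optimizer is identical in both cases; only the penalty term differs, and that alone flips the choice from "work" to "steal".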