I'm not sure how obvious the following is to people, and it probably is obvious to most of the people thinking about FAI. But I thought I'd throw out a summary of it here anyway, since this is the one topic that makes me the most pessimistic about the notion of Friendly AI being possible, or at least a Friendly AI based heavily on theory rather than on plenty of experimentation.
A mind can only represent a complex concept X by embedding it into a tightly interwoven network of other concepts that combine to give X its meaning. For instance, a "cat" is playful, four-legged, feline, a predator, has a tail, and so forth. These are the concepts that define what it means to be a cat; by itself, "cat" is nothing but a complex set of links defining how it relates to these other concepts (as well as a set of links to memories about cats). But then, none of those concepts means anything in isolation, either. A "predator" is a specific biological and behavioral class, the members of which hunt other animals for food. Of that definition, "biological" pertains to "biology", which is a "natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy". "Behavior", on the other hand, "refers to the actions of an organism, usually in relation to the environment". Of those words... and so on.
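To make the regress concrete, here is a minimal sketch of a concept network as a plain dictionary of links. Everything in it is hypothetical and chosen for illustration: the `CONCEPTS` table, the `expand` and `undefined` helpers, and the particular links are my own toy assumptions, not a claim about how any real mind or AI system stores concepts.

```python
from collections import deque

# Hypothetical toy network: a concept is nothing but its links to other concepts.
CONCEPTS = {
    "cat": {"playful", "four-legged", "feline", "predator", "tail"},
    "predator": {"biological", "behavior", "hunting", "animal"},
    "biological": {"biology"},
    "biology": {"natural science", "life", "organism"},
    "behavior": {"action", "organism", "environment"},
    "organism": {"life", "biology"},
}


def expand(concept, network, max_depth):
    """Collect every concept reachable from `concept` within `max_depth` links."""
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for linked in network.get(name, ()):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    seen.discard(concept)  # report only the other concepts
    return seen


def undefined(network):
    """Concepts that are linked to but have no links of their own here."""
    referenced = set().union(*network.values())
    return referenced - set(network)


if __name__ == "__main__":
    for depth in (1, 2, 3):
        print(depth, sorted(expand("cat", CONCEPTS, depth)))
    print("still undefined:", sorted(undefined(CONCEPTS)))
```

Even in this tiny table, each extra step pulls in more concepts, and most of what "cat" depends on is still undefined when the table runs out. A hand-built network has to stop somewhere, and wherever it stops, the concepts at the edge have no meaning of their own.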
It does not seem likely that humans could preprogram an AI with a ready-made network of concepts. There have been attempts to build knowledge ontologies by hand, but any such attempt is both hopelessly slow and lacking in much of the essential content. Even given a lifetime during which to work and countless assistants, could you ever hope to code everything you knew into a format from which it was possible to employ that knowledge usefully? An even worse problem is that the information would need to be in a format compatible with the AI's own learning algorithms, so that any new information the AI learnt would fit seamlessly into the previously-entered database. It does not seem likely that we can come up with an efficient language of thought that can be easily translated into a format that is intuitive for humans to work with.
Indeed, there are existing plans for AI systems which make the explicit assumption that the AI's network of knowledge will develop independently as the system learns, and the concepts in this network won't necessarily have an easy mapping to those used in human language. The OpenCog wikibook states that:
Some ConceptNodes and conceptual PredicateNode or SchemaNodes may correspond with human-language words or phrases like cat, bite, and so forth. This will be the minority case; more such nodes will correspond to parts of human-language concepts or fuzzy collections of human-language concepts. In discussions in this wikibook, however, we will often invoke the unusual case in which Atoms correspond to individual human-language concepts. This is because such examples are the easiest ones to discuss intuitively. The preponderance of named Atoms in the examples in the wikibook implies no similar preponderance of named Atoms in the real OpenCog system. It is merely easier to talk about a hypothetical Atom named "cat" than it is about a hypothetical Atom (internally) named . It is not impossible that a OpenCog system represents "cat" as a single ConceptNode, but it is just as likely that it will represent "cat" as a map composed of many different nodes without any of these having natural names. Each OpenCog works out for itself, implicitly, which concepts to represent as single Atoms and which in distributed fashion.
Designers of Friendly AI seek to build a machine with a clearly-defined goal system, one which is guaranteed to preserve the highly complex values that humans have. But the nature of concepts poses a challenge for this objective. There seems to be no obvious way of programming those highly complex goals into the AI right from the beginning, nor of guaranteeing that any goals thus preprogrammed will not end up being drastically reinterpreted as the system learns. We cannot simply code "safeguard these human values" into the AI's utility function without defining those values in detail, and defining those values in detail requires us to build the AI with an entire knowledge network. On a certain conceptual level, the decision theory and goal system of an AI are separate from its knowledge base; in practice, it doesn't seem like such a separation would be possible.
The goal might not be impossible, though. Humans do seem to be pre-programmed with inclinations towards various complex behaviors, which might suggest that some concepts are pre-programmed to various degrees. Heterosexuality is considerably more common in the population than homosexuality, though this may have relatively simple causes such as an inborn preference towards particular body shapes combined with social conditioning. (Disclaimer: I don't really know anything about the biology of sexuality, so I'm speculating wildly here.) Most people also seem to react relatively consistently to different status displays, and people have collected various lists of complex human universals. The exact method by which these universals are transmitted remains unknown, however, as does the role that culture plays in it. It also bears noting that most so-called "human universals" are actually cultural as opposed to individual universals. In other words, any given culture might be guaranteed to express them, but there will always be individuals who don't fit into the usual norms.
See also: Vladimir Nesov discusses a closely related form of this problem as the "ontology problem".