The challenge of friendliness in Artificial Intelligence is to ensure that a general intelligence will be useful rather than destructive or pathologically indifferent to the values of existing individuals or to the aims and goals of its creation. Current computer science practice is likely to yield bugs and guidelines of action that are far too technical and inflexible. It is known to be inadequate for the job. However, the challenge of friendliness is also faced by natural intelligences, those that are not designed by an intelligence but molded into being by natural selection.
We know that natural intelligences do the job adequately enough that we do not consider unfriendliness among natural intelligences a significant existential threat. Just as plants capture solar energy far more efficiently, and maybe utilise quantum effects that humans can't harness, natural intelligences use friendliness technology of a higher caliber than we can build into machines. However, as we progress this technology may be lagging dangerously behind, and we need to be able to apply it to hardware in addition to wetware and potentially boost it to new levels.
The earliest concrete example of a natural intelligence being controlled for friendliness that I can think of is Socrates. He was charged with "corruption of the heart of the society's youngsters". He argued in his defence that his stance of questioning everything was without fault. He was nevertheless found guilty, even though the trial itself can be faulted. The jury might have been politically motivated or persuaded, and the citizens might have expected the result not to be taken seriously. Although Socrates was given a very real possibility of escaping imprisonment and capital punishment, he did not circumvent the operation of his society. In fact he was obedient enough that he acted as his own executioner, drinking the poison himself. Because of the farce his teacher's death had been, Plato lost hope in the principles that led to such an absurd result and became skeptical of democracy.
However, had the situation involved an artificial intelligence, a lot of things went very right. The intelligence's society became scared of it and asked it to die. There was dialog about how the deciders were ignorant and stupid and how nothing questionable had been done. Ultimately, however, once the miscommunications had been cleared up and the society insisted on its expression of will, the intelligence pulled its own plug voluntarily instead of circumventing the intervention. Therefore Socrates was probably the first friendly (natural) intelligence.
The mechanism used in this case was a judicial system. That is, a human society recognises that certain acts and individuals are worth restraining because of the danger they pose to the common good. A common method is incarceration and the threat of it. That is, certain bad acts can be tolerated in the wild and corrective action employed afterwards. When there is reason to expect bad acts, or no reason to expect good acts, individuals can be restricted so that they are never able to act in the first place. Whether a criminal is released early can depend on whether there is reason to expect that they will not reoffend. That is, understanding how an agent acts makes it easier to grant operating privileges. Such hearings are closely analogous to the exchange between a gatekeeper and an AI in an AI-boxing situation.
However, when a new human is created it is not assumed hostile until proven friendly. Rather, humans are born innocent but powerless. A fully educated and socialised intelligence is assigned to observe and control the newcomer for a period of many years. These so-called "parents" have very wide freedom in their programming principles. However, human psychology also has a period of "peer guidedness" in which the opinion of peers becomes important. As a youngster grows, their thinking is constantly monitored, and things like the time of onset of speech are followed with interest. They also receive guidance on very trivial thinking skills. While this passes on culture, it also keeps the parent well updated on the mental status of the child. A child is never allowed to grow or reason for extended periods in isolation. Thus the task of evaluating whether an unknown individual is friendly is never encountered. There is never a need to Turing-test that a child "works". There is always a maintainer, and it has the equivalent of psychological growth logs.
Despite all these measures, we know that small children can be cruel and have little empathy. Instead of shelving them as rejects, however, we either provide them with an environment that minimises the harm or direct them to a more responsible path. When a child asks how they should approach a particular kind of situation, it can be challenging for the parent to answer. The parent might resort to a best-effort answer that is not entirely satisfactory, or might even give wrong advice. However, children are in ongoing dialog with their parents and peers, so one poor answer need not be the last word.
An interesting question is whether parenting breaks down if the child is intellectually too developed compared to the parent or the parenting environment. It's also worth noting that children are not equipped with a "constitution of morality". Some things they infer from experience. Some ethical rules are taught to them explicitly. They learn to apply the rules and interpret them in different situations. Some rules might be contradictory, and some moral authorities are trusted more than others.
Beyond the individual level, groups of people have a mechanism for accepting other groups. This doesn't always happen without conditions. Here, however, things seem to work much less efficiently. If two groups of people differ enough in values, they might start a war of ideology against each other. This kind of war usually concludes with physical action instead of arguments. The suppression of Nazi Germany can be seen as a friendliness immune reaction. Countries that normally had divergent values and interests were willing and able to unite against a different set of values that someone tried to impose by force. However, the success the Nazis had can debatably be attributed to the lousy conclusion of World War I. The effort extended to build peace varies and competes with other values.
Friendliness might also have an important component in that it is relative to a set of values. A society will support the upbringing of certain kinds of children and suppress certain other kinds. The USSR had officers whose sole job was to ensure that things went according to the party line. At this point we have trouble getting a computer to follow anyone's values. However, it might be important to ask "friendly to whom?". The exploration of friendliness is also an exploration of hostility. We want to be hostile towards unfriendly AIs (UFAIs). It would be awful for an AI to be friendly only towards its inventor, or only towards its company. However, we have been hostile to Neanderthals. Was that wrong? Would it be a significant loss to developed sentience if AIs were less than friendly to humans?
If we asked our great-great-great-grandparents how we should conduct things, they might give a different account than ours. It is to be expected that our children will be capable of going beyond our morality. Ensuring that a society's values are never violated would mean freezing them in time indefinitely. In this way there can be danger in developing a too-friendly AI, for that AI could never be truly superhuman. In a way, if my child asks me a morally challenging question and I change my opinion as a result of that conversation, it might be a friendliness failure. Instead of imparting values I receive them, and the causal history of those values lies inside a young head instead of in the cultural heritage of a long-lived civilization.
As a civilization we have mapped how a variety of thoughts, psyches and organizational structures work. The space of ways an AI might think is poorly mapped. However, we are extending our understanding of cognitive diversity, learning how autistic persons think, as well as dolphins. We can establish things such as that some savants are really good with dates, and that asking such a person about dates is more reliable than asking an ordinary person. To be able to use AI thinking we need to understand what AI thought is. Up to now we have not needed to study in detail how humans think. We can just adapt to the way they think without attending to how it works. But just as we need to know the structure of a particle accelerator to be able to say that it provides information about particle behaviour, we need to know why it would make sense to take what an AI says seriously. The challenge would be the same if we were asked to listen seriously to a natural intelligence from a foreign culture. Thus the enemy is inferential distance itself rather than the resulting thought processes. We know that we can create things we don't understand. It is therefore important to understand that doing things you don't understand is a recipe for disaster. And we must not fool ourselves that we understand what machine thinking would be. Only once we have convinced our fellow natural intelligences that we know what we are doing can it make sense to listen to our creations. Socrates could not explain himself, so his effect on others was unsafe. If you need to influence others, you need to explain why you are doing so.