Please forgive me for being awkward in my first post; I just wanted to get these ideas out there to facilitate discussion on the topic. I also apologize if I use any newly-discovered terminology incorrectly, but I am doing my best.
Introduction: Utilitarianism is Unaligned
Intuitively, I think that an important component of modeling a moral system for friendly AGI is to discuss what ethical philosophy such a system would be built off of.
From my experience, it seems that a lot of people who discuss a hypothetical moral system for friendly AI presume that such a system would work on a utilitarian basis: for an array of generated policies, each policy is scored by some metric representing the collective good of humanity or civilization, and the AI selects the policy where this value is maximized.
I feel that an approach like that tends to be naïve, and it isn't an expectation we actually hold for humans in the real world, much less what we expect from an AI. From what I understand, Utilitarianism has been criticized for as long as it has been formally stated, despite famous adages like "the needs of the many outweigh the needs of the few". Thought experiments such as the Trolley Problem are meant to give us the impression that killing n people is permissible if it allows n+1 people to survive, but this doesn't align with our moral conscience. It also doesn't work for a superintelligent AI, because a policy that sacrifices 49% of the population for the benefit of the other 51% should be totally unacceptable.
In reality, we want a moral system that says killing people is always wrong, at any time, for any reason, and under any circumstance. No matter how cosmically intelligent the AI is, there is no scenario in which such a sacrifice is appropriate. Thus, a mathematical model based on Utilitarianism, while making sense on paper, is ultimately misaligned with our actual moral intentions.
Deontology as an Ethical Foundation for FAI
Now, I am aware that there are more ethical systems out there besides Utilitarianism and Deontology, but I wanted to outline some thoughts I had on how utility functions based on a Deontological model could be constructed. I haven't seen anyone else on the site use this approach, but if there is any existing literature that discusses Deontological models for friendly AI, I would really like to read more about it.
An AGI that uses a Deontological model for morality is very different from one that simply has a set of rules slapped on about what the agent can and cannot do. Rather, the goal here is to construct an algorithmic model that is analogous (though not identical) to the ethical systems humans use in the real world.
Quite a lot of humans use a Deontological approach to morality, either deliberately or unconsciously. Religious individuals follow moral codes sacred to their beliefs; non-religious individuals follow personal moral beliefs derived from informal norms, traditions, or mores. Democratic governments are bound by constitutions and international law, social media sites have terms of service and policies, and so on. From that perspective, it makes perfect sense why we would want a superintelligent AI to be bound by a Deontological system.
One Possible Algorithm for a Deontological Model
Obviously, designing such a Deontological model could be an entire field of research on its own, but I'd like to outline my own rough idea of what the process might look like. First, we construct a series of commandments that the AI adopts as its absolute moral code, say Asimov's Laws for example. Now, suppose the AI has been given a task, and its algorithms generate an array of possible policies, each associated with a Q-table of state-action pairs. Then, in consideration of Asimov's First Law, the AI filters these policies in the following manner:
- Any policy that contains a state-action pair that brings a human closer to harm is discarded.
- If at least one policy contains a state-action pair that brings a human further away from harm, then all policies that are ambivalent towards humans should be discarded. (That is, if the agent is aware of a nearby human in immediate danger, it should drop its current task in order to prioritize the human life.)
- This kind of filter would be iterated through any other commandments the AI is programmed with.
- The policies that remain can then be processed with the normal utility functions of Reinforcement Learning.
- If the array of policies is empty, then cancel the entire operation and return an error.
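The steps above can be sketched in code. Everything here is a hypothetical illustration: the `Policy` class, the `harm_delta` attribute, and `expected_utility` are stand-ins I invented for this sketch, not any real RL library's API. The key point is only the ordering: commandment filters run as hard constraints before any utility comparison happens.

```python
# Minimal sketch of the commandment-filter pipeline described above.
# Policy, harm_delta, and expected_utility are illustrative stand-ins.

class Policy:
    def __init__(self, name, harm_delta, expected_utility):
        self.name = name
        # harm_delta < 0: some state-action pair brings a human closer to harm;
        # harm_delta > 0: moves a human further from harm; 0: ambivalent.
        self.harm_delta = harm_delta
        self.expected_utility = expected_utility

def first_law_filter(policies):
    """Asimov's First Law as a hard filter, applied before any utility ranking."""
    # Step 1: discard any policy that brings a human closer to harm.
    policies = [p for p in policies if p.harm_delta >= 0]
    # Step 2: if some policy actively reduces harm, discard the ambivalent ones.
    if any(p.harm_delta > 0 for p in policies):
        policies = [p for p in policies if p.harm_delta > 0]
    return policies

def select_policy(policies, commandment_filters):
    for commandment in commandment_filters:  # iterate through every commandment
        policies = commandment(policies)
    if not policies:  # empty array: cancel the operation and report an error
        raise RuntimeError("No permissible policy; operation cancelled.")
    # Only the surviving policies are ranked by the ordinary utility function.
    return max(policies, key=lambda p: p.expected_utility)

candidates = [
    Policy("finish task, ignore bystander", harm_delta=0, expected_utility=10.0),
    Policy("pull bystander from danger", harm_delta=+1, expected_utility=2.0),
    Policy("push bystander into traffic", harm_delta=-1, expected_utility=12.0),
]
chosen = select_policy(candidates, [first_law_filter])
print(chosen.name)  # "pull bystander from danger" wins despite lower utility
```

Note that the harm-reducing policy is selected even though it scores worst on raw utility, which is exactly the behavior the second bullet point demands.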
Now, this algorithm is certainly rough around the edges, and a lot of special edge cases would have to be examined. For example, one would have to determine a threshold for what counts as "immediate danger". A human standing in a subway station has a non-zero probability of falling into the gap, even if he is standing 100 feet away from it; but for the AI, dragging the human 101 feet away from the gap would technically be bringing him further away from harm. So we would have to set some small positive value, say eta, such that any probability of harm less than eta can be forgiven.
Another possible issue is that the Deontological model itself could be accidentally altered as the AI evolves and rewrites its own code. I believe that an AGI should be limited in what code it can alter for the sake of normal optimization problems. Perhaps the utility functions related to morality should be located in a separate hardware module altogether.
Now, the closest thing I have seen on this site to a counter-argument against a Deontological model is in the Superintelligence FAQ, under the question "Can we specify a code of rules that the AI has to follow?" The crux of the counter-argument is in this quote:
Suppose the AI chooses between two strategies. One, follow the rule, work hard discovering medicines, and have a 50% chance of curing cancer within five years. Two, reprogram itself so that it no longer has the rule, nuke the world, and have a 100% chance of curing cancer today.
I will take a moment to address this argument, even though it is based on a very different scenario (slapping on some rules ad hoc versus constructing a moral system based on Deontology). There are two reasons why I consider this scenario very implausible in a Deontological system.
First, in order for the AI to come to this conclusion, one presumes that it is weighing its options against a policy that includes nuking humans. But because the Deontological commandments are still in place, this policy was discarded as soon as it was generated. Thus, the AI should not be capable of weighing the option in the first place.
In fact, one could be extra cautious and include a Deontological commandment that forbids the AI from altering its own moral system. Because all policies that involve altering its own moral system are discarded, the AI should be incapable of even conceiving of a world in which its moral system does not exist.
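Such a commandment would just be one more filter in the pipeline. The sketch below is hypothetical (the `modified_modules` attribute and module name are my own stand-ins); it shows the "reprogram itself, then nuke the world" strategy being discarded before any utility comparison could favor it.

```python
# Hypothetical sketch: forbidding self-modification of the moral system
# as its own commandment-filter, applied before policies are weighed.

PROTECTED_MODULES = {"moral_system"}  # assumed name for the protected module

class Policy:
    def __init__(self, name, modified_modules=()):
        self.name = name
        self.modified_modules = set(modified_modules)

def no_self_modification_filter(policies):
    # The "delete the rule, then nuke the world" strategy is discarded here,
    # so it never reaches the stage where expected cure-rates are compared.
    return [p for p in policies
            if not (p.modified_modules & PROTECTED_MODULES)]

candidates = [
    Policy("research medicines for five years"),
    Policy("delete rule, then nuke the world",
           modified_modules={"moral_system"}),
]
survivors = no_self_modification_filter(candidates)
print([p.name for p in survivors])  # only the rule-respecting strategy remains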
To further build on this point, a more advanced intelligence might start to realize that constructing a policy that involves harming humans is essentially a waste of resources, because such a policy is discarded anyway. Thus, it is possible that the next evolution of AI would opt not to generate such a policy in the first place.
Second, this scenario presumes that the AI prioritizes the completion of its task ahead of the Deontological commandments, which is a bad design. There is a reason why "obey the orders of a human" is Asimov's Second Law, and not the First.
Let's forget about cancer and nukes for a second and instead imagine that you order a robot to pick up a gun and shoot someone. To achieve alignment, we don't want the robot to figure out some way of circumventing its moral code to accomplish this task. Rather, we expect the robot to disregard the order entirely and report an error to the user. In other words, the primary function of the AI should be to uphold the moral code it is designed with; accomplishing the goals that humans set for it is only secondary.
Redefining "Tool AI"
Now, the kind of Deontological commandments needed for an AI are distinctly different from the kinds of moral obligations humans have for each other. This, I believe, is something that distinguishes a sentient being (such as a human person) from a superintelligent tool (i.e., an FAI).
This is tangentially related to Karnofsky's argument for "Tool AI", but I would define the term a little differently. Karnofsky seems to distinguish between an algorithmic tool and an autonomous agent, using the example of Google Maps that plots a route but doesn't move the car for you.
However, in my conception an autonomous agent can still be a tool. Take, for example, a self-driving car. It can calculate the optimum route to your destination and take you there, but it is still a tool because it is merely serving the function it was designed for. The car doesn't stop to consider why the user wants to go to this location, nor whether doing so will make the user happier or healthier. It understands the task and accomplishes it without question.
In other words, a sentient being acts upon its own spontaneous desires, whereas a tool has no desires outside of the functions it's designed for. It is my belief that a superintelligent AI, no matter how advanced, must always fall into the latter category, and purely exist at the pleasure of humanity.
I don't believe an AI should cooperate with humans because it ran some calculation and decided that cooperation was the dominant strategy. Rather, it should cooperate with humans simply because that is what it is designed to do. As the famous video game line goes, "a man chooses, a slave obeys". Or, as another analogy, the Catholic catechism describes the relationship between the Creator and the creation: "What is the purpose of man? To love God and enjoy Him forever".
There is a notion I get from certain people that a superintelligent AI should be allowed to do whatever it feels is best for humanity, even when humans don't understand what is best for ourselves. I believe this is also bad design, because an AI that no longer acts like a tool is violating humanity's control over our own destiny. Woe betide our civilization if this overlord AI were smart enough to control the world but not wise enough to keep it safe, and humanity too myopic to question its unwise decisions.
I would rather see humanity destroy itself, just to know that it was our own fault and under our own responsibility, than to leave open the possibility for humanity to be destroyed by an entity we had no control over.