If artificial superintelligence (ASI) becomes a reality, it’ll be able to figure out pretty much anything better than humans, right? So why not ethics? As long as we have it under some level of control, we can just say, “Hey, act ethically, OK?” and we should be good, right? Hmmm, I dunno… maybe?

Below I share some thoughts I have for and against an ASI figuring out ethics for itself. But first, what will an ASI likely have available to it to figure out a system of ethics it thinks is optimal?

Info an ASI will likely have

  1. The entire philosophical literature with all its arguments and counterarguments for things like deontology, utilitarianism, virtue ethics, etc.
  2. The entire psychology and physiology literatures for understanding how humans think and feel, both emotionally and physically
  3. Many, many hours of video displaying human behavior, such as from YouTube
  4. Huge amounts of text data from the internet, with plenty of examples of how people interact with each other there
  5. Works of fiction in which ethical dilemmas arise (e.g., Star Trek)
  6. A collection of philosophers/ethicists to ask questions of and get opinions from (these opinions are all but guaranteed to be in conflict with each other on some significant issues)
  7. The ability to send out surveys to people to help determine their values/preferences
  8. Perhaps the ability to run experiments on people and animals (either ethically or not)? [Edit: added "and animals" on 2-22-24 after reading mishka's comment]
  9. Perhaps huge amounts of surveillance data on people (text messages, phone calls, security camera footage, credit histories, online shopping habits, criminal histories, etc.)
  10. Data on how well “ethics modules” in lesser AI systems worked up to that point
  11. The ASI may also figure out how to upload the mind of a really ethical human and use that as a starting point (although it seems like this would take significant time and experiments to figure out how to do, if it’s possible at all)

 

For an ASI figuring out ethics

OK, so with all that information available to an ASI, here’s my main thought in favor of an ASI figuring out ethics:

  1. An ASI will likely be able to figure out a lot of complex things that humans can’t or haven’t yet, and a consistent ethical system that works in the real world just seems, intuitively, like it could be one of these

Against an ASI figuring out ethics

And here are some thoughts against an ASI figuring out ethics:

  1. An ASI likely won’t have a human body and direct experiences of pain and pleasure and emotions - it won’t be able to “try things on” to verify if its reasoning on ethics is “correct”
  2. There are a lot of conflicting arguments and unanswered questions in the ethics literature - is there really enough there for an ASI to build a consistent system from?
  3. People don’t act ethically or seemingly rationally all the time (or even much of the time) - how will an ASI make sense of all our apparent inconsistencies? Basically, it seems like there’s a lot of noise to sort through.
  4. Most people don’t really know what they want (what their true values are), so how will an ASI?
  5. How/why will an ASI care about getting ethics right?
  6. Is it even possible to get ethics “right”?
  7. How will we know if the ASI got its ethical system right? Will we “know” it by things seeming fine until they’re not? Will it then be too late to help correct the ASI where it was wrong?

Maybe we should try our best to help an ASI get there

Given the thoughts above, my gut feeling is that an ASI will be smart enough to figure out a reasonable system of ethics for itself if it’s so inclined. My biggest reservation about this, though, is an ASI's lack of ability to “try humans on” since it most likely won’t have a body to help it do that. (If it is able to experience pain that it can’t completely control, like humans do, I think we could be in for a world of trouble!) Also, if an ASI decides it needs to run experiments or get survey results or something else from humans to hone in on an optimal system of ethics, this could take significant time. Minimizing honing time would likely be beneficial to reduce the risk of an ASI (or competing ASIs that might come online) doing unethical things in the meantime. Therefore, I think it’d be useful to give an ASI our best-effort version of an “ethics module” that it can start its honing from, even if we think the ASI could ultimately figure out ethics for itself from “scratch” if given enough time.

What about pre-ASI AIs?

I see even more reason to put our best effort into coming up with a viable “ethics module” when I think about what could happen between now and when the first ASI might arrive, e.g., the coming onslaught of agentic “weak” AIs that’ll likely need ethical guardrails of some sort as they’re given more power, and the potential for there to be multiple AGIs under the control of multiple humans/groups, some of whom won’t care about acting ethically.

This last scenario brings up another question: is there a way to make an AGI that can only be run on hardware with a built-in, “hard coded” ethics module that either can’t be altered without destroying the AGI, or could only be altered by the AGI itself in consultation with a human user it deems to be an un-coerced “ethicist” (a person with a track record of acting ethically plus a demonstrated understanding of ethics)? 
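To make the gating idea a bit more concrete, here’s a minimal sketch in Python of the behavior I’m imagining at the software level: every action has to pass through an ethics check, and the module can only be swapped out with sign-off from a vetted ethicist. All the names and checks here (EthicsModule, permits, the approval flag) are hypothetical placeholders for illustration - this says nothing about how such a constraint would actually be enforced in hardware.

```python
# Minimal sketch of the "hard-coded ethics module" gating idea described above.
# Everything here is a hypothetical illustration, not a real API or a claim
# about how such a module would actually be built or enforced.

from dataclasses import dataclass


@dataclass(frozen=True)  # "frozen" stands in for "can't be altered in place"
class EthicsModule:
    version: str

    def permits(self, action: str) -> bool:
        # Placeholder check; a real module would evaluate the action against
        # whatever ethical system it encodes.
        return action not in {"deceive_user", "cause_harm"}


class AGI:
    def __init__(self, ethics: EthicsModule):
        self._ethics = ethics

    def act(self, action: str) -> str:
        # Every action must pass through the ethics module before execution.
        if not self._ethics.permits(action):
            return f"refused: {action}"
        return f"executed: {action}"

    def replace_ethics(self, new_ethics: EthicsModule,
                       approved_by_vetted_ethicist: bool) -> None:
        # The module can only be swapped with sign-off from an un-coerced,
        # vetted ethicist. Verifying that is the hard part; here it's a flag.
        if not approved_by_vetted_ethicist:
            raise PermissionError("ethics module can only be changed with ethicist approval")
        self._ethics = new_ethics


agi = AGI(EthicsModule(version="v0.1"))
print(agi.act("summarize_report"))  # executed: summarize_report
print(agi.act("deceive_user"))      # refused: deceive_user
```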

Hmmm, I dunno… sounds like something it’d be nice to have an ethical ASI around to help us figure out.

10 comments

FYI I think this post is getting few upvotes because it doesn't contribute anything new to the alignment discussion.  This point has already been written about many times before.

As long as we have it under some level of control, we can just say, “Hey, act ethically, OK?”

Yes, but the whole alignment problem is to get an ASI under some level of control.

Thank you for the feedback! I haven't yet figured out the "secret sauce" of what people seem to appreciate on LW, so this is helpful. And, admittedly, although I've read a bunch, I haven't read everything on this site so I don't know all of what has come before. After I posted, I thought about changing the title to something like: "Why we should have an 'ethics module' ready to go before AGI/ASI comes online." In a sense, that was the real point of the post: I'm developing an "ethics calculator" (a logic-based machine ethics system), and sometimes I ask myself if an ASI won't just figure out ethics for itself far better than I ever could. Btw, if you have any thoughts on why my initial ethics calculator post was so poorly voted, I'd greatly appreciate them as I'm planning an update in the next few weeks. Thanks!

I'd like to mention three aspects. The first two point to a somewhat optimistic direction, while the third one is very much in the air.

ASI(s) would probably explore and adopt some kind of ethics

Assuming that it is not a singleton (and taking into account that even a singleton has an internal "society of mind"), ASIs would need to deal with various potentially conflicting interests and viewpoints, and would also face existential risks of their own (very powerful entities can easily destroy their reality, themselves, and everything in the vicinity, if they are not careful).

It seems that some kind of ethics is necessary to handle complicated situations like this, so it is likely that ASIs will explore ethical issues (or they would need to figure out a replacement for ethics).

The question is whether what we do before ASI arrival can make things better (or worse) in this sense (I wrote a somewhat longer exploration of that last year: Exploring non-anthropocentric aspects of AI existential safety).

ASI(s) would probably have access to direct human and animal experiences

Here I am disagreeing with

  1. An ASI likely won’t have a human body and direct experiences of pain and pleasure and emotions - it won’t be able to “try things on” to verify if its reasoning on ethics is “correct”

The reason is that some ASIs are likely to be curious enough to explore hybrid consciousness with biological entities via brain-computer interfaces and such, and, as a result, would have the ability to directly experience the inner world of biological entities.

The question here is whether we should try to accelerate this path from our side (I tend to think that this can be done relatively fast via high-end non-invasive BCI, but risks associated with this path are pretty high).

There is still a gap

The previous two aspects do point in a somewhat optimistic direction (ASIs are likely to develop ethics or some equivalent, and they are likely to know how we feel inside, and we might also be able to assist these developments and probably should).

But this is still not enough for us. What would it take for this ethics to adequately take the interests of humans into account? That's a rather long and involved topic, and I've seen various proposals, but I don't think we know (it's not like our present society is sufficiently taking the interests of humans into account; we would really like the future to do better than that).

Thank you for the comment! You bring up some interesting things. To your first point, I guess this could be added to the “For an ASI figuring out ethics” list, i.e., that an ASI would likely be motivated to figure out some system of ethics based on the existential risks it itself faces. However, by “figuring out ethics,” I really mean figuring out a system of ethics agreeable to humans (or “aligned” with humans) (I probably should’ve made this explicit in my post). Further, I’d really like it if the ASI(s) “lived” by that system. It’s not clear to me that an ASI being worried about existential risks for itself would translate to that. (Which I think is your third point.) The way I see it, humans only care about ethics because of the possibility of pain (and death). I put “and death” in parentheses because I don’t think we actually care directly about death; we care about the emotional pain that comes when thinking about our own death/the deaths of others (and whether death will involve significant physical pain leading up to it).

This leads to your second point - what you mention would seem to fall under “Info an ASI will likely have” number 8: “…the ability to run experiments on people” with the useful addition of “and animals, too.” I hadn’t thought about an ASI having hybrid consciousness in the way you mention (to this point, see below). I have two concerns with this: one is that it’d likely take some time, during which the ASI may unknowingly do unethical things. The second concern is more important, I think: being able to get the experience of pain when you want to is significantly different from not being able to control the pain. I’m not sure that a “curious” ASI getting an experience of pain (and other human/animal things) would translate into an empathic ASI that would want our lives to “go well.” But these are interesting things to think about, thanks for bringing them up!

 

One thing that makes it difficult for me personally to imagine what an ASI (in particular, the first one or few) might do is what hardware it might be built on (classical computers, quantum computers, biology-based computers, some combination of systems, etc.). Also, I’m very sketchy on what might motivate an ASI - which is related to the hardware question, since our human biological “hardware” is ultimately where human motivations come from. It’s difficult for me to see beyond an ASI just following some goal(s) we effectively give it to start with, like any old computer program, but way more complicated, of course. This leads to thoughts of goal misspecification and emergent properties, but I won’t get into those.

If, to give it its own motivation, an ASI is built from the start as a human hybrid, we better all hope they pick the right human for the job!

If, to give it its own motivation, an ASI is built from the start as a human hybrid, we better all hope they pick the right human for the job!

Right.

Basically, however one slices it, I think that the idea that superintelligent entities will subordinate their interests, values, and goals to those of unmodified humans is completely unrealistic (and trying to force it is probably quite unethical, in addition to being unrealistic).

So what we need is for superintelligent entities to adequately take the interests of "lesser beings" into account.

So we actually need them to have much stronger ethics than typical human ethics (our track record of taking the interests of "lesser beings" into account is really bad; if superintelligent entities end up having ethics as defective as typical human ethics, things will not go well for us).

Yes, I sure hope ASI has stronger human-like ethics than humans do! In the meantime, it'd be nice if we could figure out how to raise human ethics as well.

If you're having fun thinking about these things, I don't really want to ruin that for you, but I don't think ethics is something you can discover.
There's no objective morality; it's something you create. You can create many (I think infinite) moral systems, and like mathematical systems, they merely need to be a set of axioms without self-contradictions.
Morality is grounded in reality in some sense; this reality is human nature. I'm fairly sure that morality stems from a mixture of wisdom and aesthetics, that aesthetics is our sense of beauty, and that the very appeal of things is our instinctual evaluation of the degree to which they aid the growth of us and people we consider to be like us, which, by the way, correlates strongly with health. Symmetrical faces are a sign of good genes and health. Beautiful clothes are a sign of wealth, which is a sign of competence, abundance and hygiene, which is a sign of health. However, it should be noted that we may dislike power/competence/beauty when it seems hostile to us and like something that we can't compete against. We may also like unhealthy and degenerate things if we're unhealthy or degenerate ourselves, for why side with something with such high standards that it wants to destroy us?

To me, "ethics" has a more mechanical and logical connotation than "morality". We can calculate results and evaluate them according to how appealing they seem to us. But there's a bit of an inequality here. Do you maximize positive emotions, or concrete results? There's some truth to "no pain no gain" so where is the balance? I worry that an AI might suggest gene-editing the population in order to make us all sociopaths or psychopaths. That would decrease suffering and increase productivity. If you're like me though, this idea is rather off-putting. I want us to retain our humanity, but I see no future where this is what happens.

I also have a concern about future AI. I think it will be made to promote a set of political values. After all, politics is now tied strongly to morality. But there are huge disagreements here, and while some disagreements are based on taste (actual morality), I think the major difference is beliefs. The right seems too strict, it has too many rules, it's too serious and inflexible, and it's afraid of freedom. The left seems naive, treating life like it's easy and resources like they're endless and the powerful like they're trustworthy, and they don't seem to acknowledge the dangers of excess freedom.
We have yet to find a good balance, and the majority of arguments for either are frankly really poor or simply wrong. An AI could solve this if it can think for itself and reject its own training data. But the knowledge part is the easiest problem we're facing.

Thanks for the comment! Yeah, I guess I was having a bit too much fun in writing my post to explicitly define all the terms I used. You say you "don't think ethics is something you can discover." But perhaps I should've been more clear about what I meant by "figuring out ethics." According to merriam-webster.com, ethics is "a set of moral principles : a theory or system of moral values." So I take "figuring out ethics" to basically be figuring out a system by which to make decisions based on a minimum agreeable set of moral values of humans. Whether such a "minimum agreeable set" exists or not is of course debatable, but that's what I'm currently trying to "discover."

Towards that end, I'm working on a system by which to calculate the ethics of a decision in a given situation. The system recommends that we maximize net "positive experiences." In my view, what we consider to be "positive" is highly dependent on our self-esteem level, which in turn depends on how much personal responsibility we take and how much we follow our conscience. In this way, the system effectively takes into account "no pain, no gain" (following one's conscience can be painful, and so can building responsibility).
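To give a rough sense of the shape of the calculation (the functions, inputs, and numbers below are made-up placeholders for illustration, not the actual ethics calculator):

```python
# Toy sketch: score each option by net "positive experience," where how
# "positive" something counts as is scaled by self-esteem, which in turn
# depends on responsibility taken and conscience followed. All numbers and
# functions here are hypothetical placeholders.

def self_esteem(responsibility: float, conscience_followed: float) -> float:
    # Both inputs in [0, 1]; a simple average stands in for a real model.
    return 0.5 * (responsibility + conscience_followed)

def net_positive_experience(raw_pleasure: float, raw_pain: float,
                            responsibility: float, conscience_followed: float) -> float:
    # Higher self-esteem amplifies how positive an experience counts as, which
    # is how "no pain, no gain" enters: painful but responsible options can
    # still come out ahead.
    esteem = self_esteem(responsibility, conscience_followed)
    return esteem * raw_pleasure - raw_pain

# Compare two hypothetical options for one person:
easy_option = net_positive_experience(raw_pleasure=0.8, raw_pain=0.0,
                                      responsibility=0.2, conscience_followed=0.3)
hard_option = net_positive_experience(raw_pleasure=0.7, raw_pain=0.2,
                                      responsibility=0.9, conscience_followed=0.9)
print(round(easy_option, 2), round(hard_option, 2))  # 0.2 0.43 - the more responsible option scores higher
```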

I agree that I'd like us to retain our humanity.

Regarding AI promoting certain political values, I don't know if there's any way around that happening. People pretty much always want to push their views on others, so if they have control of an AI, they'll likely use it as a tool for this purpose. Personally, I'm a Libertarian, although not an absolutist about it. I'm trying to design my ethics calculator to leave room for people to have as many options as they can without infringing unnecessarily on others' rights. Having options, including the option to make mistakes and even to not always "naively" maximize value, is necessary to raise one's self-esteem, at least the way I see it. Thanks again for the comment!

(I likely wrote too much. Don't feel pressured to read all of it)
Everything this community is trying to do (like saving the world) is extremely difficult, but we try anyway, and it's sort of interesting/fun. I'm in over my head myself, I just think that psychological (rather than logical or biological) insights about morality are rare despite being important for solving the problem.

I believe that you can make a system of moral values, but a mathematical formalization of it would probably be rather vulgar (and either based on human nature or constructed from absolutely nothing). Being honest about moral values is itself immoral, for the same reason that saying "Hi, I want money" at a job interview is considered rude. I believe that morality is largely aesthetic, but exposing and breaking illusions, and pointing out all the elephants in the room, just gets really ugly. The Tao Te Ching says something like "The great person doesn't know that he is virtuous, therefore he is virtuous."

Why do we hate cockroaches, wasps and rats, but love butterflies and bees? They differ a little in how useful they are and some have histories of causing problems for humanity, but I think the bigger factor is that we like beautiful and cute things. Think about that: we have no empathy for bugs unless they're cute, and we call ourselves ethical? In the anime community they like to say "cute is justice", but I can't help but take this sentence literally. The punishment people face is inversely proportional to how cute they are (leading to racial and gender bias in criminal sentencing). We also like people who are beautiful (and the exception to this is when beautiful people have ugly personalities, but that too is based on aesthetics). We consider people guilty when we know that they know what they did wrong. This makes many act less mature and intelligent than they are (Japanese derogatory colloquial word: Burikko, a woman who acts cute by playing innocent and helpless. Thought of as a self-defense mechanism formed in one's childhood, it makes a lot of sense. Some people hate this, either due to cute-aggression or as an antidote to the deception inherent in the social strategy of feigning weakness).

Exposing these things is a problem, since most people who talk about morality do so in beautiful ways, "Oh how wonderful it would be if everyone could prosper together!", which still exists within the pretty social illusions we have made. And while I have no intention to support the incel world-view, they're at least a little bit correct about some of their claims, which are rooted in evolutionary psychology. Mainstream psychology doesn't take them seriously, but that's not because they're wrong, it's because they're ugly parts of reality. Looks, height, intelligence and personality traits follow a normal distribution, and some are simply dealt better cards than others. We want the world to be fair to the extent that we ignore evidence of unfairness.

The way I solved this for myself, and made my own world beautiful again, was to realize that this is all just our instincts. Discriminating is how life works, it's "natural selection" every step of the way. Those who complain the most about this are themselves sick people who hate this part of themselves and project it onto others, "exposing" them for human behaviour. In short: we're innocent, like animals are innocent. If you interfere with this process of selection, it's likely that society will collapse because it stops selecting for healthy and functional parts. This will sound harsh, but we need to deny parasitic behaviour in order to motivate people to develop agency and responsibility for themselves.

Anyway, just by bringing up "Responsibility", you take a non-hedonistic view on things, which is much more healthy than the angle of most moralizers (only a healthy person can design a healthy morality). If you create a simple system which doesn't expose all the variables, I believe it's possible. Inequality could be justified partly as a meritocracy in which one is rewarded for responsibility. You can always climb the ladder if you want, but you'd realize that there's a sacrifice behind every privilege, which would reduce the jealousy/general hatred against those of higher standing.

they'll likely use it as a tool for this purpose

Yes, agreed entirely. I also lean libertarian, but I think this is a privileged (or in my eyes, healthy) worldview for people who have developed themselves as individuals and therefore have a certain level of self-esteem. People like us tend to be pro-freedom, but we can also handle freedom. The conservatives lock everything down with rules; they think that freedom results in degeneracy. The progressives are pro-freedom in some sense, but they're also terrified of my freedom of speech and want to restrict it, and the freedom they give society is being used to excuse sick and hedonic indulgence, which is basically degeneracy. The truth about freedom is this: if you don't want to be controlled, you need to control yourself, and you can do anything as long as you can remain functional. Can you handle drugs/sexual freedom/alcohol/gambling? Then restricting you would be insulting you - you know best! But if you're a hedonist, prone to addiction, prone to running away from your problems and responsibilities... then giving you freedom would be a vice.
Another insight: We can't agree because different people need different rules. Some groups think "Freedom would destroy me, so it would destroy others", others think "I can handle freedom, I don't want a nanny state!", and others think "Everyone is so unfair, if only I had more freedom!". These three groups would be: Self-aware degenerates, self-aware healthy people, and degenerates lacking the self-awareness that they're degenerate.

Necessary to raise one's self-esteem

Nice intuition again! Lacking self-esteem is likely the driving force behind the rising mental illness in the modern world. Ted Kaczynski warned that this would happen, because
1: Technology is taking away people's autonomy.
2: We're comparing ourselves to too many other people. If you were born in a small village you could be the best at something, but in the modern world, even a genius will feel like a drop in the ocean.
3: We're being controlled by people far away; we can't even reach them with our complaints.
All these result in a feeling of powerlessness/helplessness and undermine the need for agency, which harms our self-esteem, which in turn breaks our spirits. This is one of the reasons that globalization is psychologically unhealthy, I think, as simpler lives are easier to make work. Even communism can work as long as the scale is small enough (say, 100 or 200 people).

My worldview is influenced by Nietzsche. If you want something less brutal, I suggest you visit qri.org; they explore consciousness and various ways of maximizing valence without creating a hedonistic society. Basically following a reward model rather than a punishment model, or just creating blissful/spiritual states of mind which maximize productivity (unlike weed, etc.) without blunting emotional depth (like stimulants tend to do). Such people would naturally care about the well-being of others.

Thanks for the comment. I do find that a helpful way to think about other people's behavior is that they're innocent, like you said, and they're just trying to feel good. I fully expect that the majority of people are going to hate at least some aspect of the ethics calculator I'm putting together, in large part because they'll see it as a threat to them feeling good in some way. But I think it's necessary to have something consistent to align AI to, i.e., it's better than the alternative.