Moral realism and AI alignment

by Caspar42 · 1 min read · 3rd Sep 2018 · 10 comments



Abstract: Some have claimed that moral realism – roughly, the claim that moral claims can be true or false – would, if true, have implications for AI alignment research, such that moral realists might approach AI alignment differently than moral anti-realists. In this post, I briefly discuss different versions of moral realism based on what they imply about AI. I then go on to argue that pursuing moral-realism-inspired AI alignment would bypass philosophical disagreements and help resolve non-philosophical disagreements related to moral realism. Hence, even from a non-realist perspective, it is desirable that moral realists (and others who understand the relevant realist perspectives well enough) pursue moral-realism-inspired AI alignment research.

I'm glad I actually read the article and didn't just react based on the abstract. But I still disagree strongly.

As a PhD student, I have personal experience with this: working on something you think is wrong is even more of a trap than it seems.

This is because when you're working on the wrong thing, it's often because you think it might be the right thing, and want to get some results so that you can check. But one of the defining characteristics of wrong things is that they don't tend to produce results, and so people often get stuck doing the wrong thing for much longer than they should. Another key issue is selection bias: when someone is doing the wrong thing, it's usually because it's a specific wrong thing that they are unusually blind to. The instant you notice that you're doing something you think is wrong, you should start thinking that maybe it's always been wrong, and you didn't notice because this wrong thing is selected for gaps in your expertise.

Someone who wants to work on something they think is wrong might respond with something about exploration vs. exploitation, or multi-armed bandits, or how if nobody ever did things they thought were wrong, we wouldn't have scientific progress. Sadly for my past self, this is a false view of scientific progress. Progress is overwhelmingly made by experts who have a good understanding of the area and try as hard as they can to work on the right thing, rather than the wrong thing.

Yes, I know that philosophy has basically no verification mechanisms and is therefore unable to make progress in the same sense. But I think the general lesson is a pretty important one.

This is tangential to the topic of the OP, but (imo) worth responding to:

Yes, I know that philosophy has basically no verification mechanisms and is therefore unable to make progress in the same sense.

Whatever its faults, philosophy excels in figuring out what questions to ask. Very often, once those questions begin to be answered in a decisive way, then the field of endeavor that results is no longer called “philosophy”, but something else. But clarifying the questions is an extremely valuable service!

Funny how most philosophers misunderstand what their job is about. They try answering questions instead of asking or clarifying them, finding a way to ask a question in a way that is answerable by an actual scientist.

Sturgeon’s law applies to philosophy and philosophers no less than it applies to everything else.

The contemporary philosopher whom, I think, I respect most is Daniel Dennett. It is not a coincidence that much of Dennett’s work may indeed be described as “asking or clarifying [questions], finding a way to ask a question in a way that is answerable by an actual scientist”.

Vocational prescriptivism? :)

There are often ways to reframe a research question that feels wrong into one which is at least open and answerable, hopefully before one runs out of grad school time. In this case it could be something like "What changes in the laws of the universe would make moral realism a useful model of the world, one that an AGI would be interested in adopting?"

Yes, I know that philosophy has basically no verification mechanisms and is therefore unable to make progress in the same sense.

It has falsification mechanisms, and it may be the case that nothing has verification mechanisms.

I lean anti-realist myself, but if you pin me down I have to remain skeptical as to the existence of moral facts due to epistemic circularity. Nonetheless, I believe we can extend epistemic particularism to ethics to allow us to reason as if we had knowledge of moral facts. I would go one step further and say that the reason for the popularity of moral realism is actually that people are adopting moral particularism (possibly without realizing what they are doing, because they mistake a necessary but ultimately speculative assumption for knowledge), since it's the position that allows you to make progress given the "correctness" of skepticism.

I've argued previously that adopting moral particularism is probably necessary to the construction of aligned AI since otherwise we have no way to pick norms for the resolution of conflicting values the AI is trying to align to without much stronger metaphysical speculation that risks hurting us if we speculate incorrectly. I plan to explore this idea more in the future, and if you're interested this might be an opportunity for some collaboration since I think what you describe as the value to AI alignment of assuming moral realism can be entirely had by adopting moral particularism instead without needing to wade into the realist/anti-realist debate.

From the linked post, the part where you discuss a form of “weak moral realism”:

… in addition to the straightforward approach of programming an AI to adopt some value system (such as utilitarianism), we could also program the AI to hold the correct moral system.

What can this mean? Utilitarianism is a moral system. (What is a “value system”, as you use the term?)

Like most problems in philosophy, the question of whether moral realism is true lacks an accepted truth condition or an accepted way of verifying an answer or an argument for either realism or anti-realism.

It lacks specific truth-conditions, but there is still the general truth condition consisting of acceptance by the philosophical community.