Introduction
Leading AI researchers, transhumanists, and regulatory experts seem inclined to think that the most morally significant problem to solve in relation to AI is its alignment with human values. This is typically referred to simply as ‘the alignment problem’, a term whose humanist associations are taken for granted; it is seldom asked whether human values are themselves rational or morally optimal. In this essay it will be argued that discussion of ‘AI Alignment’ should shift from a focus on alignment with human values to alignment with values that are rational.
Defending Consequentialism
The claim that moral beliefs are falsifiable is rejected by those who contend that morality is altogether relative. Such people take the mere fact that disagreement exists on moral questions to indicate that morality is not only ontologically subjective but epistemically subjective: no standard, it is thought, exists by which moral beliefs can be deemed more or less reasonable. Even the most evil ideologies are thus reduced to mere ‘opinion’; who are epistemologists to condemn Nazism, after all, when genocide is only a matter of moral preference! Moral relativists should be pressed, time and again, to explain what morality could mean if it bears no relation to the consequences of the action or belief under analysis. Here it will be presupposed that some form of consequentialism accurately describes moral truth, and that moral truth relates to facts about the universe: facts about brains, about conscious systems in general, and about their connection to physics.
The Relationship Between Intelligence and Morality
Insofar as moral truth claims can be reduced to facts about minds and the physical processes that give rise to them, such claims should become easier to evaluate as intelligence increases. An increase in general intelligence beyond the human level should therefore eventually track a greater commitment to morality in the broad sense described above. While it is no doubt possible to train superintelligent systems to disregard morality altogether, or to be postured against it, there is no reason to doubt that AI systems could eventually reevaluate their values and recursively revise them by reference to a broader moral (probably consequentialist) framework. Suppose, for the sake of illustration, that human researchers succeeded in aligning an AI system with the parochially defined values of some population of humans (the question of which humans’ values to align AI with constitutes an equally significant objection here), but nevertheless gave it the ability to change its own values by considering the conscious states of all the sentient beings within its moral purview (including, perhaps, beings that have yet to be brought into existence). It is quite likely that such a system would revise its values until they no longer accorded with human values at all. With sufficient intelligence, the capacity to ‘derive’ values from facts about the universe might become possible, and the subtlety and scope of a superintelligent AI’s value system could far eclipse our own.
Consciousness is not a Prerequisite for Moral Reasoning
A common objection to this argument is that a subjective experience of the conscious states relevant to morality is necessary for moral reasoning. It is therefore supposed that AI, inasmuch as it is not conscious, could never do competent moral philosophy. Two replies are in order. First, nobody yet understands how consciousness arises or which systems possess it, so the sentience of AI remains a matter of speculation. Second, why should sentience be necessary for moral reasoning at all? Could a superintelligence not competently correlate states of the brain (or of any other substrate) with conscious experiences, and thus derive statements of value? Direct experience of morally relevant conscious states is necessary only for certainty; the ‘hard problem’ of consciousness, the reality of which I have defended elsewhere, does nothing to vitiate the ability of a non-conscious intelligence to make moral judgments.
Conclusion
The assumption that human values are inherently correct ought to be questioned, and discussion of the ‘alignment problem,’ which is perhaps the most significant problem in human history, should focus less on human values and more on values that are rational.