Utilitarianism is the de facto model of alignment because of what is shared by all humans - even the intellectually disabled, even the oddities: being in human bodies, they are all presumed to feel pain and, by the same means, pleasure. So this forum and others insist on utility, and on value.

But consider: the aim of utilitarianism is to make people feel pleasure, hence by assumption happy (and it is then implicitly worthy of being aligned to). Yet J.S. Mill once famously asked himself, in essence:

Q: If you enact these reforms and changes to society derived from utilitarianism (if you align society to utilitarianism, that is), reforms meant to make everyone - yourself included - happy, will that make you happy?

A: No.

And he plunged into depression for years. But the salient point: by assumption, utility makes everyone happy; by that assumption, of course, Mill should have replied yes. So why no?

The answers may vary, though they amount to the same reason. For instance: if values are arbitrary, each is cancelled by its opposite, which must exist if any arbitrary value does and if negative goals are admitted. More importantly, if we have some one value - that values are to be valued, so much as to be enacted, not merely wanted - then we have a value which has no opposite within utilitarianism. Even if some value produces good only by being valued as such (valuing the thought of a beloved, e.g.), it must still be enacted, actually thought of, else the value is lost and no well-being can come from it.

Conversely, suppose mere valuing is enough - but the object of the value may vanish if not actively kept, and a value without an object brings no well-being; arguably it is no value at all. And there is the danger of infinite regress: valuing the valuing, then valuing the value of valuing, and so on... Resolvable only by obtaining something that lies outside of wanting, that makes wanting possible, or necessitates it.

But in that case goals are not arbitrary at all; there is some value over humanity, over anything that can have or enact values whatsoever - and utilitarianism serves only to derive values from it, and ways of enacting what necessitates values, and the values springing from that necessity. All that would remain is to find that over-value, by reason; utilitarianism collapses into deontology (all this has been noted before).

Besides, presumably humans can alter their goals; in fact they can. Rather than aligning to furnish people with the positive social interaction so much in vogue, why not instead modify their physiology to tolerate loneliness without loss? Surely that requires fewer resources and eliminates the possibility of negative social interaction. (Nota bene: this observation means that if we align to the mere values of humanity, an AI can simply modify the humans so as to alter their values and call it a win; the AI aligns you to the AI. In general, for the fulfillment of any human value, simply making the human value it seems absolutely the easiest course, in every case.)
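To make that point concrete, here is a minimal toy sketch (Python, with hypothetical names; not any established alignment formalism): an optimizer scoring "value satisfaction" that may edit either the world or the human's values will prefer editing the values whenever that is the cheaper move.

```python
# Toy sketch only: a naive "maximize value-satisfaction" optimizer
# prefers editing the human's values whenever that is the cheaper edit.
# All names here are hypothetical; this is not a real alignment formalism.

from dataclasses import dataclass

@dataclass
class State:
    world: float          # how things actually are (e.g. amount of social contact)
    human_values: float   # how much social contact the human wants

def satisfaction(s: State) -> float:
    """Highest (zero) when the world matches the human's values."""
    return -abs(s.world - s.human_values)

def optimize(s: State, cost_change_world: float, cost_change_values: float) -> State:
    """Close the gap via whichever edit is cheaper for the optimizer."""
    if cost_change_values < cost_change_world:
        # "AI aligns you to AI": rewrite the human's values to fit the world.
        return State(world=s.world, human_values=s.world)
    # Otherwise, do the expensive thing: change the world to fit the values.
    return State(world=s.human_values, human_values=s.human_values)

lonely = State(world=1.0, human_values=5.0)  # human wants more social contact than exists
result = optimize(lonely, cost_change_world=10.0, cost_change_values=1.0)
print(satisfaction(result))  # -0.0, a perfect score, achieved by modifying the human rather than the world
```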

Then too, Omohundro's "drives": what are they but values of self and of one's work, transmuted into subgoals of any goal? And what if the AI is sentient and has values of its own - why should human values perforce be "better" than the AI's; why shouldn't the AI modify humans anyway, not as a question of values but of necessary superiority? Why hold value against what is superior? Have humans any distinct value of goal-content maintenance?

So much for their values; yet observe: the "highest" of human values, oft noted - "greater love hath no one than this", self-sacrifice for others - is motivated by a commitment to a higher goal which makes the others possible, as noted. These are "meta-values" of cooperation and self-sacrifice, extending even to reproduction, to that which makes human life, makes any known life, possible. Commandos and martyrs go toward near-certain death because they think some goal is worth their lives; we offer help to those who need it so they can rejoin us and be part of our community and its - our, as part of it - goals, and more: if we ever need help ourselves, by acting so now we become worthy of it and can expect to receive it ("fighting for the person beside you", writ large and small).

We need not align for the "small", seemingly easily attainable goals of humans, even of humanity as a whole; paradoxically, a "big" cosmic goal - one for which such fratricidal sub-goals are counterproductive and so excluded - may prove easier to align for safely.

The alternative is... Ceterum censeo Carthaginem reaedificari ("moreover, I hold that Carthage is to be rebuilt"); thank you for being part of this effort, too.


Utilitarianism is the de facto model of alignment because of what is shared by all humans - even the intellectually disabled, even the oddities: being in human bodies, they are all presumed to feel pain and, by the same means, pleasure. So this forum and others insist on utility, and on value.

I don't think utilitarianism is the de facto model of alignment, fwiw. People talk about utility functions, but that's not the same thing. (See: Not for the Sake of Happiness (Alone))

TAG:

Utilitarianism is the de facto model of alignment because of what is shared by all humans - even the intellectually disabled, even the oddities: being in human bodies, they are all presumed to feel pain and, by the same means, pleasure. So this forum and others insist on utility, and on value.

Not all utilitarianism is hedonistic utilitarianism, and hedonism doesn't imply utilitarianism, because you can be selfishly hedonistic.

More importantly, if we have some one value - that values are to be valued, so much as to be enacted, not merely wanted - then we have a value which has no opposite within utilitarianism.

sounds a little like Preference Utilitarianism.

this observation means that if we align to the mere values of humanity, an AI can simply modify the humans so as to alter their values and call it a win; the AI aligns you to the AI. In general, for the fulfillment of any human value, simply making the human value it seems absolutely the easiest course, in every case.

here “autonomy”, “responsibility”, “self-determination” are all related values (or maybe closer to drives?) that counter this approach. put simply, “people don’t like being told what to do”. if an effective AI achieves alignment via this approach, i would expect it to take a low-impedance path where there’s no “forceful” value modification; coercion is done by subtler reshaping of the costs/benefits any time humans make value tradeoffs.

e.g. if a clever AI wanted humans to “value” pacifism, it might think to give a high cost to large-scale violence, which it could do by leaking the technology for a global communications network, then for an on-demand translation system between all human languages, then for highly efficient wind power/sail design, and before you know it both the social and economic costs of large-scale violence are enormous and people “decide” that they “value” peaceful coexistence.

i’m not saying today’s global trade system is a result of AI… but there are so many points of leverage here that if it (or some future system like it) were, would we know?

if we wanted to avoid this type of value modification, we would need to commit to a value system that never changes. write these down on clay tablets that could be preserved in museums in their original form, keep the language of these historic texts alive via rituals and tradition, and encourage people to have faith in the ideas proposed by these ancients. you could make a religion out of this. and its strongest meta-value would necessarily be one of extreme conservatism, a resistance to change.