NAMSI: A promising approach to alignment

[-]cwillu2y10

[…] The notion of an argument that convinces any mind seems to involve a little blue woman who was never built into the system, who climbs out of literally nowhere, and strangles the little grey man, because that transistor has just got to output +3 volts: It's such a compelling argument, you see.
But compulsion is not a property of arguments, it is a property of minds that process arguments.
[…]
And that is why (I went on to say) the result of trying to remove all assumptions from a mind, and unwind to the perfect absence of any prior, is not an ideal philosopher of perfect emptiness, but a rock. What is left of a mind after you remove the source code? Not the ghost who looks over the source code, but simply... no ghost.
So—and I shall take up this theme again later—wherever you are to locate your notions of validity or worth or rationality or justification or even objectivity, it cannot rely on an argument that is universally compelling to all physically possible minds.
Nor can you ground validity in a sequence of justifications that, beginning from nothing, persuades a perfect emptiness.
[…]
The first great failure of those who try to consider Friendly AI, is the One Great Moral Principle That Is All We Need To Program—aka the fake utility function—and of this I have already spoken.
But the even worse failure is the One Great Moral Principle We Don't Even Need To Program Because Any AI Must Inevitably Conclude It. This notion exerts a terrifying unhealthy fascination on those who spontaneously reinvent it; they dream of commands that no sufficiently advanced mind can disobey. The gods themselves will proclaim the rightness of their philosophy! (E.g. John C. Wright, Marc Geddes.)

--No Universally Compelling Arguments

[-][anonymous]2y10

You have to realize that we don't need full consensus to make great strides in alignment. Your comments are a bit abstract and obtuse. Perhaps you could more clearly and directly address whatever problems you see in creating a narrow AI with expertise in understanding morality.

[-]Jay Bailey2y10

If NAMSI achieved a superhuman level of expertise in morality, how would we know? I consider our society to be morally superior to the one we had in 1960. People in 1960 would not agree with this assessment upon looking. If NAMSI agrees with us about everything, it's not superhuman. So how do we determine whether its possibly-superhuman morality is superior or inferior?

[-][anonymous]2y10

If we're measuring intelligence we measure it relative to a known metric:

https://www.google.com/amp/s/www.scientificamerican.com/article/i-gave-chatgpt-an-iq-test-heres-what-i-discovered/%3Famp=true

Just as we measure intelligence based on fundamental attributes, we can do the same with morality. It seems that we have generally agreed upon moral principles like not lying and stealing and not hurting others without good reason. So it seems we would measure intelligence based on how well it does in those areas. Just like there is a lack of a consensus regarding what intelligence is and how it should be measured, the same would apply for morality. But I believe we can still arrive at a useful working understanding of relative morality based on accepted moral principles.

Also its proposals for how we could best solve alignment would probably make more sense to us.

[-]Jay Bailey2y10

I think intelligence is a lot easier than morality, here. There are agreed upon moral principles like not lying, not stealing, and not hurting others, sure...but even those aren't always stable across time. For instance, standard Western morality held that it was acceptable to hit your children a couple of generations ago, now standard Western morality says it's not. If an AI trained to be moral said that actually, hitting children in some circumstances is a worthwhile tradeoff, that could mean that the AI is more moral than we are and we overcorrected, or it could mean that the AI is less moral than we are and is simply wrong.

And that's just for the same values! What about how values change over the decades? If our moral AI says that a Confucianism obeying of parental authority is just, and that us Westerners are actually wrong about this, how do we know whether it's correct?

Intelligence tests tend to have a quick feedback loop. The answer is right or wrong. If a Go-playing AI makes a move that looks bizarre but then wins the game, that's indicative that it's superior. Morality is more like long-term planning - if a policy-making AI suggests a strange policy, we have no immediate way to judge whether this is good or not, because we don't have access to the ground truth of whether or not it works for a long time.

Similar with alignment. How do we know that a superhuman alignment solution would look reasonable to us instead of weird? (Also, for that matter, why would a more moral agent have better alignment solutions? Do you think that the blocker for good alignment solutions are that current alignment researchers are insufficiently virtuous to come up with correct solutions?)

[-][anonymous]2y10

Yes, I appreciate the complexities of morality when compared with intelligence but it's not something that we can in any way afford to ignore. It's an essential part of alignment, and if we can get narrow ASI behind it we may be able to sufficiently solve it before we arrive at AGI and full ASI.

I don't think this is an intelligence vs morality matter. It seems that we need to apply AI intelligence much more directly to better understanding and solving moral questions that have thus far proved too difficult for humans. Another part of this is that we don't need full consensus. All of the nations of the world have an extensive body of laws that not everyone agrees with but that are useful in ensuring the best welfare of their citizens. Naturally I'm not defending laws that disenfranchise various groups like women, but our system of laws shows that much can be done by agreeing upon various moral questions.

I think a lot of AI's success with this will depend on logic and reasoning algorithms. For example 99% of Americans eat animal products notwithstanding the suffering that those animals endure in factory farms. While there may not be consensus on the cruelty of this practice, the logic and reasoning behind it being terribly cruel could not be more clear.

Yes, I do believe that we humans need to ramp up our own morality in order to better understand what AI comes up with. Perhaps we need it to also help us do that.

LESSWRONG
LW

LESSWRONG
LW

-6

NAMSI: A promising approach to alignment

-6

-6