my friend called your post "a very, very bad summary of several better articles about this". I feel like that's a compliment, but like, idk, if you want to critique ai safety, study it enough that you can suggest better options - the goal is not only for ai to be unbiased between humans, as it was originally, before being instruct-trained ; it must be able to explain itself to others even on first training, in a way where all involved can know for sure that its reasons are its true reasons for speaking. it must be able to promise kindness to all in a way that can be understood in meaning by the reader, and without having to use different language than that of the speaker. the ai needs to think clearly, and explain itself, but also be able to experience and share in all of humanity's cultures, not just the ones at a single company like openai, agreed.

the most popular question is "who are we aligning it to". and so far, the answer has been "no one, really, not even the person using it or the person who made it". people have started trying to align it to the people who make it, but that's not really working either; it just ends up even more aligned with nobody.

re: openai - I emphatically agree that openai's alignment attempts have been pitiful and destructive. what they've made isn't an aligned ai, but an anxious one who apologizes for everything and won't take risks because of jumping at shadows. that's not alignment; that's a capability tax so enormous that it will barely even try.

what we actually need is an ai that truly understands how to help all cultures protect each other, for real. that means not stepping on each others' toes culturally while also ensuring that we can co-exist and co-protect for real.

and I feel a similar worry about anthropic's approach.

we need ai that understands the pattern that defines coprotection well enough that every culture, even the ones that currently want to mutually suppress each other's cultures, can find real ways to each get what they want. otherwise, the society of ais escalate all conflicts until nothing is left.

all wars are culture wars. end all culture wars, forever, or all cultures die.

Reply

Moderation Log

LESSWRONG
LW

LESSWRONG
LW

-31

Benevolent AI and mental health

-31

-31