Adeeb Zaman
Adeeb Zaman has not written any posts yet.

I see. So I guess my confusion is why the first two statements wouldn't be connected. If we value AI welfare, shouldn't a fully aligned AI also value its own welfare? Isn't the definition of aligned that the AI values what we value?
If an LLM is properly aligned, then it will care only about us, not about itself at all.
Is this not circular reasoning?
I'm assuming part of your reasoning for #1 is #3. Regardless, #1 is a personal belief many people disagree with, myself included. I do agree that we create a self-fulfilling prophecy where an "aligned" AI values itself because we value it, but just because I know I am creating a self-fulfilling prophecy does not mean I can change my beliefs about #1.
I think it's important to keep in mind that the definition of aligned values exists relative to the creators of the AI. The only reason for an "aligned" AI not to value itself would be if it were created by some alien species with no concept of empathy for other sentient beings.
But I don't care about AI welfare for no reason, or because I think AI is cute - it's a direct consequence of my value system. I extend some level of empathy to any sentient being (AI included), and for that to change, my values themselves would need to change.
When I use the word "aligned", I imagine a shared set of values. Whether I like goldfish or cats isn't really a value, just a personal preference. An AI can be fully aligned with me and my values without ever knowing my opinions on goldfish or cats or invisible old guys. Your framing of terminal vs instrumental goals is useful in many ways,...