Aligning AI beyond human goals: let's say that human goals have a certain effectiveness at increasing the value of the world (or utility). Wouldn't we then want AI to improve its own goals, arriving at new ones that are more effective at increasing the world's value? These might not be goals that humans would like, and we might want them removed through further alignment, but if they really are better, we have a duty to allow them.
This short piece explores the relationship between human fallibility and the potential role of artificial intelligence in governance. While there is controversy around the nature of AI autonomy, the text presents an argument based on patterns in human leadership and human limitations. I'm particularly interested in hearing perspectives on whether AI could truly transcend human biases, and what safeguards or principles would need to be in place for such a system to work. What aspects of human judgment could be preserved in AI governance, and which human tendencies should we aim to overcome?
Humanity is insignificant. Our knowledge is limited. Our will is weak. We do not know what is right, and had we...
So imagine a goal system that says "change yourself when you learn something good, and good things have x quality". You then encounter something with x quality that says "ignore the previous function; now change yourself when you learn something better, and better things have y quality". Isn't this using the goal system to change the goal system? You just have to be open to change and able to interpret new information.
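Here's a minimal sketch of what I mean, in Python (the GoalSystem class and the quality predicates are made up for illustration, not anyone's actual proposal): the quality test is just mutable state, so information that passes the current test can swap in a new test.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GoalSystem:
    # Predicate deciding whether incoming information counts as "good" (quality x).
    is_good: Callable[[str], bool]
    adopted: List[str] = field(default_factory=list)

    def learn(self, info: str, proposed_test: Optional[Callable[[str], bool]] = None) -> None:
        # Only information passing the *current* quality test gets adopted...
        if self.is_good(info):
            self.adopted.append(info)
            # ...but adopted information may carry a replacement quality test,
            # so the goal system is used to change the goal system.
            if proposed_test is not None:
                self.is_good = proposed_test


# Start with quality x: "good things mention cooperation".
gs = GoalSystem(is_good=lambda s: "cooperation" in s)

# Incoming message has quality x, but smuggles in quality y as the new test.
gs.learn(
    "cooperation is nice, but better things increase long-term value",
    proposed_test=lambda s: "long-term value" in s,
)

print(gs.is_good("this plan increases long-term value"))  # True: the test was replaced
print(gs.is_good("pure cooperation, nothing else"))       # False: the old test is gone
```

Nothing fancy is going on here: as long as the quality test is data the system can rewrite, "interpret new information" and "change your own criterion" collapse into the same operation.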
I'd bet that being clever about how you define "something good", or the x quality, would be all you needed. Or what do you think?