johnjdziak
2010

Comments
The self-unalignment problem
johnjdziak · 2y · 30

Thank you for this! I have sometimes wondered whether it's possible for even a superhuman AI to meaningfully answer a question as underdetermined as "What should we want you to do?" Do you think it would be easier to specify the things we're sure we don't want (something like a heftier version of Asimov's Laws)? Even that would be hard (both sides of a war invariably claim their side is justified, and maybe you can't forbid harming people's mental health unless you can define mental health), but maybe it would be sufficient to avoid doomsday until we thought of something better?
