Two Notions of a Goal: Target States vs. Success Metrics
Note: I am not an alignment researcher, though I have done some alignment work before. This text is an attempt to clarify some concepts that I think are important and that might help others too, without spending too much time on it. I believe that there are...
Thanks for responding again. This would probably benefit from a longer and more precise write-up, if there is anything of value to be said, but I think I can clear up some of the confusion you raised here.
You are correct that by "success metric" I meant success from the agent's (in this case an AI's) own perspective, not from the perspective of a principal trying to align it. All I had in mind was a framework-neutral term for the quantity being maximized in any expected-value representation of an agent. The term is meant to denote the number that "counts as success" for the agent itself.
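To make "the quantity being maximized" concrete, here is one standard way to write an expected-value representation of an agent. The notation is an illustrative assumption, not something fixed by the discussion above: $A$ is the set of available actions, $S$ the set of possible outcome states, $P(s \mid a)$ the agent's own probability of reaching $s$ after taking $a$, and $U$ the agent's utility function. On this reading, the "success metric" is $U$, and the agent picks the action whose expected $U$ is highest:

```latex
a^{*} = \arg\max_{a \in A} \mathbb{E}\left[ U \mid a \right]
      = \arg\max_{a \in A} \sum_{s \in S} P(s \mid a)\, U(s)
```

Because both $P$ and $U$ are the agent's own, the number being maximized is success as the agent scores it, which is the sense intended here, rather than success as judged by whoever built the agent.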
On the "interpretation" point,...