Thought-provoking article! But I think it likely conflates intrinsic "value" with "social reward", as in the current definition of "Approval Reward". The intrinsic "value function" that agents operate under is likely much more complicated than "social approval", which definitely plays an important role for humans, since we evolved to be social creatures.
For example:
"So saving the money is not doing an unpleasant thing now for a benefit later. Rather, the pleasant feeling starts immediately, thanks to (usually) Approval Reward."
Yes, the pleasant feeling starts immediately, but it is judged by the value function (something like "things are making progress toward a goal"), not necessarily by approval from others. Or take my aging parents, who started exercising: they did it not just for my approval, but because their new valuation function judges it worth doing, and my approval is only one part of that judgment.
Once one separates the concept of intrinsic "value" from that of "social reward", one can see that the two play different roles in alignment.