Humans can be assigned any values whatsoever...
Khoth · 8y · 20

I think there are two ways that a reward function can be applicable:

1) For making moral judgements about how you should treat your agent. Probably irrelevant for your button presser unless you're a panpsychist.

2) If the way your agent works is by predicting the consequences of its actions and attempting to pick an action that maximises some reward (e.g. a chess computer trying to maximise its board valuation function). Your agent H as described doesn't work this way, although as you note there are agents which do act this way and produce the same behaviour as your H (see the first sketch below).

There's also a third, kind-of option:

3) Anything can be modelled as if it had a utility function, in the same way that any solar system can be modelled as a geocentric one with enough epicycles. In this case there's no "true" reward function, just "the reward function that makes the maths I want to do as easy as possible". Which one that is depends on what you're trying to do, and maybe pretending there's a reward function isn't actually better than using H's true non-reward-based algorithm (the second sketch below builds such a vacuous reward function explicitly).
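
To make option 2 concrete, here's a minimal Python sketch (all names are toy ones I've made up, not anything from your post): one agent explicitly predicts each action's reward and picks the argmax, the other is a plain state-to-action table with no reward anywhere in its code, and the two are behaviourally identical.

```python
# Contrast a reward-maximising agent (option 2) with an H-style agent:
# same behaviour, but only one of them consults a reward function.

ACTIONS = ["press", "wait"]

def reward(state: int, action: str) -> int:
    """Toy reward: pressing pays off exactly in even-numbered states."""
    return 1 if (action == "press") == (state % 2 == 0) else 0

def maximiser_agent(state: int) -> str:
    """Option 2: evaluate each action's predicted reward, pick the best."""
    return max(ACTIONS, key=lambda a: reward(state, a))

# An H-style agent: a hard-coded policy table, consulting no reward at all.
H_POLICY = {s: ("press" if s % 2 == 0 else "wait") for s in range(10)}

def agent_h(state: int) -> str:
    return H_POLICY[state]

# Behaviourally indistinguishable on these states, so behaviour alone
# can't settle whether a reward function is "really" being maximised.
assert all(maximiser_agent(s) == agent_h(s) for s in range(10))
```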
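And for option 3, a second self-contained sketch (again with made-up toy names): for any deterministic policy whatsoever, you can construct a reward function it maximises by paying 1 for whatever action the policy takes and 0 otherwise. The fit is perfect but vacuous, which is the epicycles point.

```python
# Option 3: any policy trivially "maximises" the reward function that
# rewards doing whatever it was going to do anyway.

ACTIONS = ["press", "wait"]

def vacuous_reward_for(policy):
    """Build a reward function that the given policy maximises by construction."""
    def reward(state, action):
        return 1 if action == policy(state) else 0
    return reward

# Any policy at all, e.g. one that presses the button in even states:
policy_h = lambda s: "press" if s % 2 == 0 else "wait"
r = vacuous_reward_for(policy_h)

# policy_h attains the maximum of r in every state, so it counts as a
# "reward maximiser" -- for a reward chosen purely to make that true.
assert all(r(s, policy_h(s)) == max(r(s, a) for a in ACTIONS)
           for s in range(10))
```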

Open thread, September 25 - October 1, 2017
Khoth · 8y · 0

Fire Emblem: Heroes
