Tom Price


7mo

Tom Price has not written any posts yet.

Replying to Models Don't "Get Reward"

Tom Price · 7mo

Models Don't "Get Reward"

Could you give some examples?

Replying to Reward is not the optimization target

Tom Price · 7mo*

Reward is not the optimization target

reward chisels cognitive grooves into an agent

This makes sense, but if the agent is smart enough to know how it *could* wirehead, perhaps wireheading would eventually result from the chiseling of some highly abstract grooves.

To give an example, suppose you go to Domino's pizza on Saturday at 6pm and eat some Hawaiian pizza. You enjoy the pizza. This reinforces the behaviour of "Go to Domino's pizza on Saturday at 6pm and eat some Hawaiian pizza".

Surely this will also reinforce other, more generic behaviours that include this behaviour as a special case, such as:

"Go to a pizza place in the evening and eat pizza."

"Go to a restaurant and eat yummy food."

Well then, why...

LESSWRONG