Can you help find the most intuitive example of reward function learning?

In reward function learning, there is a set of possible non-negative reward functions, , and a learning process which takes in a history of actions and observations and returns a probability distribution over .

If is a policy, is the set of histories of length , and is the probability of given that the agent follows policy , the expected value of at horizon is:

where is the total -reward over the history