When is further research needed?

Richard_Kennaway


17th Jun 2011


Here's a simple theorem in utility theory that I haven't seen anywhere. Maybe it's standard knowledge, or maybe not.

TL;DR: More information is never a bad thing.

The theorem proved below says that before you make an observation, you cannot expect it to decrease your utility, but you can sometimes expect it to increase your utility. I'm ignoring the cost of obtaining the additional data, and any losses consequential on the time it takes. These are real considerations in any practical situation, but they are not the subject of this note.

First, an example to illustrate the principle. Suppose you are faced with two choices, A and B. One of them is right and one is wrong, and it's very important to make the right choice, because being right will confer some large positive utility U (you get to marry the princess), while the wrong choice will get you -U (eaten by a tiger). However, you're not sure which is the right choice. You estimate that there's a 51% chance that A is right, and 49% that B is right. So, you shut up and multiply, and choose A for an expected utility of 0.02U, right?

Suppose the choice does not have to be made immediately, and that you can do something to get better information about whether A or B is the right choice. Say you can make an observation after which you will be 99% certain which is right. Your prior expectation of your posterior is equal to your prior: if q is the probability that the observation will point to A, then 0.99q + 0.01(1 - q) must equal 0.51, which gives q = 0.50/0.98. So before you make the observation, you expect a 50/98 chance of it telling you that A is right, and 48/98 that B is right.

You make the observation and then choose the course of action it indicates. Whether it says A or B, it's 99% likely to be right, so your expected utility from choosing according to the observation is 0.99U - 0.01U = 0.98U, an increase of 0.96U over not making the observation.
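The arithmetic of the example can be checked with exact fractions. This is a minimal Python sketch, not part of the original argument. It takes U = 1, reads the 99% as your posterior confidence in whichever option the observation names, and uses a hypothetical helper `choice_value`:

```python
from fractions import Fraction

def choice_value(p_right, U=1):
    """Expected utility of a choice that is right with probability p_right:
    +U if right, -U if wrong."""
    return p_right * U - (1 - p_right) * U

prior_a = Fraction(51, 100)
confidence = Fraction(99, 100)  # posterior in whichever option the observation names

# Without observing: take the better of A and B.
eu_without = max(choice_value(prior_a), choice_value(1 - prior_a))

# q = P(observation says A), fixed by q*0.99 + (1 - q)*0.01 = 0.51.
q = (prior_a - (1 - confidence)) / (confidence - (1 - confidence))

# Observing first: follow the observation, which is right with probability 0.99
# whichever way it points.
eu_with = q * choice_value(confidence) + (1 - q) * choice_value(confidence)

print(q, eu_without, eu_with, eu_with - eu_without)  # 25/49 1/50 49/50 24/25
```

The printed q of 25/49 is the 50/98 above in lowest terms. The other three values are 0.02U, 0.98U and 0.96U.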

Clearly, you should make the observation. Even though you cannot predict what it will tell you, you can expect to benefit greatly from whatever it tells you.

Now the general case.

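Theorem: Every act of observation has, before you make it, a non-negative expected value: the expected utility of observing and then choosing is never less than that of choosing without observing.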

Proof. Let the set of actions available to an agent be C. For each action c in C, the agent has a probability distribution over possible outcomes. Each outcome has a certain utility. For present purposes it is not necessary to distinguish between outcomes and their utility, so we shall consider the agent to have, for each action c, a probability distribution P_c(u) over utilities u. The expectation value int_u u P_c(u) of that distribution is the prior expected utility of the choice c, and the agent's rational choice, given no other information, is to choose that c which maximises int_u u P_c(u). The resulting utility is max_c int_u u P_c(u).

(I can't be bothered to fiddle with the system for getting mathematics typeset as images. The underscore indicates subscripts, int_x means integral with respect to x, and max_x means the maximum value over all x. Take care to backslash all the underscores if quoting any of this.)
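To make the choice rule concrete, here is a minimal Python sketch of the discrete case. It assumes finitely many utility values, so that each integral int_u u P_c(u) becomes a sum. The helpers `expected_utility` and `best_action` are hypothetical names, and the numbers are the two-choice example with U = 1:

```python
def expected_utility(dist):
    """Sum over u of u * P_c(u), the discrete analogue of int_u u P_c(u).
    `dist` maps each utility value u to its probability."""
    return sum(u * p for u, p in dist.items())

def best_action(P):
    """Return (c, value) for the action c that maximises expected utility.
    `P` maps each action c to its distribution P_c over utilities."""
    c = max(P, key=lambda action: expected_utility(P[action]))
    return c, expected_utility(P[c])

# The two-choice example with U = 1: each action yields +1 or -1.
P = {
    "A": {1: 0.51, -1: 0.49},
    "B": {1: 0.49, -1: 0.51},
}
print(best_action(P))  # A, with expected utility 0.02 up to float rounding
```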

Now suppose the agent makes an observation, with result o. This gives the agent a new probability distribution for each choice c over outcomes: P_c(u|o). It should choose the c that maximises int_u u P_c(u|o).

The agent also has a prior distribution of observations P(o). Before making the observation, the expected distribution of utility returned by doing c after the observation is int_o P(o) P_c(u|o). This is equal to P_c(u), as it should be, by the law of total probability: your prior estimate of your posterior distribution of a variable must coincide with your prior distribution.
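This principle can be checked in the discrete case, where the integral over o becomes a sum. Here is a minimal Python sketch using the numbers from the two-choice example: U = 1, action A, and an observation that says A with probability 50/98 and leaves you 99% confident in whatever it names. `marginal` is a hypothetical helper name:

```python
from fractions import Fraction as F

def marginal(P_o, P_c_given_o):
    """Compute sum_o P(o) P_c(u|o), the discrete analogue of int_o P(o) P_c(u|o).
    P_o maps observation -> probability; P_c_given_o maps observation -> {u: prob}."""
    result = {}
    for o, p_o in P_o.items():
        for u, p_u in P_c_given_o[o].items():
            result[u] = result.get(u, 0) + p_o * p_u
    return result

# Numbers from the two-choice example (U = 1), for action c = A.
P_o = {"says A": F(50, 98), "says B": F(48, 98)}
P_A_given_o = {
    "says A": {1: F(99, 100), -1: F(1, 100)},
    "says B": {1: F(1, 100), -1: F(99, 100)},
}
print(marginal(P_o, P_A_given_o))  # {1: Fraction(51, 100), -1: Fraction(49, 100)}
```

The result recovers the prior P_A(u): +U with probability 51% and -U with probability 49%.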

We therefore have the following expected utilities. If we choose the action without making the observation, the expected utility is

max_c int_u u P_c(u)

= max_c int_u u int_o P(o) P_c(u|o)

If we observe, then choose, we get

int_o P(o) max_c int_u u P_c(u|o)

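The second of these is always at least as large as the first. Proof (writing c' for the action chosen after the observation, to keep it distinct from the outer c):

    max_c int_u u int_o P(o) P_c(u|o)
    = max_c int_o P(o) int_u u P_c(u|o)
    <= max_c int_o P(o) max_c' int_u u P_c'(u|o)
    = int_o P(o) max_c' int_u u P_c'(u|o)

The first step exchanges the order of integration. The inequality holds because, for every o, int_u u P_c(u|o) <= max_c' int_u u P_c'(u|o), and P(o) is non-negative. The last step drops the outer max, since the expression no longer depends on c. What remains is the expected utility of observing and then choosing.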
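The inequality can also be checked numerically. The sketch below uses Python with only the standard library. It assumes finite sets of actions, observations and utility values, and it generates random P(o) and P_c(u|o). It then compares the value of choosing now, computed from P_c(u) = sum_o P(o) P_c(u|o), with the value of observing and then choosing. `random_distribution` and `values` are hypothetical helpers, and a random test only illustrates the result; it does not replace the proof.

```python
import random

def random_distribution(n, rng):
    """A random probability vector of length n."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def values(n_actions, n_obs, utilities, rng):
    """Return (value of choosing without observing, value of observing then choosing)
    for a randomly generated discrete model."""
    P_o = random_distribution(n_obs, rng)
    # E[c][o] = sum_u u P_c(u|o): expected utility of action c given observation o.
    E = [[sum(u * p for u, p in zip(utilities, random_distribution(len(utilities), rng)))
          for _ in range(n_obs)]
         for _ in range(n_actions)]
    # max_c sum_o P(o) E[c][o], which equals max_c sum_u u P_c(u).
    without = max(sum(P_o[o] * E[c][o] for o in range(n_obs)) for c in range(n_actions))
    # sum_o P(o) max_c E[c][o].
    with_obs = sum(P_o[o] * max(E[c][o] for c in range(n_actions)) for o in range(n_obs))
    return without, with_obs

rng = random.Random(0)
for _ in range(1000):
    without, with_obs = values(n_actions=3, n_obs=4, utilities=[-2, 0, 1, 5], rng=rng)
    assert with_obs >= without - 1e-12  # observing never lowers expected utility
print("inequality held in all trials")
```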

ETA: In some cases, a non-zero amount of new information will make zero change to your expected utility. In the original example, suppose that your prior probabilities were 75% for A being right, and 25% for B. You make an additional and rather weak observation which, if it says "choose A", raises your posterior probability for A to 80%, while if it says "choose B", only lowers your posterior for A to 60%. In either case you still choose A, and your expected utility (prior to actually making the observation) is unchanged.
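Here is a check of this example as a Python sketch, again taking U = 1 and using a hypothetical helper `eu`. Consistency with the prior forces the observation to say "choose A" with probability 3/4. The expected utility then comes out at 0.5U whether or not you observe.

```python
from fractions import Fraction as F

def eu(p_a, U=1):
    """Expected utility of the better of A and B when A is right with probability p_a."""
    return max(p_a * U - (1 - p_a) * U, (1 - p_a) * U - p_a * U)

prior_a = F(75, 100)
post_if_a, post_if_b = F(80, 100), F(60, 100)

# Probability q that the observation says "choose A", fixed by requiring
# q * post_if_a + (1 - q) * post_if_b = prior_a.
q = (prior_a - post_if_b) / (post_if_a - post_if_b)

eu_without = eu(prior_a)
eu_with = q * eu(post_if_a) + (1 - q) * eu(post_if_b)
print(q, eu_without, eu_with)  # 3/4 1/2 1/2
```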

Or informally, further research is only useful if there is a possibility of it telling you enough to change your mind.
