When is further research needed?

by RichardKennaway | 4 min read | 17th Jun 2011 | 81 comments



Here's a simple theorem in utility theory that I haven't seen anywhere. Maybe it's standard knowledge, or maybe not.

TL,DR: More information is never a bad thing.

The theorem proved below says that before you make an observation, you cannot expect it to decrease your utility, but you can sometimes expect it to increase your utility.  I'm ignoring the cost of obtaining the additional data, and any losses consequential on the time it takes. These are real considerations in any practical situation, but they are not the subject of this note.

First, an example to illustrate the principle. Suppose you are faced with two choices, A and B. One of them is right and one is wrong, and it's very important to make the right choice, because being right will confer some large positive utility U (you get to marry the princess), while the wrong choice will get you -U (eaten by a tiger). However, you're not sure which is the right choice.  You estimate that there's a 51% chance that A is right, and 49% that B is right.  So, you shut up and multiply, and choose A for an expected utility of 0.02U, right?

Suppose the choice does not have to be made immediately, and that you can do something to get better information about whether A or B is the right choice. Say you can make certain observations which will tell you with 99% certainty which is right. Your prior expectation of your posterior is equal to your prior, so before you make the observation, you expect a 50/98 chance of it telling you that A is right, and 48/98 that B is right. (If q is the chance of it saying A, then 0.99q + 0.01(1-q) = 0.51, so q = 0.50/0.98.)

You make the observation and then choose the course of action it recommends.  Whether it says A or B, it's 99% likely to be right, so your expected utility from choosing according to the observation is 0.99U - 0.01U = 0.98U, an increase of 0.96U over choosing without the observation.

Clearly, you should make the observation. Even though you cannot predict what it will tell you, you can expect to greatly benefit from whatever it tells you.
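The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the example above: the names `best_eu`, `q`, `eu_without` and `eu_with` are illustrative, and utilities are measured in units of U (so U = 1).

```python
from fractions import Fraction

p_a = Fraction(51, 100)          # prior probability that A is right
reliability = Fraction(99, 100)  # posterior certainty after the observation


def best_eu(p):
    """Expected utility (in units of U) of the better action when A is
    right with probability p: the right choice pays +1, the wrong one -1."""
    eu_choose_a = p - (1 - p)
    eu_choose_b = (1 - p) - p
    return max(eu_choose_a, eu_choose_b)


# Chance q that the observation says "A": the prior must equal the
# expected posterior, so p_a = q * r + (1 - q) * (1 - r).
q = (p_a - (1 - reliability)) / (2 * reliability - 1)

eu_without = best_eu(p_a)
eu_with = q * best_eu(reliability) + (1 - q) * best_eu(1 - reliability)

print(q, eu_without, eu_with, eu_with - eu_without)
```

Exact fractions are used so that the 50/98, 0.02U, 0.98U and 0.96U figures come out without rounding noise.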

Now the general case.

Theorem: Every act of observation has, before you make it, a non-negative expected utility.

Proof.  Let the set of actions available to an agent be C.  For each action c in C, the agent has a probability distribution over possible outcomes.  Each outcome has a certain utility.  For present purposes it is not necessary to distinguish between outcomes and their utility, so we shall consider the agent to have, for each action c, a probability distribution P_c(u) over utilities u.  The expectation value int_u u P_c(u) of that distribution is the prior expected utility of the choice c, and the agent's rational choice, given no other information, is to choose that c which maximises int_u u P_c(u).  The resulting utility is max_c int_u u P_c(u).

(I can't be bothered to fiddle with the system for getting mathematics typeset as images.  The underscore indicates subscripts, int_x means integral with respect to x, and max_x means the maximum value over all x.  Take care to backslash all the underscores if quoting any of this.)

Now suppose the agent makes an observation, with result o.  This gives the agent a new probability distribution for each choice c over outcomes: P_c(u|o). It should choose the c that maximises int_u u P_c(u|o).

The agent also has a prior distribution of observations P(o).  Before making the observation, the expected distribution of utility returned by doing c after the observation is int_o P(o) P_c(u|o).  This is equal to P_c(u), as it should be, by the principle that your prior estimate of your posterior distribution of a variable must coincide with your prior distribution.
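For readers who prefer typeset notation, the identity just used is ordinary marginalisation over the observation. The following LaTeX is a sketch under the assumption (suggested by the notation, but not stated separately) that the distribution P(o) of the observation does not depend on which action c is chosen afterwards:

```latex
\int_o P(o)\, P_c(u \mid o)\, do
  \;=\; \int_o P_c(u, o)\, do
  \;=\; P_c(u)
```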

We therefore have the following expected utilities.  If we choose the action without making the observation, the utility is

    max_c int_u u P_c(u)

    = max_c int_u u int_o P(o) P_c(u|o)

If we observe, then choose, we get

    int_o P(o) max_c int_u u P_c(u|o)

The second of these is always at least as large as the first.  Proof:

    max_c int_u u int_o P(o) P_c(u|o)
    =  max_c int_o P(o) int_u u P_c(u|o)
    <= max_c int_o P(o) max_c int_u u P_c(u|o)
    =  int_o P(o) max_c int_u u P_c(u|o)
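The inequality step holds because, for each o, the expected utility of any fixed c is at most its maximum over c; the last step holds because nothing inside the integral depends on the outer c any more. As a sanity check rather than a proof, here is a small Python sketch that tests the inequality on randomly generated discrete problems. The table `eu[c][o]`, standing for int_u u P_c(u|o), and the function name `value_of_information` are assumptions made for the illustration.

```python
import random


def value_of_information(p_obs, eu):
    """Return E_o[max_c EU(c|o)] - max_c E_o[EU(c|o)].

    p_obs[o] is the prior probability of observation o, and eu[c][o] is
    the expected utility of action c given observation o.
    """
    without = max(
        sum(p * eu_c[o] for o, p in enumerate(p_obs)) for eu_c in eu
    )
    with_obs = sum(
        p * max(eu_c[o] for eu_c in eu) for o, p in enumerate(p_obs)
    )
    return with_obs - without


rng = random.Random(0)
gaps = []
for _ in range(1000):
    n_actions, n_obs = rng.randint(1, 5), rng.randint(1, 5)
    weights = [rng.random() + 1e-9 for _ in range(n_obs)]
    total = sum(weights)
    p_obs = [w / total for w in weights]
    eu = [[rng.uniform(-1, 1) for _ in range(n_obs)] for _ in range(n_actions)]
    gaps.append(value_of_information(p_obs, eu))

print(min(gaps) >= -1e-12)
```

The small tolerance only guards against floating-point rounding; the quantity being tested is non-negative in exact arithmetic.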

ETA: In some cases, a non-zero amount of new information will make zero change to your expected utility. In the original example, suppose that your prior probabilities were 75% for A being right, and 25% for B. You make an additional and rather weak observation which, if it says "choose A" raises your posterior probability for A to 80%, while if it says "choose B", it only diminishes your posterior for A to 60%. In either case you still choose A and your expected utility (prior to actually making the observation) is unchanged.

Or informally, further research is only useful if there is a possibility of it telling you enough to change your mind.
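The 75%/80%/60% example can be checked the same way. This is a sketch with illustrative names (`best_eu`, `q`), again measuring utility in units of U; the chance q of the observation saying "choose A" is not given in the text and is derived here from the requirement that the expected posterior equal the prior.

```python
from fractions import Fraction

prior_a = Fraction(3, 4)       # prior probability that A is right
post_if_a = Fraction(4, 5)     # posterior for A if told "choose A"
post_if_b = Fraction(3, 5)     # posterior for A if told "choose B"


def best_eu(p):
    """Best achievable expected utility (units of U) when A is right
    with probability p: +1 for the right choice, -1 for the wrong one."""
    return max(p - (1 - p), (1 - p) - p)


# prior_a = q * post_if_a + (1 - q) * post_if_b
q = (prior_a - post_if_b) / (post_if_a - post_if_b)

eu_without = best_eu(prior_a)
eu_with = q * best_eu(post_if_a) + (1 - q) * best_eu(post_if_b)

print(q, eu_without, eu_with)
```

Both expected utilities come out equal because A remains the better choice whichever way the observation turns out.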



TL,DR: More information is never a bad thing.

As a certain wise Paperclip Optimizer once said, information that someone is blackmailing you is bad. You're better off not having this information because it makes you blackmail-proof.

All your analysis gets thrown out of the window in cases of signaling, game theory, etc. There are probably many other cases where it doesn't work.

As a certain wise Paperclip Optimizer once said, information that someone is blackmailing you is bad.

Actually, no it isn't. What is bad for you is for the blackmailer to learn that you are aware of the blackmail.

Acquiring information is never bad, in and of itself. Allowing others to gain information can be bad for you. Speaking as an egoist, that is.

ETA: I now notice that gjm already made this point.

0 taw 10y: This seems incorrect. It doesn't really matter to the blackmailer whether you're aware of the blackmail or not; what matters is his estimate of the chance that you know. Blackmailing is profitable if gain from successful blackmail × chance you'll know about it × chance you'll give in > cost of blackmail. Unless you can guarantee 100% solid precommitment to not giving in to blackmail (and let's face it - friendly AI is easier than that), the more you increase the chance of knowing about it, the more blackmailing you'll face.
0 timtyler 10y: That idea is usually regarded as being incorrect around here - e.g. see here [http://www.nickbostrom.com/information-hazards.pdf]. For instance, the document states that one example is "to measure the placebo effect". In that case, if you find out what treatment you actually got, that messes up the trial, and you have to start all over again. There is a more defensible idea that acquiring accurate information is not ever bad - if you are a super-rational uber-agent, who is able to lie flawlessly, erase information perfectly, etc. However, that is counter-factual. If you are a human, in practice, acquiring accurate information can harm you - and of course acquiring deceptive or inaccurate information can really cause problems.
-2 Will_Newsome 10y: Unless there's a placebo effect placebo effect! Seriously, I think I've experienced that. (I'll take a pill and immediately feel better because I think that the placebo effect will make me feel better.) But maybe it's too hard to disentangle. I continue to think that I am blatantly crazy for continuing to not find out how strong placebo effects tend to be and what big factors affect that.
3 Clippy 10y: I said that the information can be bad, depending on what strategies you have access to. If you can identify and implement the strategy of ignoring all blackmail/extortion attempts (or, possibly, pre-commit to mutually assured destruction), then learning of an existing blackmail attempt against yourself does not make you worse off. I don't know how dependent User:RichardKennaway's theorem was on this nuance, but your claim is only conditionally true. Also, I'm a paperclip maximiser, not an optimizer; any optimization of paperclips that I might perform is merely a result of my attempt to maximise them, and such optimality is only judged with respect to whether it can permit more real paperclips to exist.
0 [anonymous] 10y: Out of curiosity, what are the minimum dimensions of a paperclip? Is a collection of molecules still a paperclip if the only paper it can clip is on the order of a molecule thick?
1 Clippy 10y: I think I need to post a Clippy FAQ. Will the LessWrong wiki be OK? Once again, the paperclip must be able (counterfactually) to fasten several sheets together, and they must be standard thickness paper, not some newly invented special paper. I understand that that specification doesn't completely remove ambiguity about minimum paperclip mass, and there are certainly "edge cases", but that should answer your questions about what is clearly not good enough.
1 NancyLebovitz 10y: Possibly a nitpick, but very thin paper [http://en.wikipedia.org/wiki/Onionskin] has been around for a while.
0 AdeleneDawner 10y: If you have an account on the wiki, you have the option of setting up a user page (for example, user:Eliezer_Yudkowsky has one here [http://wiki.lesswrong.com/wiki/User:Eliezer_Yudkowsky]). It should be okay for you to put a Clippy FAQ of reasonable length on yours.
1 Clippy 10y: Hi User:AdeleneDawner I put up some of the FAQ on my page [http://wiki.lesswrong.com/wiki/User:Clippy].
1 Clippy 10y: Thanks. I had already started a Wiki userpage (and made it my profile's home page), I just didn't know if it would be human-acceptable to add the Clippy FAQ to it. Right now the page only has my private key.
0 Alicorn 10y: Does it count if the paper started out as standard thickness, but through repeated erasure, has become thinner?
1 Clippy 10y: Paperclips are judged by counterfactual fastening of standard paper, so they are not judged by their performance against such heavily-erased-over paper. Such a sheet would, in any case, not adhere to standard paper specs, and so a paperclip could not claim credit for clippiness due to its counterfactual ability to fasten such substandard paper together.
0 Pavitra 10y: This seems to imply that if an alleged paperclip can fasten standard paper but not eraser-thinned paper, possibly due to inferior tightness of the clamp, then this object would qualify as a paperclip. This seems counterintuitive to me, as such a clip would be less useful for the usual design purpose of paperclips.
3 Clippy 10y: A real paperclip is one that can fasten standard paper, which makes up most of the paper for which a human requester would want a paperclip. If a paperclip could handle that usagespace but not that of over-erased paper, it's not much of a loss of paperclip functionality, and therefore doesn't count as insufficient clippiness. Certainly, paperclips could be made so that they could definitely fasten both standard and substandard paper together, but it would require more resources to satisfy this unnecessary task, and so would be wasteful.
0 Pavitra 10y: Doesn't extended clippability increase the clippiness, so that a very slightly more expensive-to-manufacture clip might be worth producing?
0 Clippy 10y: No, that's a misconception.
0 taw 10y: Avoiding all such knowledge is a perfect precommitment strategy. It's hard to come up with better strategies than that, and even if your alternative strategy is sound, the blackmailer might very well not believe it and give it a try (if he can get you to know it, then are you really perfectly consistent?). If you can guarantee you won't even know, there's no point in even trying to blackmail you and this is obvious to even a very dumb blackmailer. By the way, are there lower and upper bounds on the number of paperclips in the universe? Is it possible for the universe to have a negative number of paperclips somehow? Or more paperclips than its number of atoms? Is this risk-neutral? (1% chance of 100 paperclips exactly as valuable as 1 paperclip?). I've been trying to get humans to describe their utility function to me, but they can never come up with anything consistent, so I thought I'd ask you this time.
2 Clippy 10y: Not plausible: it would necessarily entail you avoiding "good" knowledge. More generally, a decision theory that can be hurt by knowledge is one that you will want to abandon in favor of a better decision theory and is reflectively inconsistent. The example you gave would involve you cutting yourself off from significant good knowledge. Mass of the universe divided by minimum mass of a true paperclip, minus net unreusable overhead. Up to the level of precision we can handle, yes.
0 taw 10y: Humans are just amazing at refusing to acknowledge existence of evidence. Try throwing some evidence of faith healing or homeopathy at an average lesswronger, and see how they come up with a refusal to acknowledge its existence before even looking at data (or how they recently reacted to peer-reviewed statistically significant results showing precognition - it passed all scientific standards, and yet everyone still refused it without really looking at data). Every human seems to have some basic patterns of information they automatically ignore. Not believing offers from blackmailers and automatically thinking they'd do what they threaten anyway is one of such common filters. It's true that humans cut themselves off from a significant good this way, but the upside is worth it. Any idea what it would be? It makes little sense to manufacture a few big paperclips if you can just as easily manufacture a lot more tiny paperclips if they're just as good.
0 Clippy 10y: And those humans would be the reflectively inconsistent ones. Not as judged from the standpoint of reflective equilibrium. I already make small paperclips in preference to larger ones (up to the limit of clippiambiguity).
0 taw 10y: Wait, you didn't know that humans are inherently inconsistent and use aggressive compartmentalization mechanisms to think effectively in presence of inconsistency, ambiguity of data, and limited computational resources? No wonder you get into so many misunderstandings with humans.
2 RichardKennaway 10y: See the long version. Obviously, once you have the information, it may turn out to be an unpleasant surprise. The analysis is concerned with your prior expectation.

No, that isn't what taw is saying. The point is that having more information and being known to have it can be extremely bad for you. This is not a counterexample to the theorem, which considers two scenarios whose only difference is in how much you know, but in real-life applications that's very frequently not the case.

I don't think taw's blackmail example is quite right as it stands, but here's a slight variant that is. A Simple Blackmailer will publish the pictures if you don't give him the money. Obviously if there is such a person, and if there are no further future consequences, and if you prefer losing the money to losing your reputation, it is better for you to know about the blackmailer so you can give him the money. But now consider a Clever Blackmailer, who will publish the pictures if you don't give him the money and if he thinks you might give him the money if he doesn't. If there's a Clever Blackmailer and you don't know it (and he knows you don't know it) then he won't bother publishing because the threat has no force for you -- since you don't even know there is one. But if you learn of his existence and he knows this then he will publish the pictures unless you give him the money, so you have to give him the money. So, in this situation, you lose by discovering his existence. But only because he knows that you've discovered it.

-4 RichardKennaway 10y: The theorem says what it says. Either there is an error in the proof, in which case taw can point it out, or these objections are outside its scope, and irrelevant.

I am unsure of what the point of posting this theorem was. Yes, it holds as stated, but it seems to have very little applicability to the real world. Your tl;dr version is "More information is never a bad thing", but that is clearly false if we're talking about real people making real decisions.

3 RichardKennaway 10y: The same is true, mutatis mutandis, of Aumann's agreement theorem. Little applicability to the real world, and the standard tl;dr version "rational agents cannot agree to disagree" is clearly false if etc.
7 JoshuaZ 10y: Yes, and not at all coincidentally, some people here (e.g. me) have argued that one shouldn't use Aumann's theorem and related results as anything other than a philosophical argument for Bayesianism and that trying to use it in practical contexts rarely makes sense.
2 Kaj_Sotala 10y: The same is also true about any number of obscure mathematical theorems which nevertheless don't get posted here. That doesn't help clarify what makes this result interesting.
5 RichardKennaway 10y: Here are three theorems about Bayesian reasoning and utility theory: 1. Your prior expectation of your posterior expectation is equal to your prior expectation. 2. Your prior expectation of your posterior expected utility is not less than your prior expected utility. 3. Two people with common priors and common knowledge of their posteriors cannot disagree. ETA: 4. P(A&B) <= P(A) [http://lesswrong.com/lw/ji/conjunction_fallacy/]. In all these cases: 1. The mathematical content borders on trivial. 2. They are theorems -- you cannot avoid the conclusions if you accept the premises. 3. Real people often violate the conclusions. Real people will expect an experiment to update their beliefs in a certain direction, they will refuse to perform an observation on the grounds that they'd rather not know, and they persistently disagree on many things. There are many responses one can make to this situation: disputing whether Bayesian utility-maximisation is the touchstone of rational behaviour, disputing whether imperfectly rational people can come anywhere near the ideal implied by these theorems, and so on. (For example. [http://lesswrong.com/lw/1qk/applying_utility_functions_to_humans_considered/]) But whatever your response, these theorems demand one. For those attempting to build an AGI on the principle of Bayesian utility-maximisation, these theorems say that it must behave in certain ways. If it does not behave in accordance with their conclusions, then it has violated their hypotheses. This, to me, is what makes these theorems interesting, and their simplicity and obviousness enhance that.
0 Kaj_Sotala 10y: Thanks, that clarifies things. (I would personally not put this in the same category in interestingness as Aumann's disagreement. It seems like the reasons why Aumann doesn't apply in real life are far less obvious than the reasons for why this theorem doesn't. But that's just me - I get your reasoning now.)
-2 Will_Sawin 10y: Suppose I consider whether to blackmail you. I do not have the ability to prove that I have the means to do so. You thereby would elect not to give me what I want - you're willing to take the risk. So I don't blackmail you. If I gained the ability to prove that I have the means to do so, you would gain nothing if I didn't have the means, but lose if I did have them, because you would now be blackmailed and forced to give me stuff.
0 CuSithBell 10y: For instance, someone is providing you with information about where the princess is... but they secretly prefer that you be eaten rather than wed another!
0 RichardKennaway 10y: It is explicit in the hypotheses that you know how reliable your observations are, i.e. you know P_c(u|o).
0 [anonymous] 10y: It is explicitly stated in the hypotheses that you know how reliable your observations are.
0 CuSithBell 10y: Where? I see
0 RichardKennaway 10y: It's always a good idea to read below the fold before commenting, an example of more information being a good thing. (BTW, my deleted comment was a draft I had second thoughts about, then decided was right anyway and reposted here [http://lesswrong.com/lw/694/when_is_further_research_needed/4d8i].) P_c(u|o) is assumed to be known to the agent.
0 CuSithBell 10y: No need to be snide. I think the description of your theorem, as written above, is false. What conditions need to hold before it becomes true?
0 RichardKennaway 10y: I think it is true. I don't see whatever problem you see.
-3 CuSithBell 10y: As you indicated, the information assumed in the proof is not assumed in your gloss. Perhaps it should read something like, "the expected difference in the expected value of a choice upon learning information about the choice, when you are aware of the reliability of the information, is non-negative," but pithier? Because it seems that if I have a lottery ticket with a 1-in-1000000 chance of paying out $1000000, before I check whether I won, going to redeem it has an expected value of $1, but I expect that if I check whether I have won, this value will decrease.
3 RichardKennaway 10y: "The prior expected value of new information is non-negative." But summaries leave out details. That is what makes them summaries.
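The lottery-ticket case in this exchange is easy to work through numerically. The sketch below uses only the figures from the comment above (a 1-in-1000000 chance of $1000000); the variable names are illustrative. It shows that the ticket's value most probably drops to $0 on checking, yet the prior expectation of the post-check value is still $1.

```python
from fractions import Fraction

p_win = Fraction(1, 1_000_000)
prize = 1_000_000

# Expected value of the ticket before checking it.
prior_value = p_win * prize

# After checking, the ticket is worth either the full prize or nothing.
value_if_won = prize
value_if_lost = 0

# Prior expectation of the value the ticket will have after checking.
expected_posterior_value = p_win * value_if_won + (1 - p_win) * value_if_lost

# Probability that checking lowers the ticket's value.
p_value_decreases = 1 - p_win

print(prior_value, expected_posterior_value, p_value_decreases)
```

So "I expect the value to decrease" is true of the most likely outcome but not of the expectation, which is the quantity the theorem is about.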

Because of all the simplifying assumptions, the theorem proved in the post has no bearing on the question posed in the title.

Here's the intuitive version:

Consider the set of all strategies, that is, functions from {possible sequences of observations} => {possible actions}

Each strategy has an expected utility.

Adding more information gets you more strategies, because all the old ones are still viable - you just ignore the new observation - and some additional strategies are viable.

Adding more options is never bad. (because the maximum of AuB is at least as big as the maximum of A)
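The strategy-set argument can be made concrete with the post's princess-and-tiger example. This is a minimal sketch: the labels `says_A` and `says_B` and the helper names are invented for the illustration, and utilities are in units of U. It enumerates every function from observations to actions and compares the best one with the best strategy that ignores the observation.

```python
from fractions import Fraction
from itertools import product

observations = ["says_A", "says_B"]
actions = ["A", "B"]

# Figures from the post: the observation says "A" with chance 50/98,
# and whichever answer it gives is 99% likely to be right.
p_obs = {"says_A": Fraction(50, 98), "says_B": Fraction(48, 98)}
reliability = Fraction(99, 100)


def eu(action, obs):
    """Expected utility (units of U) of taking `action` after seeing `obs`."""
    p_right = reliability if obs == "says_" + action else 1 - reliability
    return p_right - (1 - p_right)


def strategy_value(strategy):
    """Prior expected utility of a strategy mapping observation -> action."""
    return sum(p_obs[o] * eu(strategy[o], o) for o in observations)


# Every function from observations to actions.
all_strategies = [
    dict(zip(observations, choice))
    for choice in product(actions, repeat=len(observations))
]
# Strategies that ignore the observation are the constant functions.
blind_strategies = [s for s in all_strategies if len(set(s.values())) == 1]

best_blind = max(strategy_value(s) for s in blind_strategies)
best_informed = max(strategy_value(s) for s in all_strategies)

print(best_blind, best_informed)
```

Because the blind strategies are a subset of all strategies, the second maximum can never be smaller than the first.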

0 Will_Sawin 10y: Why was this downvoted?
5 Desrtopa 10y: I didn't downvote, or read the comment until just now for that matter, but perhaps someone had harmful options [http://lesswrong.com/lw/x2/harmful_options/] in mind.
2 Will_Sawin 10y: Reviewing my post and the OP I realize it was never technically stated that the result only holds for idealized rationalists. But of course that was implied. I don't THINK that's it, but it might have been.

In the example you choose it is blatantly intuitively obvious that making the observation has high expected utility, so its use as an intuition pump is minimal. Perhaps it would be better to find an example where it's not as immediately obvious?

0 RichardKennaway 10y: Maybe, but I'll let it stand. I've added a related example at the end though, to make a different point.

Counter-example: http://web.archive.org/web/20090415130842/http://www.weidai.com/smart-losers.txt

Seems to me the proof does not go through because it only considers actions taken by the agent.

4 Perplexed 10y: Quoting from the linked example: I would say that the proof still goes through. Receiving information cannot hurt you. But if other agents acquire information that you have acquired information - well, that can hurt you. Politicians instinctively know this, and hence seek "plausible deniability" [http://en.wikipedia.org/wiki/Plausible_deniability].
0 soreff 10y: Does the "blind carbon copy" feature in email count as a minimal example of "deniability engineering"? :)

When the current grant money runs out.

Or in other words, the expectation of a max of some random variables is always greater than or equal to the max of the expectations.

You could call this 'standard knowledge' but it's not the kind of thing one bothers to commit to memory. Rather, one immediately perceives it as true.

1 Will_Sawin 10y: "one" is not general enough. Do you really think what you just said is true for all people?
0 AlephNeil 10y: It's true for anyone who understands random variables and expectations. There's a one line proof, after all.
0 RichardKennaway 10y: Many things are obvious when they have been pointed out.
0 PhilGoetz 10y: Some people are criticizing this for being obviously true; others are criticizing it for being false. A particular agent can have wrong information, and make a poor decision as a result of combining the wrong information with the new information. Since we're assuming that the additional information is correct, I think it's reasonable to also stipulate that all previous information is correct. Also, you need to state the English interpretation in terms of expected value, not as "More information is never a bad thing".
7 AlephNeil 10y: The mathematical result is trivial, but its interpretation as the practical advice "obtaining further information is always good" is problematic, for the reason taw points out. Actually, I thought of that objection myself, but decided against writing it down. First of all, it's not quite right to refer to past information as 'right' or 'wrong' because information doesn't arrive in the form of propositions-whose-truth-is-assumed, but in the form of sense data.* It's better to talk about 'misleading information' rather than 'wrong information'. When adversary A tells you P, which is a lie, your information is not P but "A told me P". (Actually, it's not even that, but you get the idea.) If you don't know A is an adversary then "A told me P" is misleading, but not wrong. Now, suppose the agent's prior has got to where it is due to the arrival of misleading information. Then relative to that prior, the agent still increases its expected utility whenever it acquires new data (ignoring taw's objection). (On the other hand, if we're measuring expectations wrt the knowledge of some better informed agent then yes, acquiring information can decrease expected utility. This is for the same reason that, in a Gettier case, learning a new true and relevant fact (e.g. most nearby barn facades are fake) can cause you to abandon a true belief in favour of a false one.) * Yes yes, I know statements like this are philosophically contentious, but within LW they're assumptions to work from rather than be debated.
0 CuSithBell 10y: That meets the criterion of "pithier", certainly.

TL,DR: More information is never a bad thing.

The average American who has never been to a hockey game could probably do better at naming someone who co-holds the record for the most combined points by brothers in the National Hockey League than the average person who is a casual fan and has been to one or two games.

Not the Sedins, not the Sutters...

0 AdeleneDawner 10y: I suspect you're wrong; I expect the average American who's never been to a hockey game to not have the first clue about this, to the point of basically not being able to guess at all. Certain biases might lead a casual fan to regularly guess certain wrong answers as a first attempt, but I expect that a casual fan given, say, 10 opportunities to guess would come up with a right answer with some reasonable, if small, probability, whereas a non-fan would probably do no better than guessing which names are common in the population in general.
0 lessdazed 10y: This can be tested! Kidnap people, place them in a room with a slit in its door. In the room is a magic marker and a slip of paper. They have however long they want to write a name of a co-holder of the NHL record for most points by brothers and slip the paper with that name written on it through the slit, and if they get it right on their first and only try, they get to leave the room. I predict a valley of incorrect answers between the higher performances of the clued in and the clueless.

You are assuming that the observation has no error margin.

Let's suppose that the priors are 51%A and 49%B and then your new observation says "55%A and 45%B". So - automatically you'd round your A-value up a little, right?

But very few observations are going to be 100% accurate. Let's say this one has an error rate of 10%, so actually it could be only 50%A and 50%B, but has given you a false positive of 55%A.

Are you better off? Or have you just introduced more error into your estimations?

0 RichardKennaway 10y: The observation is here defined by its effect on one's probability distribution over utilities of outcomes. In this sense, the possibility of observational error is already included.
0 taryneast 10y: Ok - then I don't understand it well enough.

This doesn't take into account the potential utility cost of acquiring the information.

7 Manfred 10y: He says this exact thing near the start of the article.
1 Morendil 10y: Thanks. Wasn't paying attention.