Radical Probabilism [Transcript]

Ben Pace

Are there any other detailed descriptions of what a "Jeffrey update" might look like or how one would perform one?

I think I get the point of there being "rationality constraints" that don't, by implication, strictly require Bayesian updates. But are Jeffrey updates the entire set of possible updates that are required?

Can anyone describe a concrete example contrasting a Bayesian update and a Jeffrey update for the same circumstances, e.g. prior beliefs and new information learned?

It kinda seems like Jeffrey updates are 'possibly rational updates' but they're only justified if one can perform them for no possible (or knowable) reason. That doesn't seem practical – how could that work?

[-]abramdemski5y*Ω7150

Understandable questions. I hope to expand this talk into a post which will explain things more properly.

Think of the two requirements for Bayes updates as forming a 2x2 matrix. If you have both (1) all information you learned can be summarised into one proposition which you learn with 100% confidence, and (2) you know ahead of time how you would respond to that information, then you must perform a Bayesian update. If you have (2) but not (1), ie you update some X to less than 100% confidence but you knew ahead of time how you would update to changed beliefs about X, then you are required to do a Jeffrey update. But if you don't have (2), updates are not very constrained by Dutch-book type rationality. So in general, Jeffrey argued that there are many valid updates beyond Bayes and Jeffrey updates.

Jeffrey updates are a simple generalization of Bayes updates. When a Bayesian learns X, they update it to 100%, and take P(Y|X) to be the new P(Y) for all Y. (More formally, we want to update P to get a new probability measure Q. We do so by setting Q(Y)=P(Y|X) for all Y.) Jeffrey wanted to handle the case where you somehow become 90% confident of X, instead of fully confident. He thought this was more true to human experience. A Jeffrey update is just the weighted average of the two possible Bayesian updates. (More formally, we want to update P to get Q where Q(X)=c for some chosen c. We set Q(Y) = cP(Y|X) + (1-c)P(Y|~X).)

A natural response for a classical Bayesian is: where does 90% come from? (Where does c come from?) But the Radical Probabilism retort is: where do observations come from? The Bayesian already works in a framework where information comes in from "outside" somehow. The radical probabilist is just working in a more general framework where more general types of evidence can come in from outside.

Pearl argued against this practice in his book introducing Bayesian networks. But he introduced an equivalent -- but more practical -- concept which he calls virtual evidence. The Bayesian intuition freaks out at somehow updating X to 90% without any explanation. But the virtual evidence version is much more intuitive. (Look it up; I think you'll like it better.) I don't think virtual evidence goes against the spirit of Radical Probabilism at all, and in fact if you look at Jeffrey's writing he appears to embrace it. So I hope to give that version in my forthcoming post, and explain why it's nicer than Jeffrey updates in practice.

[-]DanielFilan5yΩ360

More formally, we want to update P to get Q where Q(X)=c for some chosen c. We set Q(Y) = cP(Y|X) + (1-c)P(~Y|X).

Huh, I'm really surprised this isn't Q(Y) = cP(Y|X) + (1-c)P(Y|~X). Was that a typo? If not, why choose your equation over mine?

[-]abramdemski5yΩ220

Ah, yep! Corrected.

[-]Richard_Kennaway5y20

Jeffrey wanted to handle the case where you somehow become 90% confident of X, instead of fully confident

How does this differ from a Bayesian update? You can update on a new probability distribution over X just as you can on a point value. In fact, if you're updating the probabilities in a Bayesian network, like you described, then even if the evidence you are updating on is a point value for some initial variable in the graph, the propagation steps will in general be updates on the new probability distributions for parent variables.

[-]Kenny5y10

Thanks! That answers a lot of my questions even without a concrete example.

I found this part of your reply particularly interesting:

if you don't have (2), updates are not very constrained by Dutch-book type rationality. So in general, Jeffrey argued that there are many valid updates beyond Bayes and Jeffrey updates.

The abstract example I came up with after reading that was something like 'I think A at 60%. If I observe X, then I'd update to A at 70%. If I observe Y, then I'd update to A at 40%. If I observe Z, I don't know what I'd think.'.

I think what's a little confusing is that I imagined these kinds of adjustments were already incorporated into 'Bayesian reasoning'. Like, for the canonical 'cancer test result' example, we could easily adjust our understanding of 'receives a positive test result' to include uncertainty about the evidence itself, e.g. maybe the test was performed incorrectly or the result was misreported by the lab.

Do the 'same' priors cover our 'base' credence of different types of evidence? How are probabilities reasonably, or practically, assigned or calculated for different types of evidence? (Do we need to further adjust our confidence of those assignment or calculations?)

Maybe I do still need a concrete example to reach a decent understanding.

[-]abramdemski5y50

Richard Bradley gives an example of a non-Bayes non-Jeffrey update in Radical Probabilism and Bayesian Conditioning. He calls his third type of update Adams conditioning. But he goes even further, giving an example which is not Bayes, Jeffrey, or Adams (the example with the pipes toward the end; figure 1 and accompanying text). To be honest I still find the example a bit baffling, because I'm not clear on why we're allowed to predictably violate the rigidity constraint in the case he considers.

I think what’s a little confusing is that I imagined these kinds of adjustments were already incorporated into ‘Bayesian reasoning’. Like, for the canonical ‘cancer test result’ example, we could easily adjust our understanding of ‘receives a positive test result’ to include uncertainty about the evidence itself, e.g. maybe the test was performed incorrectly or the result was misreported by the lab.

We can always invent a classically-bayesian scenario where we're uncertain about some particular X, by making it so we can't directly observe X, but rather get some other observations. EG, if we can't directly observe the test results but we're told about it through a fallible line of communication. What's radical about Jeffrey's view is to allow the observations themselves to be uncertain. So if you look at e.g. a color but aren't sure what you're looking at, you don't have to contrive a color-like proposition which you do observe in order to record your imperfect observation of color.

You can think of radical probabilism as "Bayesianism at a distance": like if you were watching a Bayesian agent, but couldn't bother to record every single little sense-datum. You want to record that the test results are probably positive, without recording your actual observations that make you think that. We can always posit underlying observations which make the radical-probabilist agent classically Bayesian. Think of Jeffrey as pointing out that it's often easier to work "at a distance" instead, and than once you start thinking this way, you can see it's closer to your conscious experience anyway -- so why posit underlying propositions which make all your updates into Bayes updates?

As for me, I have no problem with supposing the existence of such underlying propositions (I'll be making a post elaborating on that at some point...) but find radical probabilism to nonetheless be a very philosophically significant point.

[-]Kenny5y10

Thanks again!

Your point about "Bayesianism at a distance" makes a lot of sense.

[-]David Scott Krueger (formerly: capybaralet)5y*Ω350

Abram Demski: But it's like, how do you do that if “I don't have a good hypothesis” doesn't make any predictions?

One way you can imagine this working is that you treat “I don't have a good hypothesis” as a special hypothesis that is not required to normalize to 1.
For instance, it could say that observing any particular real number, r, has probability epsilon > 0.
So now it "makes predictions", but this doesn't just collapse to including another hypothesis and using Bayes rule.

You can also imagine updating this special hypothesis (which I called a "Socratic hypothesis" in comments on the original blog post on Radical Probabilism) in various ways.

[-]philh5y40

(Transcription nitpick: IIRC I said "fewer constraints", not "pure constraints".)

[-]Raemon5y20

A background question I've had for a while: people often use Dutch Booking as an example of a failure mode you need your rationality-theory to avoid. Dutch Booking seems like a crisp, formalizable circumstance that makes it easy to think about some problems, but I'm not sure it ever comes up for me. Most people seem to avoid it via "don't make big bets often", rather than "make sure your beliefs are rational and inexploitable."

Is Dutch Book supposed to be a metaphor for something that happens more frequently?

[-]abramdemski5y60

Yeah, the position in academic philosophy as I understand it is: Dutch book arguments aren't really about betting. It's not actually that we're so concerned about bets. Rather, it's a way to illustrate a kind of inconsistency. At first when I heard this I was kind of miffed about it, but now, I think it's the right idea. I suggest reading the SEP article on Dutch Book arguments, especially Section 1.4 (which voices your concerns) and Section 2.1 or section 2 as a whole (which addresses your concerns in the way I've outlined).

Note, however, that we might insist that the meaning of probability is as a guide for actions, and hence, "by definition" we should take bets when they have positive expectation according to our probabilities. If we buy this, then either (1) you're being irrational in rejecting those bets, or (2) you aren't really reporting your probabilities in the technical sense of what-guides-your-actions, but rather some subjective assessments which may somehow be related to your true probabilities.

But if you want this kind of "fully pragmatic" notion of probability, a better place to start might be the Complete Class Theorem, which really is a consequentialist argument for having a probability distribution, unlike Dutch Books.

LESSWRONG
LW

LESSWRONG
LW

48

Radical Probabilism [Transcript]

48

Ω 18

48

Ω 18

Talk

Q&A