What are your cruxes for imprecise probabilities / decision rules?



Here are some brief reasons why I dislike things like imprecise probabilities and maximality rules (somewhat strongly stated, medium-strongly held because I've thought a significant amount about this kind of thing, but unfortunately quite sloppily justified in this comment; also, sorry if some things below approach being insufficiently on-topic):

- I like the canonical arguments for bayesian expected utility maximization ( https://www.alignmentforum.org/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations ; also https://web.stanford.edu/~hammond/conseqFounds.pdf seems cool (though I haven't read it properly)). I've never seen anything remotely close for any of this other stuff — in particular, no arguments that pin down any other kind of rule compellingly. (I associate with this the vibe here (in particular, the paragraph starting with "To the extent that the outer optimizer" and the paragraph after it), though I guess maybe that's not a super helpful thing to say.)
- The arguments I've come across for these other rules look like pointing at some intuitive desiderata and saying these other rules sorta meet these desiderata whereas canonical bayesian expected utility maximization doesn't, but I usually don't really buy the desiderata and/or find that bayesian expected utility maximization also sorta has those desired properties, e.g. if one takes the cost of thinking into account in the calculation, or thinks of oneself as choosing a policy.
- When specifying alternative rules, people often talk about things like default actions, permissibility, and preferential gaps, and these concepts seem bad to me. More precisely, they seem unnatural/unprincipled/confused/[I have a hard time imagining what they could concretely cache out to that would make the rule seem non-silly/useful]. For some rules, I think that while they might be psychologically different than 'thinking like an expected utility maximizer', they give behavior from the same distribution — e.g., I'm pretty sure the rule suggested here (the paragraph starting with "More generally") and here (and probably elsewhere) is equivalent to "act consistently with being an expected utility maximizer", which seems quite unhelpful if we're concerned with getting a differently-behaving agent. (In fact, it seems likely to me that a rule which gives behavior consistent with expected utility maximization basically had to be provided in this setup given https://web.stanford.edu/~hammond/conseqFounds.pdf or some other canonical such argument, maybe with some adaptations, but I haven't thought this through super carefully.) (A bunch of other people (Charlie Steiner, Lucius Bushnaq, probably others) make this point in the comments on https://www.lesswrong.com/posts/yCuzmCsE86BTu9PfA/there-are-no-coherence-theorems; I'm aware there are counterarguments there by Elliott Thornley and others; I recall not finding them compelling on an earlier pass through these comments; anyway, I won't do this discussion justice in this comment.)
- I think that if you try to get any meaningful mileage out of the maximality rule (in the sense that you want to "get away with knowing meaningfully less about the probability distribution"), basically everything becomes permissible, which seems highly undesirable. This is analogous to: as soon as you try to get any meaningful mileage out of a maximin (infrabayesian) decision rule, every action looks really bad — your decision comes down to picking the least catastrophic option out of options that all look completely catastrophic to you — which seems undesirable. It is also analogous to trying to find an action that does something or that has a low probability of causing harm 'regardless of what the world is like' being imo completely impossible (leading to complete paralysis) as soon as one tries to get any mileage out of 'regardless of what the world is like' (I think this kind of thing is sometimes e.g. used in davidad's and Bengio's plans https://www.lesswrong.com/posts/pKSmEkSQJsCSTK6nH/an-open-agency-architecture-for-safe-transformative-ai?commentId=ZuWsoXApJqD4PwfXr , https://www.youtube.com/watch?v=31eO_KfkjRQ&t=1946s ). In summary, my inside view says this kind of knightian thing is a complete non-starter. But outside-view, I'd guess that at least some people that like infrabayesianism have some response to this which would make me view it at least slightly more favorably. (Well, I've only stated the claim and not really provided the argument I have in mind, but that would take a few paragraphs I guess, and I won't provide it in this comment.)
- To add: it seems basically confused to talk about **the** probability distribution on probabilities or probability distributions, as opposed to some joint distribution on two variables or **a** probability distribution on probability distributions or something. It seems similarly 'philosophically problematic' to talk about **the** set of probability distributions, to decide in a way that depends a lot on how uncertainty gets 'partitioned' into the set vs the distributions. (I wrote about this kind of thing a bit more here: https://forum.effectivealtruism.org/posts/Z7r83zrSXcis6ymKo/dissolving-ai-risk-parameter-uncertainty-in-ai-future#vJg6BPpsG93iyd7zo .)
- I think it's plausible there's some (as-of-yet-undeveloped) good version of probabilistic thinking+decision-making for less-than-ideal agents that departs from canonical bayesian expected utility maximization; I like approaches to finding such a thing that take aspects of existing messy real-life (probabilistic) thinking seriously but also aim to define a precise formal setup in which some optimality result could be proved. I have some very preliminary thoughts on this and a feeling that it won't look at all like the stuff I've discussed disliking above. Logical induction ( https://arxiv.org/abs/1609.03543 ) seems cool; a heuristic estimator ( https://arxiv.org/pdf/2211.06738 ) would be cool. That said, I also assign significant probability to nothing very nice being possible here (this vaguely relates to the claim: "while there's a single ideal rationality, there are many meaningfully distinct bounded rationalities" (I'm forgetting whom I should attribute this to)).

Thanks for the detailed answer! I won't have time to respond to everything here, but:

I like the canonical arguments for bayesian expected utility maximization ( https://www.alignmentforum.org/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations ; also https://web.stanford.edu/~hammond/conseqFounds.pdf seems cool (though I haven't read it properly)). I've never seen anything remotely close for any of this other stuff

But the CCT only says that if you satisfy [blah], your policy is *consistent with* precise EV maximization. This do...


I agree that any precise EV maximization (which imo = any good policy) is consistent with some corresponding maximality rule — in particular, with the maximality rule with the very same single precise probability distribution and the same utility function (at least modulo some reasonable assumptions about what 'permissibility' means). Any good policy is also consistent with any maximality rule that includes its probability distribution as one distribution in the set (because this guarantees that the best-according-to-the-precise-EV-maximization action is always permitted), as well as with any maximality rule that makes anything permissible. But I don't see how any of this connects much to whether there is a positive case for precise EV maximization? If you buy the CCT's assumptions, then you literally do have an argument that anything other than precise EV maximization is bad, right, which does sound like a positive case for precise EV maximization (though not directly in the psychological sense)?
Ok, maybe you're saying that the CCT doesn't obviously provide an argument for it being good to restructure your thinking into literally maintaining some huge probability distribution on 'outcomes' and explicitly maintaining some function from outcomes to the reals and explicitly picking actions such that the utility conditional on these actions having been taken by you is high (or whatever)? I agree that trying to do this very literally is a bad idea, eg because you can't fit all possible worlds (or even just one world) in your head, eg because you don't know likelihoods given hypotheses as you're not logically omniscient, eg because there are difficulties with finding yourself in the world, etc — when taken super literally, the whole shebang isn't compatible with the kinds of good reasoning we actually can do and do do and want to do. I should say that I didn't really track the distinction between the psychological and behavioral question carefully in my original respon

Sets of distributions are the natural elements of Bayesian reasoning: each distribution corresponds to a hypothesis. Some people pretend that you can collapse these down to a single distribution by some prior (and then argue about "correct" priors), but the actual machinery of Bayesian reasoning produces changes in *relative* hypothesis weightings. Those can be applied to any prior if you have reason to prefer a single one, or simply composed with future relative changes if you don't.

Partially ordering options by EV over all hypotheses is likely to be a very weak order with nearly all options being incomparable (and thus permissible). However, it's quite reasonable to have *bounds* on hypothesis weightings even if you don't have good reason to choose a specific prior.

You can use prior bounds to form very much stronger partial orders in many cases.
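To make the point about bounds concrete, here is a minimal sketch, assuming just two hypotheses and two made-up options (the names `ev`, `strictly_prefers`, `A`, `B` and all the numbers are illustrative, not from the comment above). With no constraint on the weight of the first hypothesis the two options are incomparable; with the weight bounded to [0.6, 0.9], one option is strictly preferred.

```python
def ev(option, w):
    """EV of an option when hypothesis H1 has weight w and H2 has weight 1 - w.

    `option` is a pair (EV under H1, EV under H2); both values are hypothetical.
    """
    ev_h1, ev_h2 = option
    return w * ev_h1 + (1 - w) * ev_h2


def strictly_prefers(a, b, w_lo, w_hi):
    """True iff a beats b for every weight w in [w_lo, w_hi].

    The EV difference is linear in w, so checking the two endpoints suffices.
    """
    return all(ev(a, w) > ev(b, w) for w in (w_lo, w_hi))


A = (10.0, 0.0)  # great if H1 is true, worthless if H2 is true
B = (4.0, 3.0)   # mediocre either way

# No bounds on the weight: neither option dominates, so both are permissible.
comparable_unbounded = (strictly_prefers(A, B, 0.0, 1.0)
                        or strictly_prefers(B, A, 0.0, 1.0))

# Weight of H1 bounded to [0.6, 0.9]: A is now strictly preferred to B.
a_beats_b_bounded = strictly_prefers(A, B, 0.6, 0.9)

print(comparable_unbounded, a_beats_b_bounded)
```

The endpoint check only works here because there are two hypotheses; with more hypotheses one would need to check the vertices of the allowed weight region instead.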

My initial impulse is to treat imprecise probabilities like I treat probability distributions over probabilities: namely, I am not permanently opposed, but have promised myself that before I resort to one, I would first try a probability and a set of "indications" about how "sensitive" my probability is to changes: e.g., I would try something like

My probability is .8, but with p = .5, it would change by at least a factor of 2 (more precisely, my posterior odds would end up outside the interval [.5,2] * my prior odds) if I were to spend 8 hours pondering the question in front of a computer with an internet connection; also with p = .25, my probability a year in the future will differ from my current probability by at least a factor of 2 even if I never set aside any time to ponder the question.
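As a quick worked check of the odds arithmetic in that example (a sketch; the helper names `prob_to_odds` and `odds_to_prob` are mine): a probability of .8 is odds of 4, so the interval [.5, 2] times the prior odds is [2, 8], which corresponds to probabilities between 2/3 and 8/9. "Changing by at least a factor of 2" therefore means ending up below about .667 or above about .889.

```python
def prob_to_odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)


def odds_to_prob(o):
    """Convert odds in favor back to a probability."""
    return o / (1 + o)


prior = 0.8
prior_odds = prob_to_odds(prior)  # odds of 4 (up to floating-point error)

# Posterior odds inside [0.5, 2] * prior odds count as "less than a factor of 2".
lower = odds_to_prob(0.5 * prior_odds)
upper = odds_to_prob(2 * prior_odds)

print(round(lower, 3), round(upper, 3))
```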

I agree that higher-order probabilities can be useful for representing (non-)resilience of your beliefs. But imprecise probabilities go further than that: the idea is that you just don't know what higher-order probabilities over the first-order ones you ought to endorse, or the higher-higher-order probabilities over those, etc. So the first-order probabilities remain imprecise.

For humans (and probably generally for embedded agents), I endorse acknowledging that probabilities are a wrong but useful model. For any given prediction, the possibility set is incomplete, and the weights are only estimations with lots of variance. I don't think that a set of distributions fixes this, though in some cases it can capture the model variance better than a single summary can.

EV maximization can only ever be an estimate. No matter HOW you come up with your probabilities and beliefs about value-of-outcome, you'll be wrong fairly often. But that doesn't make it useless - there's no better legible framework I know of. Illegible frameworks (heuristics embedded in the giant neural network in your head) are ALSO useful, and IMO best results come from blending intuition and calculation, and from being humble and suspicious when they diverge greatly.

A couple years ago, my answer would have been that both imprecise probabilities and maximality seem like ad-hoc, unmotivated methods which add complexity to Bayesian reasoning for no particularly compelling reason.

I was eventually convinced that they are useful and natural, specifically in the case where the environment contains an adversary (or the agent in question models the environment as containing an adversary, e.g. to obtain worst-case bounds). I now think of that use-case as the main motivation for the infra-Bayes framework, which uses imprecise probabilities and maximization as central tools. More generally, the infra-Bayes approach is probably useful for environments containing other agents.

Thanks! Can you say a bit on why you find the kinds of motivations discussed in (edit: changed reference) Sec. 2 of here ad hoc and unmotivated, if you're already familiar with them (no worries if not)? (I would at least agree that rationalizing people's intuitive ambiguity aversion is ad hoc and unmotivated.)

I think this quote nicely summarizes the argument you're asking about:

Not only do we not have evidence of a kind that allows us to know the total consequences of our actions, we seem often to lack evidence of a kind that warrants assigning precise probabilities to relevant states.

This, I would say, sounds like a reasonable critique if one does not really get the idea of Bayesianism. Like, if I put myself in a mindset where I'm only allowed to use probabilities when I have positive evidence which "warrants" those precise probabilities, then sure, it's a reasonable criticism. But a core idea of Bayesianism is that we use probabilities to represent our uncertainties even in the absence of evidence; that's exactly what a prior is. And the point of all the various arguments for Bayesian reasoning is that this is a sensible and consistent way to handle uncertainty, even when the available evidence is weak and we're mostly working off of priors.

As a concrete example, I think of Jaynes' discussion of the widget problem (pg 440 here): one is given some data on averages of a few variables, but not enough to back out the whole joint distribution of the variables from the data, and then various decision/inference problems are posed. This seems like exactly the sort of problem the quote is talking about. Jaynes' response to that problem is not "we lack evidence which warrants assigning precise probabilities", but rather, "we need to rely on priors, so what priors accurately represent our actual state of knowledge/ignorance?".

Point is: for a Bayesian, the point of probabilities is to accurately represent an agent's epistemic state. Whether the probabilities are "warranted by evidence" is a non sequitur.

we need to rely on priors, so what priors accurately represent our actual state of knowledge/ignorance?

Exactly — and I don't see how this is in tension with imprecision. The motivation for imprecision is that no single prior seems to accurately represent our actual state of knowledge/ignorance.

What use case are you intending these for? Any given use of probabilities I think depends on what you're trying to do with them, and how long it makes sense to spend fleshing them out.

Predicting the long-term future, mostly. (I think imprecise probabilities might be relevant more broadly, though, as an epistemic foundation.)

An alternative to always having a precise distribution over outcomes is imprecise probabilities: You represent your beliefs with a *set* of distributions you find plausible.

And if you have imprecise probabilities, expected value maximization isn't well-defined. One natural generalization of EV maximization to the imprecise case is maximality:^{[1]} You prefer A to B iff EV_p(A) > EV_p(B) with respect to every distribution p in your set. (You're permitted to choose any option that you don't disprefer to something else.)

If you don’t endorse either (1) imprecise probabilities or (2) maximality given imprecise probabilities, I’m interested to hear why.

^{[1]} I think originally due to Sen (1970); just linking Mogensen (2020) instead because it's non-paywalled and easier to find discussion of Maximality there.
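For readers who want the maximality rule spelled out, here is a minimal sketch, assuming a finite set of states, a finite set of distributions, and made-up payoffs (the function names, the options `A`, `B`, `C`, and all numbers are hypothetical illustrations, not taken from the post or the papers it cites).

```python
def expected_value(payoffs, p):
    """EV of an option whose payoff in state i is payoffs[i], under distribution p."""
    return sum(prob * payoff for prob, payoff in zip(p, payoffs))


def prefers(a, b, distributions):
    """Maximality: a is preferred to b iff EV_p(a) > EV_p(b) for every p in the set."""
    return all(expected_value(a, p) > expected_value(b, p) for p in distributions)


def permissible(options, distributions):
    """Names of options that are not dispreferred to any other option."""
    return [
        name
        for name, payoffs in options.items()
        if not any(
            prefers(other, payoffs, distributions)
            for other_name, other in options.items()
            if other_name != name
        )
    ]


# Two states of the world; payoffs are (payoff in state 1, payoff in state 2).
options = {"A": (10, 0), "B": (4, 3), "C": (3, 2)}

# Imprecise beliefs: two distributions over the two states.
imprecise = [(0.3, 0.7), (0.8, 0.2)]
# Precise beliefs: a single distribution.
precise = [(0.8, 0.2)]

print(permissible(options, imprecise))  # A and B are incomparable; C loses to B
print(permissible(options, precise))    # reduces to ordinary EV maximization
```

With a single distribution in the set the rule reduces to ordinary EV maximization; adding distributions can only remove strict preferences, so the permissible set stays the same or grows, which is the behavior several answers above are reacting to.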