Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This Sunday, join a watch-party for Vanessa Kosoy's discussion of Infra-Bayesianism on the AI X-Risk Research Podcast (AXRP), then discuss it with other LessWrongers.

The first 30 mins of the event will be for focused listening / reading of the transcript, and the subsequent 90 mins will be on discussion. If you'd like to just show up for the discussion, you're welcome to show up half an hour in.

Vanessa Kosoy is an AI Alignment researcher supported by MIRI and the LTFF. She is based in Israel and writes regularly on LessWrong / the AI Alignment Forum.

Infra-Bayesianism is an approach to understanding foundational questions of agency, and has bearing on the AI alignment problem.

MIRI writes of her work:

Vanessa Kosoy and Alex Appel’s infra-Bayesianism is a novel framework for modeling reasoning in cases where the reasoner’s hypothesis space may not include the true environment.

This framework is interesting primarily because it seems applicable to such a wide variety of problems: non-realizability, decision theory, anthropics, embedded agency, reflection, and the synthesis of induction/probability with deduction/logic. Vanessa describes infra-Bayesianism as “opening the way towards applying learning theory to many problems which previously seemed incompatible with it.”

We're meeting at this Zoom link at noon (PDT) this Sunday:


Here are some questions and confusions we had during the event.

There are a few equivalent ways to view infra-distributions:

  • single infra-distribution
  • mixture of infra-distributions
  • concave functional

So far, only the 'mixture of infra-distributions' view really makes sense to me in my head. Like, I don't know how else I'd design/learn an infra-distribution. So that's a limitation of my understanding.

The exact same thing is true for classical probability theory: you have distributions, mixtures of distributions and linear functionals respectively. So I'm not sure what new difficulty comes from infra-Bayesianism?

Maybe it would help thinking about infra-MDPs and infra-POMDPs?

Also, here I wrote about how you could construct an infra-Bayesian version of the Solomonoff prior, although possibly it's better to do it using infra-Bayesian logic.

"Mixture of infra-distributions" as in convex set, or something else? If it's something else then I'm not sure how to think about it properly.

"mixture of infradistributions" is just an infradistribution, much like how a mixture of probability distributions is a probability distribution.

Let's say we've got a prior $\zeta$, a probability distribution over indexed hypotheses.

If you're working in a vector space, you can take any countable collection of sets in said vector space, and mix them together according to a prior $\zeta$ giving a weight to each set. Just make the set of all points which can be made by the process "pick a point from each set, and mix the points together according to the probability distribution $\zeta$".

For infradistributions as sets of probability distributions or a-measures or whatever, that's a subset of a vector space. So you have a bunch of sets $B_i$, and you just mix the sets together according to $\zeta$; that gives you your set $B$.

If you want to think about the mixture in the concave functional view, it's even nicer. You have a bunch of $h_i$ which are "hypothesis $i$ can take a function and output what its worst-case expectation value is". The mixture of these, $\mathbb{E}_\zeta[h_i]$, is simply defined as $\lambda f.\,\mathbb{E}_{i\sim\zeta}[h_i(f)] = \sum_i \zeta_i\, h_i(f)$. This is just mixing the functions together!

Both of these ways of thinking of mixtures of infradistributions are equivalent, and recover mixture of probability distributions as a special case.
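To see the claimed equivalence numerically, here is a minimal sketch under simplifying assumptions: each hypothesis is a finite list of probability distributions over a finite outcome set (standing in for its convex hull), and the sets, prior, and function `f` are hypothetical. Listing only the vertices is enough here, since the worst case of a linear expectation over a convex hull is attained at a vertex.

```python
from itertools import product

def worst_case(dists, f):
    """h(f): worst-case expectation of f over a finite set of distributions."""
    return min(sum(p * v for p, v in zip(mu, f)) for mu in dists)

def mix_functionals(prior, hs, f):
    """Concave-functional view: (E_zeta h_i)(f) = sum_i zeta_i * h_i(f)."""
    return sum(z * h(f) for z, h in zip(prior, hs))

def mix_sets(prior, sets):
    """Set view: pick one point from each set and mix the points using the prior."""
    mixed = []
    for choice in product(*sets):
        n = len(choice[0])
        mixed.append(tuple(sum(z * mu[k] for z, mu in zip(prior, choice)) for k in range(n)))
    return mixed

# Hypothetical example: two hypotheses over two outcomes, prior (1/4, 3/4).
B1 = [(1.0, 0.0), (0.5, 0.5)]
B2 = [(0.2, 0.8)]
prior = (0.25, 0.75)
f = (1.0, 0.0)
hs = [lambda g: worst_case(B1, g), lambda g: worst_case(B2, g)]
print(mix_functionals(prior, hs, f), worst_case(mix_sets(prior, [B1, B2]), f))
```

Both printed values agree (about 0.275, up to float rounding): mixing the functionals and mixing the sets give the same worst-case expectation.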

+1, especially the concave functional view

The concave functional view is "the thing you do with a probability distribution is take expectations of functions with it. In fact, it's actually possible to identify a probability distribution $\mu$ with the function $f \mapsto \mathbb{E}_\mu[f]$ mapping a function to its expectation. Similarly, the thing we do with an infradistribution is taking expectations of functions with it. Let's just look at the behavior of the function $h$ we get, and neglect the view of everything as a set of a-measures."

As it turns out, this view makes proofs a whole lot cleaner and tidier, and you only need a few conditions on a function like that for it to have a corresponding set of a-measures.
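As a rough illustration of the functional view, the sketch below builds $h$ from a hypothetical finite set of probability distributions and spot-checks a few properties such a function has: normalization, monotonicity, and concavity. This is only a randomized sanity check of some of the conditions, not the full list from the sequence (which also involves conditions such as Lipschitz continuity); the helper names are our own.

```python
import random

def make_h(dists):
    """Concave-functional view: h(f) = worst-case expectation of f over dists."""
    def h(f):
        return min(sum(p * v for p, v in zip(mu, f)) for mu in dists)
    return h

def check_properties(h, n, trials=1000, seed=0, tol=1e-12):
    """Randomized spot-check of normalization, monotonicity and concavity."""
    rng = random.Random(seed)
    # Normalization (holds for sets of probability distributions): h(0)=0, h(1)=1
    assert abs(h([0.0] * n)) < tol and abs(h([1.0] * n) - 1.0) < tol
    for _ in range(trials):
        f = [rng.random() for _ in range(n)]
        g = [rng.random() for _ in range(n)]
        lam = rng.random()
        blend = [lam * a + (1 - lam) * b for a, b in zip(f, g)]
        # Concavity: h of a mixture is at least the mixture of the h values
        assert h(blend) >= lam * h(f) + (1 - lam) * h(g) - tol
        # Monotonicity: a pointwise-larger function never gets a smaller value
        upper = [max(a, b) for a, b in zip(f, g)]
        assert h(upper) >= max(h(f), h(g)) - tol
    return True

h = make_h([(0.5, 0.3, 0.2), (0.2, 0.2, 0.6)])
print(check_properties(h, 3))  # prints True
```

With a single distribution the same construction is linear, which recovers the classical "distribution as expectation functional" picture.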

The Nirvana trick seems like a cheap hack, and I'm curious if there's a way to see it as good reasoning.

One response to this was that predicting Nirvana in some circumstance is equivalent to predicting that there are no possible futures in that circumstance, which is a sensible way of predicting that the circumstance is impossible.

Notably, these are equivalent in the context of our 'expectations' being infima (the infimum over an empty set of futures is $+\infty$, the same value Nirvana gets); if we were doing a mixture rather than taking worst-case bounds, these would not be equivalent (or rather, I don't know what it would mean to take expectations over a circumstance that didn't have any possible worlds).

There is a formal sense in which "predicting Nirvana in some circumstance is equivalent to predicting that there are no possible futures in that circumstance", see our latest post. It's similar to MUDT, where, if you prove a contradiction then you can prove utility is as high as you like.

Google doc where we posted our confusions/thoughts earlier:

My ongoing confusions/thoughts:

  • What if the superintelligent deity is less than maximally evil or maximally good? (E.g. the deity picking the median-performance world)
  • What about the Dutch-bookability of infraBayesians? (the classical Dutch-book arguments seem to suggest pretty strongly that non-classical-Bayesians can be arbitrarily exploited for resources)
  • Is there a meaningful metaphysical interpretation of infraBayesianism that does not involve Murphy? (similarly to how Bayesianism can be metaphysically viewed as "there's a real, static world out there, but I'm probabilistically unsure about it")

What if the superintelligent deity is less than maximally evil or maximally good? (E.g. the deity picking the median-performance world)

Thinking of the worst-case is just a mathematical reflection of the fact we want to be able to prove lower bounds on the expected utility of our agents. We have an unpublished theorem that, in some sense, any such lower bound guarantee has an infra-Bayesian formulation.

Another way to justify it is the infra-Bayesian CCT (see "Complete Class Theorem Weak Version" here).

What about the Dutch-bookability of infraBayesians? (the classical Dutch-book arguments seem to suggest pretty strongly that non-classical-Bayesians can be arbitrarily exploited for resources)

I think it might depend on the specific Dutch book argument, but one way infra-Bayesians escape them is by... being equivalent to certain Bayesians! For example, consider the setting where your agent has access to random bits that the environment can't predict. Then, infra-Bayesian behavior is just the Nash equilibrium in a two-player zero-sum game (against Murphy). Now, the Nash strategy in such a game is the (Bayes) optimal response to the Nash strategy of the other player, so it can be regarded as "Bayesian". However, the converse is false: not every best response to Nash is in itself Nash. So, the infra-Bayesian decision rule is more restrictive than the corresponding Bayesian decision rule, but it's a special case of the latter.
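The "maximin is Nash, but not every best response to Nash is Nash" point can be checked on a toy game. This is a sketch under assumptions: a hypothetical matching-pennies payoff matrix for the agent against Murphy, and a crude grid search over mixed strategies standing in for solving the game by linear programming.

```python
from fractions import Fraction

# Hypothetical matching-pennies payoffs for the agent (rows) against Murphy (columns):
# the agent scores 1 when its action matches Murphy's choice, else 0.
A = [[1, 0], [0, 1]]

def expected(p, q):
    """Agent plays row 0 with prob. p, Murphy plays column 0 with prob. q."""
    rows, cols = (p, 1 - p), (q, 1 - q)
    return sum(rows[r] * cols[c] * A[r][c] for r in range(2) for c in range(2))

def worst_case(p):
    # Expected payoff is linear in q, so Murphy's best reply is a pure column.
    return min(expected(p, Fraction(1)), expected(p, Fraction(0)))

def murphy_worst(q):
    # Murphy's loss against the agent's best pure reply.
    return max(expected(Fraction(1), q), expected(Fraction(0), q))

grid = [Fraction(k, 100) for k in range(101)]
agent_nash = max(grid, key=worst_case)     # infra-Bayesian (maximin) choice
murphy_nash = min(grid, key=murphy_worst)  # Murphy's minimax strategy

best_value = max(expected(p, murphy_nash) for p in grid)
best_responses = [p for p in grid if expected(p, murphy_nash) == best_value]
print(agent_nash, worst_case(agent_nash), len(best_responses))  # prints: 1/2 1/2 101
```

Every grid strategy is a Bayes-optimal response to Murphy's Nash strategy, but only $p = 1/2$ is maximin: for example, $p = 0$ is a best response yet has worst-case value 0.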

Is there a meaningful metaphysical interpretation of infraBayesianism that does not involve Murphy? (similarly to how Bayesianism can be metaphysically viewed as "there's a real, static world out there, but I'm probabilistically unsure about it")

I think of it as just another way of organizing uncertainty. The question is too broad for a succinct answer, I think, but here's one POV you could take: Let's remember the frequentist definition of probability distributions as time limits of frequencies. Now, what if the time limit doesn't converge? Then, we can make a (crisp) infradistribution instead: the convex hull of all limit points. Classical frequentism also has the problem that the exact same event never repeats itself. But in "infra-frequentism" we can solve this: you don't need the exact same event to repeat, you can draw the boundary around what counts as "the event" any way you like.
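Here is a small sketch of the "infra-frequentism" idea, assuming a hypothetical binary sequence built from blocks of 1s and 0s with doubling lengths. Its running frequency of 1s never converges; it keeps swinging between about 1/3 and 2/3. A finite prefix can only approximate the limit points, so the early part is discarded as a transient, and the resulting interval is read as a crisp infradistribution over Bernoulli coins.

```python
def oscillating_bits(num_blocks):
    """Blocks of 1s and 0s with doubling lengths: 1, 00, 1111, 0^8, ..."""
    bits = []
    for k in range(num_blocks):
        bits.extend([1 if k % 2 == 0 else 0] * (2 ** k))
    return bits

def running_frequencies(bits):
    """Frequency of 1s among the first i bits, for every prefix length i."""
    ones, freqs = 0, []
    for i, b in enumerate(bits, start=1):
        ones += b
        freqs.append(ones / i)
    return freqs

bits = oscillating_bits(16)
freqs = running_frequencies(bits)
tail = freqs[len(freqs) // 4:]  # ignore the early transient
lo, hi = min(tail), max(tail)   # approximate limit points of the frequency

def worst_case(f0, f1):
    """Crisp infradistribution {Bernoulli(p): lo <= p <= hi}: min expectation of f."""
    return min(p * f1 + (1 - p) * f0 for p in (lo, hi))

print(round(lo, 3), round(hi, 3))
```

The interval endpoints come out close to 1/3 and 2/3, and the convex hull of the limit points is exactly that interval of coin biases.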

Once we go from passive observation to active interaction with the environment, your own behavior serves as another source of Knightian uncertainty. That is, you're modeling the world in terms of certain features while ignoring everything else, but the state of everything else depends on your past behavior (and you don't want to explicitly keep track of that). This line of thought can be formalized in the language of infra-MDPs (unpublished). And then of course you complement this "aleatoric" uncertainty with "epistemic" uncertainty by considering the mixture of many infra-Bayesian hypotheses.

I understand that Infra-Bayesianism wants to be able to talk about hypotheses that do not describe the entire environment. (Like logical induction.) Something that just says “I think this particular variable is going to go up, but I don’t know how the rest of the world works.”

To do this, somehow magically using intervals over probabilities helps us. I understand it's trying to define a measure over multiple probability distributions, but I don't know quite how that maps to these convex sets, and would be interested in the basic relationship being outlined, or a link to the section that does it. (The 8 posts of math were scary and I didn't read them.)

A convex set is like a generalization of an interval.

More specifically: if two points are in a convex set, then the entire line segment connecting them must also be in the set.

So just like an interval says "any probability in this interval might pan out", the sets are saying "I want to be able to deal with any probability distribution in this set". And the sets happen to be convex. I don't think you need to know what 'convex' means to understand the podcast episode, but I tried to give a good explanation here:
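To make the interval analogy concrete for a partial hypothesis, here is a sketch with a hypothetical three-outcome world (rain, sun, snow). The hypothesis only says "P(rain) is between 0.2 and 0.4" and is silent about sun versus snow. The set of distributions it allows is convex: mixing two members gives another member. Its worst-case expectation is what Murphy achieves by putting all non-rain mass on whichever of sun and snow is worse for you. The numbers and helper names are illustrative assumptions.

```python
def in_set(mu, lo=0.2, hi=0.4, tol=1e-12):
    """Membership in the hypothetical convex set {mu : lo <= mu[rain] <= hi}."""
    rain = mu[0]
    return min(mu) >= -tol and abs(sum(mu) - 1) < tol and lo - tol <= rain <= hi + tol

def mix(mu, nu, lam):
    """The point a fraction lam of the way from nu to mu on the segment between them."""
    return tuple(lam * a + (1 - lam) * b for a, b in zip(mu, nu))

def worst_case(f, lo=0.2, hi=0.4):
    """min E_mu[f] over the set: Murphy puts the non-rain mass on the worse of sun/snow."""
    f_rain, f_sun, f_snow = f
    return min(p * f_rain + (1 - p) * min(f_sun, f_snow) for p in (lo, hi))

mu, nu = (0.2, 0.8, 0.0), (0.4, 0.0, 0.6)
print(in_set(mu), in_set(nu), in_set(mix(mu, nu, 0.5)))  # prints: True True True
print(worst_case((1.0, 0.0, 0.0)))                       # worst-case P(rain)
```

Because the hypothesis says nothing about sun versus snow, a bet that pays only on sun has worst-case value 0 under it, while the rain bet is still bounded below by 0.2.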

I'm still trying to wrap my head around how the update rule deals with hypotheses (a-measures) that have very low expected utility. In order for them to eventually stop dominating calculations, presumably their affine term has to get lifted as evidence goes against them?

Edit: I guess I'm real confused about the function called "g" in basic inframeasure theory. I think that compactness (mentioned... somewhere) forces different hypotheses to be different within some finite time. But I don't understand the motivations for different g.

Ah. So, low expected utility alone isn't too much of a problem. The amount of weight a hypothesis has in a prior after updating depends on the gap between the best-case values and worst-case values, i.e., "how much does it matter what happens here". So, the stuff that withers in the prior as you update are the hypotheses that are like "what happens now has negligible impact on improving the worst-case". So, hypotheses that are like "you are screwed no matter what" just drop out completely: if it doesn't matter what you do, you might as well pick actions that optimize the other hypotheses that aren't quite so despondent about the world.

In particular, if all the probability distributions in a set are like "this thing that just happened was improbable", the hypothesis takes a big hit in the posterior, as all the a-measures are like "ok, we're in a low-measure situation now, what happens after this point has negligible impact on utility". 

I still need to better understand how updating affects hypotheses which are a big set of probability distributions so there's always one probability distribution that's like "I correctly called it!".

The motivations for different g are: 

If g is your actual utility function, then updating with g as your off-event utility function grants you dynamic consistency. Past-you never regrets turning over the reins to future you, and you act just as UDT would.

If g is the constant-1 function, then that corresponds to updates where you don't care at all what happens off-history (the closest thing to normal updates), and both the "diagonalize against knowing your own action" behavior in decision theory and the Nirvana trick pop out for free from using this update.
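A sketch of an update of this shape, restricted to the special case of a finite set of probability distributions over a finite outcome space. After observing an event, use $f$ on the event and the off-event utility $g$ off it. Then subtract the worst case and divide by the gap between best and worst case, which is the "how much does it matter what happens here" quantity mentioned above. The helper names `star` and `update` and all numbers are hypothetical, and the exact definitions in the sequence (in terms of a-measures) may differ in details.

```python
def make_h(dists):
    """h(f) = worst-case expectation of f over a finite set of distributions."""
    def h(f):
        return min(sum(p * v for p, v in zip(mu, f)) for mu in dists)
    return h

def star(f, g, event):
    """f on the observed event, g off it (the 'off-history' utility)."""
    return [f[x] if x in event else g[x] for x in range(len(g))]

def update(h, event, g):
    """Renormalized update: subtract the worst case and divide by the best-minus-worst gap."""
    n = len(g)
    worst = h(star([0.0] * n, g, event))
    best = h(star([1.0] * n, g, event))
    gap = best - worst
    if gap == 0:
        raise ValueError("what happens on the event cannot matter to this hypothesis")
    return lambda f: (h(star(f, g, event)) - worst) / gap

# Outcomes 0, 1, 2; we observe the event {0, 1}; g is the constant-1 function.
h = make_h([(0.5, 0.3, 0.2), (0.2, 0.2, 0.6)])
h_post = update(h, {0, 1}, [1.0, 1.0, 1.0])
print(h_post([1.0, 0.0, 0.0]))  # worst-case value of "outcome 0" after updating
```

The printed value is about 0.625. Swapping in your actual utility function for `g` gives the dynamically consistent variant described above.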

I'm curious what other motivating examples there are for getting a grasp of this infra-Bayesianism stuff. The example with computable even bits and uncomputable/random odd bits was really helpful for intuition building, and so was the Nirvana example. But I still don't really get what's up with the whole convex set thing, or what sa-measures are, or why it's okay to include an off-history term into your belief update rule. 

FWIW the first post in the infra-Bayes sequence has an example that I think gives you a clue why you need to include off-history terms into your belief update rule.

Yeah, I'm planning to read in detail after we're done here. 

Me too. I currently only have a very superficial understanding of infraBayesianism (all of which revolves around the metaphysical, yet metaphorical, deity Murphy).

I am quite interested to get a first-person sense of what it feels like from the inside to be an Infra-Bayesian. In particular, is there a cognitive thing I already do, or should try, that maps to this process for dealing with having measure over different probability distributions?

I think that if you imagine the deity Murphy trying to foil your plans whatever you do, that gives you a pretty decent approximation to true infraBayesianism.

Confusion about what Solomonoff priors can’t do:

  • “Even bits are all zero, odd bits are random”: The Turing machine that writes zero to all even bits and writes some hardcoded string to all odd bits is simpler than the Turing machine that writes one long hardcoded string, so it seems to me that the Solomonoff prior should learn that the even bits are all zero
    • The discussion there seemed to bleed into "what if the string of odd bits is uncomputable", which I think of as a separate field of confusion, so I'm still confused what intuition this example is supposed to be pumping exactly.
  • “Uncomputable priors”: The simplest uncomputable prior I can think of would be “the nth bit is 1 iff the nth Turing machine halts”. But the Turing machine that tries to run the nth Turing machine for 10^10 steps and writes 1 if it halts, and otherwise writes 0 unless n is in some hardcoded list is reasonably simple, so it seems to me that the Solomonoff prior should learn this kind of thing to a reasonable degree
    • This works finitely long but eventually the Solomonoff prior won't be able to be confident in what the next bit is. But to me it's not obvious how we could do better than that, given that this is inherently computationally expensive
  • Priors like “Omega predicts my action”: I have no idea what a Solomonoff prior does, but I also have no idea what infra-Bayesianism does. Specifically, I'm not sure if there's some specific way that infra-Bayesianism learns this hypothesis (and whether it can infer it from observations or whether you have to listen to Omega telling you that they predict your action)

Re point 1, 2: Check this out. For the specific case of 0 to even bits, ??? to odd bits, I think solomonoff can probably get that, but not more general relations.

Re: point 3, Solomonoff is about stochastic environments that just take your action as an input, and aren't reading your policy. For infra-Bayes, you can deal with policy-dependent environments without issue, as you can consider hard-coding in every possible policy to get a family of stochastic environments, and UDT behavior naturally falls out as a result from this encoding. There's still some open work to be done on which sorts of policy-dependent environments like this are learnable (inferrable from observations), but it's pretty straightforward to cram all sorts of weird decision-theory scenarios in as infra-Bayes hypotheses, and do the right thing in them.
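A one-shot toy version of "hard-code every possible policy" for Newcomb's problem, assuming the standard hypothetical payoffs ($1,000,000 in the opaque box if one-boxing is predicted, $1,000 in the transparent box). There is one environment per hard-coded prediction. Any branch where your action contradicts the prediction gets Nirvana (infinite reward), so Murphy, who minimizes, never picks it. This is an illustration of the idea, not the infra-Bayesian RL formalism itself.

```python
import math

ACTIONS = ("one_box", "two_box")
NIRVANA = math.inf  # reward for a world where the hard-coded prediction is contradicted

def payoff(action, prediction):
    """Hypothetical Newcomb payoffs in the environment that hard-codes `prediction`."""
    if action != prediction:
        return NIRVANA  # inconsistent branch: treated as 'no possible future'
    box_b = 1_000_000 if prediction == "one_box" else 0
    return box_b + (1_000 if action == "two_box" else 0)

def worst_case_value(action):
    # Murphy picks the hypothesis (hard-coded prediction) that is worst for us.
    return min(payoff(action, p) for p in ACTIONS)

best = max(ACTIONS, key=worst_case_value)
print(best, worst_case_value("one_box"), worst_case_value("two_box"))
# prints: one_box 1000000 1000
```

The maximin choice is to one-box, which is the UDT-style answer; the Nirvana branches do the work of ruling out "Omega predicted wrong" worlds.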

Specifically, I'm not sure if there's some specific way that infra-Bayesianism learns this hypothesis

Well you had the misfortune to listen to a podcast where I was asking the questions, and I didn't understand infra-Bayesian learning theory and was too afraid to ask.

But to me it's not obvious how we could do better than that, given that this is inherently computationally expensive.

If the even bits are computable and the odd bits aren't, the whole sequence isn't computable so Solomonoff (plausibly) fails. You might hope that even if you can't succeed at predicting the odd bits, you could still succeed at predicting the even bits (which on their own are eminently predictable).

I'm not sure what exactly you mean by "fails" here, but I'm pretty sure the Solomonoff prior should be fine at predicting the even bits (in the sense that once you reveal a large number of bits of the sequence, it is overwhelmingly likely that the Solomonoff prior will assign a very high probability to the next even bit being a zero).

Am I simply wrong about how the Solomonoff prior works, or do I just have a lower standard for "success" or "failure" here?

I think you are wrong to think that it's overwhelmingly likely that Solomonoff will predict the even bits well.

Oof sorry for the delay!

Yes it looks like that's it. I didn't realize that once you hardcoded all the odd bits as some list L, the hypothesis "all even bits are 0 and the odd bits are L and then all 1s" isn't actually much simpler than the hypothesis "the even bits are length(L) 0s and then all 1s, the odd bits are L and then all 1s".

With this confusion out of the way, I'll try to dig deeper into the sequences and then report back what infra-Bayesianism does about this...

Confusion that was (partially, maybe?) resolved: how exactly does infra-Bayesianism help with realizability. 

Since hypotheses in infra-Bayesianism are convex sets of probability distributions, hypotheses can cover non-computable cases, even though each individual probability distribution is computable. For example, you might be able to have a computable hypothesis that all the even bits are 0s, which covers the case where the odd bits are uncomputable, even though no individual computable probability distribution in the hypothesis can predict the odd bits (each of them would eventually be falsified).
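A minimal sketch of the even-bits hypothesis as a set of distributions. At even positions it allows only P(bit = 1) = 0; at odd positions it allows any value in [0, 1]. Whatever the odd bits turn out to be, the hypothesis stays consistent and keeps predicting the next even bit with certainty. A seeded pseudorandom generator (which is of course computable) merely stands in for an arbitrary odd-bit sequence, and the helper names are hypothetical.

```python
import random

def allowed_p_one(position):
    """Interval of P(next bit = 1) the hypothesis 'even bits are 0' permits."""
    return (0.0, 0.0) if position % 2 == 0 else (0.0, 1.0)

def consistent(prefix):
    """True if some distribution in the set gives the prefix positive probability."""
    for i, bit in enumerate(prefix):
        lo, hi = allowed_p_one(i)
        if (bit == 1 and hi == 0.0) or (bit == 0 and lo == 1.0):
            return False
    return True

# Stand-in for an arbitrary (possibly uncomputable) odd-bit sequence.
rng = random.Random(42)
prefix = []
for i in range(1000):
    prefix.append(0 if i % 2 == 0 else rng.randint(0, 1))

print(consistent(prefix), allowed_p_one(len(prefix)))  # prints: True (0.0, 0.0)
```

A single computable distribution would have to commit to specific odd-bit probabilities. The set avoids that commitment and still pins down the even bits.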

I'm not a mathematician, so it all remains very abstract for me. I'm curious if someone could explain it like I'm five. Is there some useful, concrete application to illustrate the theory? 

Here's an ELI5: The evil superintelligent deity Murphy, before you were ever conceived, picked the worst possible world that you could live in (meaning the world where your performance is worst), and you have to use fancy math tricks to deal with that.

@Diffractor: I think I got a MIRIxDiscord invite in a way somehow related to this event. Check your PMs for details. (I'm just commenting here to get attention because I think this might be mildly important.)

Bumping this.

Bumping it again.

It would be great if you could post a Google Calendar link (maybe for the next event). That would make it a lot easier to figure out time zone issues (I almost messed up due to our switch to summer time on March 28).

...and it's closed.

Oh woops, I realize I ended the call for everyone when I left. I'm sorry.

Don't worry, it was kind of a natural stopping point anyways, as the discussion was winding down.

Note that the episode is over an hour long, so you'll need to listen at faster than 2x speed to listen to the whole thing in 30 minutes.

My guess is that if you listen at 2x speed for 30 mins you'll be well-placed to participate in discussion about it.