Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This is a linkpost for https://universalprior.substack.com/p/elementary-infra-bayesianism

Elementary Infra-Bayesianism


I upvoted because distillations are important, but a first pass over your post left me much more confused than I was before. Another issue was the level of formality: either use less and stick to core intuitions, or use more. Personally, I would have liked to know the types of all the objects being mentioned, a mention of what the space X is (I'm assuming the space of possible events?), and some explanation of how the objects relate to one another, or even to g. In fact, I can't see how g relates to any of the objects.

EDIT: I meant to say that you didn't explain how g relates to any of the distributions or update rules except via an unclear analogy. Though note, I'm pretty tired right now, so don't take my confusion as indicative of the average reader.

Thank you for your comment! You are right, these things are not clear from this post at all, and I did not do a good job of clarifying that. I'm a bit low on time at the moment, but hopefully I'll be able to make some edits to the post to set the reader's expectations more carefully.

The short answer to your question is: yep, X is the space of events. In Vanessa's post it has to be compact and metric; I'm simplifying this to an interval in R. And the g=0 update can be derived from the general updating equation by plugging in g=0 and replacing the measure by the Lebesgue integral. I have scattered notes where I derive the equations in this post, but it was clear to me that doing this rigorously in the post would require an annoying amount of measure theory, and the post would turn into a slog. So I decided to do things hand-wavily, but went a bit too hard in that direction.

TL;DR: I got nerd-sniped into working through some rather technical work in AI Safety. Here's my best guess of what is going on: imprecise probabilities for handling catastrophic downside risk.

Short summary: I apply the updating equation from Infra-Bayesianism to a concrete example of an infradistribution and illustrate the process. When we "care" a lot about things that are unlikely given what we've observed before, we get updates that are extremely sensitive to outliers.

I've written previously on how to act when confronted with something smarter than yourself. When in such a precarious situation, it is difficult to trust "the other"; they might dispense their wisdom in a way that steers you to their benefit. In general, we're screwed. But there are ideas for a constrained set-up that forces "the other" to explain itself and point out potential flaws in its arguments. We might thus leverage "the other"'s ingenuity against itself by slowing down its reasoning to our pace. "The other" would no longer be an oracle with prophecies that might or might not kill us, but instead a teacher who lets us see things we otherwise couldn't.

While that idea is nice, there is a severe flaw at its core: obfuscation. By making the argument sufficiently long and complicated, "the other" can sneak a false conclusion past our defenses. Forcing "the other" to lay out its reasoning is thus not a foolproof solution. But (as some have argued), it's unclear whether this will be a problem in practice. Why am I bringing this up? No reason in particular.

## Why Infra-Bayesianism?

Engaging with the work of Vanessa Kosoy is a rite of passage in the AI Safety space. Why is that? The theory is complicated and enjoyable. It's okay to have fun with it. But being complicated is (in itself) not a mark of quality. If you can't explain it, you don't understand it. So here goes my attempt at "Elementary Infrabayesianism", where I motivate a portion of Infrabayesianism using pretty pictures and high school mathematics^{[1]}.

## Uncertain updates

Imagine it's late at night, the lights are off, and you are trying to find your smartphone. You cannot turn on the lights, and you are having a bit of trouble seeing properly^{[2]}. You have a vague sense of where your smartphone should be (your prior, panel a). Then you see a red blinking light from your smartphone (sensory evidence, panel b). Since your brain is really good at this type of thing, you integrate the sensory evidence with your prior optimally (despite your disinhibited state) to obtain an improved sense of where your smartphone might be (posterior, panel c).

Now let's say you are even more uncertain about where you put your smartphone^{[3]}. It might be at one end of the room or the other (bimodal prior, panel a). You see a blinking light further to the right (sensory evidence, panel b), so your overall belief shifts to the right (bimodal posterior, panel c). Importantly, by conservation of probability mass, your belief that the phone might be at the left end of the room is reduced. The absence of evidence is evidence of absence.

## Fundamentally uncertain updates

Let's say you are really, fundamentally unsure about where you put your phone. If someone were to ~~put a gun to your head~~ threaten to sign you up for sweaters for kittens unless you give them your best guess, you could not.^{[4]} This is the situation Vanessa Kosoy finds herself in^{[5]}.^{[6]} With Infra-Bayesianism, she proposes a theoretical framework for thinking in situations where you can't (or don't want to) specify a prior on your hypotheses. Because she is a mathematician, she uses the proper terminology for this:

- a signed measure is a generalization of a probability distribution,
- the indicator function of a fuzzy set is a generalization of your observation/sensory evidence,
- g tells you how much you care about stuff that happens in regions that become very unlikely (or impossible) given the sensory evidence you obtain.

Why should you care about that, you ask? Great question; let's just not care about it for now. Let's set it equal to zero, g=0.

When g=0, the updating equation for our two priors, P1 and P2, becomes very familiar indeed:
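The equation image is not reproduced here; my hedged reading, using the denominator from footnote 7, is that each prior is updated as $p_i(x) \mapsto L(x)\,p_i(x)/P^+$ with the shared scaling factor $P^+ = \min_{p\in\{p_1,p_2\}} \int_{\mathbb{R}} L(x)\,p(x)\,\mathrm{d}x$. As a numerical sanity check (the priors, the likelihood, and the grid below are all made up for illustration):

```python
import numpy as np

# Discretized "room": the space X, simplified to an interval of R.
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# The two priors (edge points of the convex set): one per end of the room.
p1 = gaussian(x, -2.0, 1.0)
p2 = gaussian(x, 2.0, 1.0)

# Likelihood of the blinking light, seen toward the right.
L = gaussian(x, 3.0, 2.0)

# Shared "wonky" denominator (footnote 7): the minimum of the evidence terms.
P_plus = min(np.sum(L * p1) * dx, np.sum(L * p2) * dx)

# g = 0 update: Bayes' rule on each prior separately, with the shared scaling.
q1 = L * p1 / P_plus
q2 = L * p2 / P_plus

# Total mass of each updated "distribution".
m1, m2 = np.sum(q1) * dx, np.sum(q2) * dx
print(m1, m2)
```

Note that with this normalization the worst-explaining prior integrates to exactly 1, while the better-explaining prior ends up with proportionally more mass.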

This is basically Bayes' theorem applied to each prior separately. Still, the evidence term (the denominator) is computed in a wonky way^{[7]}, but this doesn't make much of a difference, since it's a shared scaling factor. Consistently, things also look very normal when using this updating rule to integrate sensory information: we shift our two priors towards the evidence and scale them in proportion to how unlikely they said the evidence is.

## Fundamentally dangerous updates

Alright, you know where this is going. We will have to start caring about things that become less likely after observing the evidence. Why we have to care is a bit hard to motivate; Vanessa Kosoy and Diffractor motivate it in three parts, of which I don't even get the first^{[8]}.^{[9]} Instead, I will motivate why you might care about things that seem very unlikely given your evidence by revealing more information about the thought experiment:

It's not so much that you can't give your best-guess estimate about where you put your smartphone. Rather, you dare not. Getting this wrong would be, like, really bad. You might be unsure whether it's even your phone that's blinking, or if it's the phone of the other person sleeping in the room^{[10]}. Or perhaps the bright red light you see is the bulbous red nose of somebody else sleeping in the room. Getting the location of your smartphone wrong would be messy. Better not risk it. We'll set g=1.

The update rule doesn't change too much at first glance:
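The post's update-rule figure is not reproduced in this text dump. For orientation, here are the two normalization constants as I read them from footnotes 7 and 11 (a hedged reconstruction, not a verbatim copy of the post's equation):

```latex
% g = 0 normalization (footnote 7): the infra-expectation of the likelihood
P^{+} = E_H(L) = \min_{p \in \{p_1, p_2\}} \int_{\mathbb{R}} L(x)\,p(x)\,\mathrm{d}x

% g = 1 normalization (footnote 11)
P^{-} = E_H(1) - E_H(1 - L)
```

Footnote 11 notes that in this particular example, with two normalized probability distributions, P+ and P− coincide.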

Again, the denominator changes from one wonky thing (P+) to another wonky thing (P−)^{[11]}, but that still doesn't matter, since it's the same for both equations. And, of course, there is a ϰ that showed up out of nowhere. ϰ is a variable that tells us how good our distribution is at explaining things that we did not get any evidence for^{[12]}. Intuitively, you can tell that this will favor the prior distribution that was previously punished for not explaining the observation. And indeed, when we run the simulation, one of the two "distributions"^{[13]} is taking off! Even though the corresponding prior was bad at explaining the observation, the updating still strongly increases the mass associated with that hypothesis.

Intuitively, this translates into something like the following.

This is a very cautious strategy, and it might be appropriate when you're in dangerous domains with the potential for catastrophic outliers, basically what Nassim Taleb calls Black Swan events. I'm not sure how productive this strategy is, though; noise might dramatically mess up your updates at some point.

## Closing thoughts

This concludes the introduction to Elementary Infrabayesianism. I realize that I have only scratched the surface of what's in the sequence, and there is more coming out every other month, but letting yourself get nerd-sniped is just about as important as being able to stop working on something and publish. I hope what I wrote here is helpful to some, in particular in conjunction with the other explanations of the topic (1 2 3), which go a bit further than this post does.
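One thing the post stops short of is decision-making: the cautious flavor of infra-Bayesianism is usually cashed out as a maximin (worst-case over the hypothesis set) decision rule. Below is a toy sketch of that rule; the three locations, the utilities, and the two hypothesis vectors are all invented for illustration:

```python
import numpy as np

# Two hypotheses about where the phone is (the edge points of the convex set),
# as probability vectors over three coarse locations: left, middle, right.
p1 = np.array([0.8, 0.1, 0.1])  # phone is probably at the left end
p2 = np.array([0.1, 0.1, 0.8])  # phone is probably at the right end
hypotheses = [p1, p2]

# utility[a][x]: payoff of searching location a first when the phone is at x.
# Searching the middle gives partial credit either way (you end up close).
utility = np.array([
    [1.0, 0.0, 0.0],  # search left
    [0.5, 1.0, 0.5],  # search middle
    [0.0, 0.0, 1.0],  # search right
])

def infra_value(action: int) -> float:
    """Worst-case (maximin) expected utility over the hypothesis set."""
    return min(float(utility[action] @ p) for p in hypotheses)

# The maximin rule picks the cautious middle search, even though each
# individual hypothesis on its own would favor searching one of the ends.
best = max(range(3), key=infra_value)
print(best, infra_value(best))  # → 1 0.55
```

The worst-case `min` over the hypothesis set is what makes the rule cautious: an action that is great under one hypothesis but terrible under the other is scored by its terrible case.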

I'm afraid at this point I'm obliged to add a hot take on what all of this means for AI Safety. I'm not sure. I can tell myself a story about how being very careful about how quickly you discard alternative hypotheses (i.e., narrow down the hypothesis space) is important. I can also see the outline of how this framework ties in with fancy decision theory. But I still feel like I have only scratched the surface of what's there. I'd really like to get a better grasp of that Nirvana trick, but timelines are short and there is a lot out there to explore.

[1] French high school, though, not American high school.

[2] If there's been alcohol involved, I want to know nothing of it.

[3] The idea that alcohol might have been involved in navigating you into this situation is getting harder to deny.

[4] Is this ever a reasonable assumption? I don't know. It seems to me you can always just pick an uninformative prior. But perhaps the point is that sometimes you should acknowledge your cluelessness, because otherwise you expose yourself to severe downside risks? I'm not convinced, though.

[5] Not the coming-home-drunk situation, only the fundamentally confused part. Oh no, that came out wrong. What I mean is that she is trying to become less fundamentally confused. Urgh. I'll just stop digging now.

[6] A proper infradistribution would have to be a convex set of distributions, upper complete, and everything. Also, the support of the Gaussians would have to be compact. But for the example I'm constructing this won't become relevant; the edge points of the convex set (the two Gaussians) fully characterize how the entire convex set changes.

[7] $P_H^g(L) = E_H(L) = \min_{p \in \{p_1, p_2\}} \int_{\mathbb{R}} L(x)\,p(x)\,\mathrm{d}x$ rather than $\int_{\mathbb{R}} \frac{p_1(x) + p_2(x)}{2}\,L(x)\,\mathrm{d}x$ for an uninformative prior.

[8] Despite having read it at least twice!

[9] A more "natural" way to motivate it might be to talk about possible worlds and updateless decision theory, but this is something that you apparently get out of Infrabayesianism, so we don't want to use it to motivate it.

[10] The story is coming together. This is why you can't turn on the light, btw.

[11] Actually, in this particular example, it turns out that $P^+ = P^-$: $P_H^g(L) = E_H(1) - E_H(1-L) = 1 - \min_{p \in \{p_1, p_2\}} \int_{\mathbb{R}} (1 - L(x))\,p(x)\,\mathrm{d}x = \min_{p \in \{p_1, p_2\}} \int_{\mathbb{R}} L(x)\,p(x)\,\mathrm{d}x$, since we've got two normalized probability distributions.

[12] You can't find any ϰ in Vanessa Kosoy's paper because she is thinking more generally about Banach spaces, and also about situations where there is no Radon-Nikodym derivative. But if we have a density for our measures, we can characterize ϰ via $\int_X \varkappa\,\mathrm{d}m = b$ for an inframeasure $(m, b)$. Also, you basically can't find ϰ anywhere else because almost nobody uses it!

[13] I'm still calling them distributions, although we already left that territory in the last section. More appropriate would be something like "density function of the signed measure" or "Radon-Nikodym derivative".