Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Reflective Bayesianism


What is actually left of Bayesianism after Radical Probabilism? Your original post on it was partially explaining logical induction, and introduced assumptions from that in much the same way as you describe here. But without that, there doesn't seem to be a whole lot there. The idea is that all that matters is resistance to dutch books, and for a dutch book to be fair the bookie must not have an epistemic advantage over the agent. Said that way, it depends on some notion of "what the agent could have known at the time", and giving a coherent account of this would require solving epistemology in general. So we avoid this problem by instead taking "what the agent actually knew (believed) at the time", which is a subset and so also fair. But this doesn't do any work, it just offloads it to agent design.

For example with logical induction, we know that it can't be dutch booked by any polynomial-time trader. Why do we think that criterion is important? Because we think it's realistic for an agent to in the limit know anything you can figure out in polynomial time. And we think that because we have an algorithm that does it. Ok, but what intellectual progress does the dutch book argument make here? We had to first find out what one can realistically know, and got logical induction, from which we could make the poly-time criterion. So now we know it's fair to judge agents by that criterion, so we should find one, which fortunately we already have. But we could also just not have thought about dutch books at all, and just tried to figure out what one could realistically know, and what would we have lost? Making the dutch book here seems like a spandrel in thinking style.

As a side note, I reread Radical Probabilism for this, and everything in the "Other Rationality Properties" section seems pretty shaky to me. The proofs of both convergence and calibration as written depend on logical induction - or else, the assumption that the agent would know if it's not convergent/calibrated, in which case could orthodoxy not achieve the same? You acknowledge this for convergence in a comment but also hint at another proof. But if radical probabilism is a generalization of orthodox bayesianism, then how can it have guarantees that the latter doesn't?

For the conservation of expected evidence, note that the proof here involves a bet on what the agent's future beliefs will be. This is a fragile construction: you need to make sure the agent can't troll the bookie, without assuming the accessibility of the structures you want to establish. It also assumes the agent has models of itself in its hypothesis space. And even in the weaker forms, the result seems unrealistic. There is the problem with psychedelics that the "virtuous epistemic process" is supposed to address, but this is something that the formalism allows for with a free parameter, not something it solves. The radical probabilist trusts the sequence of , but it doesn't say anything about where they come from. You can now assert that it can't be identified with particular physical processes, but that just leaves a big question mark for bridging laws. If you want to check if there are dutch books against your virtuous epistemic process, you have to be able to identify its future members. Now I can't exclude that some process could avoid all dutch books against it without knowing where they are (and without being some trivial stupidity), but it seems like a pretty heavy demand.

What is actually left of Bayesianism after Radical Probabilism? Your original post on it was partially explaining logical induction, and introduced assumptions from that in much the same way as you describe here. But without that, there doesn't seem to be a whole lot there. The idea is that all that matters is resistance to dutch books, and for a dutch book to be fair the bookie must not have an epistemic advantage over the agent. Said that way, it depends on some notion of "what the agent could have known at the time", and giving a coherent account of this would require solving epistemology in general. So we avoid this problem by instead taking "what the agent actually knew (believed) at the time", which is a subset and so also fair. But this doesn't do any work, it just offloads it to agent design.

Part of the problem is that I avoided getting too technical in *Radical Probabilism*, so I bounced back and forth between different possible versions of Radical Probabilism without too much signposting.

I can distinguish at least three versions:

1. Jeffrey's version. I don't have a good source for his full picture. I get the sense that the answer to "what is left?" is "very little!" -- EG, he didn't think agents have to be able to articulate probabilities. But I am not sure of the details.
2. The simplification of Jeffrey's version, where I keep the Kolmogorov axioms (or the Jeffrey-Bolker axioms) but reject Bayesian updates.
3. Skyrms' deliberation dynamics. This is a pretty cool framework and I recommend checking it out (perhaps via his book *The Dynamics of Rational Deliberation*). The basic idea of its non-bayesian updates is, it's fine so long as you're "improving" (moving towards something good).
4. The version represented by logical induction.
5. The Shafer & Vovk version. I'm not really familiar with this version, but I hear it's pretty good.

(I can think of more, but I cut myself off.)

Said that way, it depends on some notion of "what the agent could have known at the time", and giving a coherent account of this would require solving epistemology in general.

Making a broad generalization, I'm going to stick things into camp #2 above or camp #4. Theories in camp #2 have the feature that they simply *assume* a solid notion of "what the agent could have known at the time". This allows for a nice simple picture in which we can check Dutch Book arguments. However, it does lend itself more easily to logical omniscience, since it doesn't allow a nuanced picture of how much logical information the agent can generate. Camp #4 means we do give such a nuanced picture, such as the poly-time assumption.

Either way, we've made assumptions which tell us which Dutch Books are valid. We can then check what follows.

For example with logical induction, we know that it can't be dutch booked by any polynomial-time trader. Why do we think that criterion is important? Because we think it's realistic for an agent to in the limit know anything you can figure out in polynomial time. And we think that because we have an algorithm that does it. Ok, but what intellectual progress does the dutch book argument make here? We had to first find out what one can realistically know, and got logical induction, from which we could make the poly-time criterion. So now we know it's fair to judge agents by that criterion, so we should find one, which fortunately we already have. But we could also just not have thought about dutch books at all, and just tried to figure out what one could realistically know, and what would we have lost? Making the dutch book here seems like a spandrel in thinking style.

I think this understates the importance of the Dutch-book idea to the actual construction of the logical induction algorithm. The criterion came first, and the construction was finished soon after. So the hard part was the criterion (which is conceived in dutch-book terms). And then the construction follows nicely from the idea of avoiding these dutch-books.

Plus, logical induction without the criterion would be much less interesting. The criterion implies all sorts of nice properties. Without the criterion, we could point to all the nice properties the logical induction algorithm has, but it would just be a disorganized mess of properties. Someone would be right to ask if there's an underlying reason for all these nice properties -- an organizing principle, rather than just a list of seemingly nice properties. The answer to that question would be "dutch books".

BTW, I believe philosophers currently look down on dutch books for being too pragmatic/adversarial a justification, and favor newer approaches which justify epistemics from a plain *desire to be correct* rather than a desire to not be exploitable. So by no means should we assume that Dutch Books are the only way. However, I personally feel that logical induction is strong evidence that Dutch Books are an important organizing principle.

As a side note, I reread Radical Probabilism for this, and everything in the "Other Rationality Properties" section seems pretty shaky to me. The proofs of both convergence and calibration as written depend on logical induction - or else, the assumption that the agent would know if it's not convergent/calibrated, in which case could orthodoxy not achieve the same? You acknowledge this for convergence in a comment but also hint at another proof. But if radical probabilism is a generalization of orthodox bayesianism, then how can it have guarantees that the latter doesn't?

You're right to call out the contradiction between calling radical probabilism a generalization, vs claiming that it implies new restrictions. I should have been more consistent about that. Radical Probabilism is merely "mostly a generalization".

I still haven't learned about how #2-style settings deal with calibration and convergence, so I can't really comment on the other proofs I implied the existence of. But, yeah, it means there are extra rationality conditions beyond just the Kolmogorov axioms.

For the conservation of expected evidence, note that the proof here involves a bet on what the agent's future beliefs will be. This is a fragile construction: you need to make sure the agent can't troll the bookie, without assuming the accessibility of the structures you want to establish. It also assumes the agent has models of itself in its hypothesis space. And even in the weaker forms, the result seems unrealistic. There is the problem with psychedelics that the "virtuous epistemic process" is supposed to address, but this is something that the formalism allows for with a free parameter, not something it solves. The radical probabilist trusts the sequence of , but it doesn't say anything about where they come from. You can now assert that it can't be identified with particular physical processes, but that just leaves a big question mark for bridging laws. If you want to check if there are dutch books against your virtuous epistemic process, you have to be able to identify its future members. Now I can't exclude that some process could avoid all dutch books against it without knowing where they are (and without being some trivial stupidity), but it seems like a pretty heavy demand.

This part seems entirely addressed by logical induction, to me.

- A "virtuous epistemic process" is a logical inductor. We know logical inductors come to trust their future opinions (without knowing specifically what they will be).
- The logical induction algorithm tells us where the future beliefs come from.
- The logical induction algorithm shows how to have models of yourself.
- The logical induction algorithm shows how to avoid all dutch books "without knowing where they are" (actually I don't know what you meant by this)

Either way, we've made assumptions which tell us which Dutch Books are valid. We can then check what follows.

Ok. I suppose my point could then be made as "#2 type approaches aren't very useful, because they assume something that's no easier than what they provide".

I think this understates the importance of the Dutch-book idea to the actual construction of the logical induction algorithm.

Well, you certainly know more about that than me. Where did the criterion come from in your view?

This part seems entirely addressed by logical induction, to me.

Quite possibly. I wanted to separate what work is done by radicalizing probabilism in general, vs logical induction specifically. That said, I'm not sure logical inductors properly have beliefs about their own (in the de dicto sense) future beliefs. It doesn't know "its" source code (though it knows that such code is a possible program) or even that it is being run with the full intuitive meaning of that, so it has no way of doing that. Rather, it would at some point think about the source code that we know is its, and come to believe that that program gives reliable results - but only in the same way in which it comes to trust other logical inductors. It seems like a version of this in the logical setting.

By "knowing where they are", I mean strategies that avoid getting dutch-booked without doing anything that looks like "looking for dutch books against me". One example of that would be The Process That Believes Everything Is Independent And Therefore Never Updates, but that's a trivial stupidity.

I wanted to separate what work is done by radicalizing probabilism in general, vs logical induction specifically.

From my perspective, Radical Probabilism is a gateway drug. Explaining logical induction intuitively is hard. Radical Probabilism is easier to explain and motivate. It gives reason to believe that there's something interesting in the direction. But, as I've stated before, I have trouble comprehending how Jeffrey *correctly predicted* that there's something interesting here, without logical uncertainty as a motivation. In hindsight, I feel his arguments make a great deal of sense; but without the reward of logical induction waiting at the end of the path, to me this seems like a weird path to decide to go down.

That said, we can try and figure out Jeffrey's perspective, or, possible perspectives Jeffrey could have had. One point is that he probably thought virtual evidence was extremely useful, and needed to get people to open up to the idea of non-bayesian updates for that reason. I think it's very possible that he understood his Radical Probabilism *purely* as a generalization of regular Bayesianism; he may not have recognized the arguments for convergence and other properties. Or, seeing those arguments, he may have replied "those arguments have a similar force for a dogmatic probabilist, too; they're just harder to satisfy in that case."

That said, I'm not sure logical inductors properly have beliefs about their own (in the de dicto sense) future beliefs. It doesn't know "its" source code (though it knows that such code is a possible program) or even that it is being run with the full intuitive meaning of that, so it has no way of doing that.

I totally agree that there's a philosophical problem here. I've put some thought into it. However, I don't see that it's a real obstacle to ... provisionally ... moving forward. Generally I think of the logical inductor as *the well-defined mathematical entity* and the self-referential beliefs are *the logical statements which refer back to that mathematical entity* (with all the pros and cons which come from logic -- ie, yes, I'm aware that even if we think of the logical inductor as the mathematical entity, rather than the physical implementation, there are formal-semantics questions of whether it's "really referring to itself"; but it seems quite fine to provisionally set those questions aside).

So, while I agree, I really don't think it's cruxy.

From my perspective, Radical Probabilism is a gateway drug.

This post seemed to be praising the virtue of returning to the lower-assumption state. So I argued that in the example given, it took more than knocking out assumptions to get the benefit.

So, while I agree, I really don't think it's cruxy.

It wasn't meant to be. I agree that logical inductors seem to de facto implement a Virtuous Epistemic Process, with attendant properties, whether or not they understand that. I just tend to bring up any interesting-seeming thoughts that are triggered during conversation and could perhaps do better at indicating that. Whether it's fine to set it aside provisionally depends on where you want to go from here.

This post seemed to be praising the virtue of returning to the lower-assumption state. So I argued that in the example given, it took more than knocking out assumptions to get the benefit.

Agreed. Simple Bayes is the hero of the story in this post, but that's more because the simple bayesian *can recognize that there's something beyond.*

This reminds me strongly of Robin Hanson's pre-priors work. I guess the pre-prior has to do with the reflective belief, and replacing your prior with the average prior you could have been born with must be a tempting non-Bayesian update (assuming the framework makes any sense, which I'm not sure it does).

But there is no *necessary law* saying the universe must be mathematical, any more than there's a necessary law saying the universe has to be computational.

What would a non-mathematical universe look like that's remotely compatible with ours? I guess it would have to be that there are indescribable 'features' of the universe that are real and maybe even relevant to the describable features?

I guess I'm confused because in my head "mathematical" means "describable by a formal system", and I don't know how a thing could fail to be so describable.

I don't know what it would look like, but that isn't an argument that the universe *is* mathematical.

Frankly, I think there's something confused about the way I'm/we're talking about this, so I don't fully endorse what I'm saying here. But I'm going to carry on.

I guess I'm confused because in my head "mathematical" means "describable by a formal system", and I don't know how a thing could fail to be so describable.

So, the kind of thing I have in mind is the claim that reality is *precisely and completely* described by *some particular* mathematical object.

In my head, the argument goes roughly like this, with 'surely' to be read as 'c'mon I would be so confused if not':

- Surely there's some precise way the universe is.
- If there's some precise way the universe is, surely one could describe that way using a precise system that supports logical inference.

I guess it could fail if the system isn't 'mathematical', or something? Like I just realized that I needed to add 'supports logical inference' to make the argument support the conclusion.

So, let's suppose for a moment that ZFC set theory is the one true foundation of mathematics, and it has a "standard model" that we can meaningfully point at, and the question is whether our universe is somewhere in the standard model (or, rather, "perfectly described" by some element of the standard model, whatever that means).

In this case it's easy to imagine that the universe is actually some structure *not* in the standard model (such as the standard model itself, or the truth predicate for ZFC; something along those lines).

Now, granted, *the whole point* of moving from *some particular* system like that to the more general hypothesis "the universe is mathematical" is to capture such cases. However, the notion of "mathematics in general" or "described by some formal system" or whatever is sufficiently murky that there *could still be* an analogous problem -- EG, suppose there's a formal system which describes the entire activity of human mathematics. Then "the real universe" could be some object outside the domain of that formal system, EG, the truth predicate for that formal system, the intended 'standard model' of that system, etc.

I'm not *confident* that we should think that way, but it's a salient possibility.

>Surely there's some precise way the universe is.

Agree, and would love to see a more detailed explicit discussion of what this means and whether it's true. (Also, worth noting that there may be a precise way the universe is, but no "precise" way that "you" fit into the universe, because "you" aren't precise.)

I guess it would have to be that there are indescribable ‘features’ of the universe that are real and maybe even relevant to the describable features?

Eg. time (particularly passing-time), consciousness (particularly qualia). If you want to know what the potentially non-mathematical features are, look at how people argue against physicalism.

means “describable by a formal system”, and I don’t know how a thing could fail to be so describable.

Formally, some formal systems can fail to describe *themselves*.

Eg. time (particularly passing-time), consciousness (particularly qualia). If you want to know what the potentially non-mathematical features are, look at how people argue against physicalism.

I don't get why these wouldn't be mathematizable.

Formally, some formal systems can fail to describe *themselves*.

Sure, but for every formal system, there's some formal system that describes it (right?)

Consequentialism is morally correct, but virtue ethics is what's most effective, and deontology is what the virtuous person would use.

Consequentialism is right because it's not about morality. (But also might be wrong, as a description, when people don't do things for a reason, like habit.)

B: Why do you play chess?

A: To have fun. And to beat you.

If this were true, then *simple belief* in consequentialism would imply *reflective belief* in virtue ethics

Truth aside there's issues with the implication part. Will people reach the conclusion? There's a lot of math problems where the answer is a consequence of the properties of numbers. Does that mean you'll know the answer some time before you die? You might be able to pick out a given one where you will find out before you die if you take the time to solve it. Ethics though, doesn't seem to have the same guarantees, especially not around the correctness of general theories.

However, you can justifiably trust a probability distribution whose description includes running an accurate prime factorization algorithm.

That's not a probability distribution, that's a flowchart that terminates in "Yes" and "No".

Truth aside there's issues with the implication part. Will people reach the conclusion? There's a lot of math problems where the answer is a consequence of the properties of numbers. Does that mean you'll know the answer some time before you die? You might be able to pick out a given one where you will find out before you die if you take the time to solve it. Ethics though, doesn't seem to have the same guarantees, especially not around the correctness of general theories.

This is part of why there could be a lot of different formalizations of the simple/reflective distinction. Do you require only that an argument exists, or do you require that the agent recognizes the argument, or something in-between? (Example of a useful in-between definition: we require only that the argument *would be* recognized *if* it were made -- this is useful from the perspective of someone who cares mostly about whether an agent can be convinced.)

That's not a probability distribution, that's a flowchart that terminates in "Yes" and "No".

Properly, I should have made a distinction between probability distributions and *descriptions of* probability distributions. But the point is that an agent can prefer to run a program and use its output in place of the agent's own beliefs.

Do you require only that an argument exists, or do you require that the agent recognizes the argument, or something in-between?

The second one I think. The epiphany is sometimes characterized by frustration 'why didn't I think of that sooner?'

The optimal chess game (assuming it's unique) might proceed from the rules, but we might never know it. Even if I have the algorithm (say in pseudocode)

- If I don't have it in code, I might not run it
- If I have it in code, but don't have the compute (or sufficiently efficient techniques) I might not find out what happens when I run it for long enough
- If I have the code, and the compute, then it's just a matter of running it.* But do I get around to it?

Understanding implication isn't usually as simple as I made it out to be above. People can work hard on a problem, and not find the answer for a lot of reasons - even if they have everything they need to know to solve it. Because they also have a lot of other information, and before they have the answer, they don't know what is, and what isn't relevant.

In other words, where implication is trivial and fast, reflection may be trivial and fast. If not...

The proof I never find does not move me.

*After getting the right version of the programming language downloaded, and working properly, just to do this one thing.

I've argued in several places that traditional Bayesian reasoning is unable to properly handle embeddedness, logical uncertainty, and related issues.

However, in the light of hindsight, it's possible to imagine a Bayesian pondering these things, and "escaping the trap". This is because the trap was largely made out of assumptions we didn't usually *explicitly recognize* we were making.

Therefore, in order to aid moving beyond the traditional view, I think it's instructive to paint a detailed picture of what a traditional Bayesian might believe. This can be seen as a partner to my post Radical Probabilism, which explained in considerable detail how to move beyond the traditional Bayesian view.

## Simple Belief vs Reflective Belief

There is an important distinction between the explicit dogmas of a view (ie, what adherents explicitly endorse) vs what one would need to believe in order to agree with them. One can become familiar with this distinction by studying logic, especially Gödel's incompleteness theorems and Tarski's undefinability theorem. In particular:

This becomes especially interesting when we're studying rationality (rather than, say, mathematics), because a theory of rationality is supposed to characterize normatively correct thinking. Yet, due to the above facts, philosophers will *typically* end up in a strange position where they endorse principles very different from the ones they themselves are using. For example, a philosopher arguing that set theory is the ultimate foundation of rational thought would find themselves in the strange position of utilizing "irrational" tools (tools which go beyond set theory) to come to its defense, EG in discussing the semantics of set theory or arguing for its consistency.[1]

I'll use the term *simple belief* to indicate accepting the explicit dogmas, and *reflective belief* to indicate accepting the meta-dogmas which justify the dogmas.[2,3] (These are not terms intended to stand the test of time; I'm not sure about the best terminology, and probably won't reliably use these particular terms outside of this blog post. Leverage Research, iiuc, uses the term "endorsed belief" for what I'm calling "reflective belief", and "belief" for what I'm calling "simple belief".)

Reflective belief and simple belief need not go together. There's a joke from Eliezer (h/t to Ben Pace for pointing me to the source):

If this were true, then *simple belief* in consequentialism would imply *reflective belief* in virtue ethics (because you evaluate moral frameworks on their effects, not whether they're morally correct!). Similarly, simple belief in virtue ethics would imply reflective belief in deontology, and simple belief in deontology would imply reflective belief in consequentialism.

So, not only does simple belief in X not imply reflective belief in X; furthermore, reflective belief in X need not imply simple belief in X! (This is, indeed, belief-in-belief.)

Hence, by "reflective belief" I do not necessarily mean "reflectively consistent belief".

*Reflective consistency* occurs only when simple belief and reflective belief are one and the same: the reasoning system you use is also the one you endorse.

## Reflective Bayesianism

Applying the simple/reflective distinction to our case-in-point, I'll define two different types of Bayesian:

- *Simple Bayesian:* simple belief in Bayesianism. This describes an agent who reasons according to the laws of probability theory, and updates beliefs via Bayes' Law. This is the type of reasoner which Bayesians *study*.
- *Reflective Bayesian:* reflective belief in Bayesianism. For simplicity, I'll assume this *also* involves simple belief in Bayesianism. Realistically, Bayesian philosophers can't reason in a perfectly Bayesian way; so, this is a simplified, idealized Bayesian philosopher.

(A problem with the terminology in this section is that by "Bayesian" I specifically mean "dogmatic probabilism" in the terminology from my Radical Probabilism post. I don't want to construe "Bayesianism" to *necessarily* include Bayesian updates. The central belief of Bayesianism is subjective probability theory. However, repeating "simple dogmatic probabilism vs reflective dogmatic probabilism" over and over again in this essay was not very appealing.)

Actually, there's not one canonical "reflective belief in Bayes" -- one can justify Bayesianism in many ways, so there can correspondingly be many reflective-Bayesian positions. I'm going to discuss a number of these positions.
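As a concrete reference point, the Simple Bayesian's update via Bayes' Law can be sketched in a few lines. This is only an illustration, not something from the post itself: the finite hypothesis space, the coin example, and the names `bayes_update`, `prior`, and `likelihood` are all assumptions made for the sketch.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, observation):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis -> probability (should sum to 1)
    likelihood: dict mapping hypothesis -> {observation -> probability}
    """
    # Unnormalized posterior: P(h) * P(observation | h)
    joint = {h: p * likelihood[h][observation] for h, p in prior.items()}
    evidence = sum(joint.values())  # P(observation)
    if evidence == 0:
        raise ValueError("observation has probability zero under the prior")
    return {h: j / evidence for h, j in joint.items()}

# Hypothetical example: is a coin fair, or biased 3:1 towards heads?
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {
    "fair": {"H": Fraction(1, 2), "T": Fraction(1, 2)},
    "biased": {"H": Fraction(3, 4), "T": Fraction(1, 4)},
}
posterior = bayes_update(prior, likelihood, "H")
```

Note that beliefs here change only when an observation arrives; nothing in the procedure lets the agent revise `prior` by computation alone. That restriction is what the reflective positions discussed in this post end up defending in one way or another.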

## My prior is best.

The easiest way to reflectively endorse Bayesianism is to simply believe *my prior is best*. No other distribution can have more information about the world, unless it actually observes something about the world.

I think of this as *multiverse frequentism*. You think your prior literally gives the frequency of different possible universes. Now, I'm not accusing anyone of really believing this, but I *have* heard a (particularly reflective and self-critical) Bayesian articulate the idea. And I think a lot of people have an assumption like this in mind when they think about priors like the Solomonoff prior, which are designed to be particularly "objective". This is essentially a map/territory error.

Some group of readers may think: "Now wait a minute. Shouldn't a Bayesian necessarily believe this? The expected Bayes loss of any other prior is going to be worse, when an agent considers them! Similarly, no other prior is going to look like a better tool for making decisions. So, yeah... I expect my prior to be best!"

On the other hand, those who endorse some level of modest epistemology might be giving that first group of readers some serious side-eye. Surely it's crazy to think your own beliefs are optimal *only* because they're yours?

To the first group, I would agree that *if* you've fully articulated a probability distribution, *then* you shouldn't be in a position where you think a different one is better than yours: in that case, you should update to the other one! (Or possibly to some third, even better distribution). The multiverse-frequentist fallaciously extends this result to apply to all priors.

But this *doesn't mean* you should think your distribution is best in general. For example, you can believe that someone else knows more than you, without knowing exactly what they believe.

In particular, it's easy to believe that *some computation* knows more than you. If a task somehow involves factoring numbers, you might not know the relevant prime factorizations. However, you can justifiably trust a probability distribution whose description includes running an accurate prime factorization algorithm. You can prefer to replace your own beliefs with such a probability distribution. This lays the groundwork for justified non-Bayesian updates.

## I can't gain information without observing things.

Maybe our reflective Bayesian doesn't literally think theirs is the best prior possible. However, they might be a staunch empiricist: they believe knowledge is entanglement with reality, and you can only get entanglement with reality by *looking*.

Unlike the multiverse-frequentist described in the previous section, the empiricist *can* think other people have better probability distributions. What the empiricist *doesn't* believe is that we can emulate any of their expertise merely by thinking about it. Thinking is useless (except, of course, for the computational requirements of Bayes' Law itself). Therefore, although helpful non-Bayesian updates might technically be possible (eg, if you could morph into a more knowledgeable friend of yours), it's *not* possible to come up with any which you can implement just by thinking.

I can't think of any way to "justify" this assumption except if you really do have unbounded computational resources.

## The best prior is already one of my hypotheses.

This is, of course, just the usual assumption of realizability: we assume that the world is in our hypothesis-space. We just have to find which of our hypotheses is the true one.

This doesn't imply as strong a rejection of non-Bayesian updates. It could be that we can gain some useful information by computation alone. However, the need for this must be limited, because we already have enough computational power to simulate the whole world.

What the assumption does gain you is a guarantee that you will make decisions well, eventually. If the correct hypothesis is in your hypothesis space, then once you learn it with sufficiently high confidence (which can usually happen pretty fast), you'll be making optimal decisions. This is a much stronger guarantee than the simple Bayesian has. So, the assumption does buy our Reflective Bayesian a lot of power in terms of justifying Bayesian reasoning.

My steel-man of this perspective is the belief that the universe is intelligible. I'm not sure what to call this belief. Here are a few versions of it:

Computationalism. Everything is computable. It's absurd to imagine that anything physically realized would not be computable. So, it's sufficient to assume that the universe might be any computer program.

Set-theory-ism. Anything real must have a description in ZFC. Physics might include some uncomputable aspects, but they'd be things like halting oracles, which can be captured within ZFC.

Mathematicalism. Anything real must be mathematically describable. Not necessarily in any one axiom system such as ZFC -- we know (from Tarski's undefinability theorem) that there are mathematically describable things which fall outside any fixed axiom set. However, the universe must be mathematically describable in some sense.

I used to believe the third theory here. After all, what could it possibly mean to suppose the universe is not mathematically describable? Failing to assume this just seems like giving up.

But there is no necessary law saying the universe must be mathematical, any more than there's a necessary law saying the universe has to be computational. It does seem like we have strong evidence that the universe is mathematical in nature; mathematics has been surprisingly helpful for describing the universe we observe. However, philosophically, it makes more sense for this to be a contingent fact, not a necessary one.

## There is a best hypothesis, out of those I can articulate.

The weak realizability assumption doesn't say that the universe is one of my hypotheses; instead, it postulates that out of my hypotheses, one of them is best. This is much more plausible, and gets us most of the theoretical implications.

For example, if you use the Solomonoff prior, strong realizability says that the universe is computable. Weak realizability just says that there's one computer program that's best for predicting the universe.

It makes a lot more sense to think of Solomonoff induction as searching for the best computational way to predict. The universe isn't necessarily computable, but computers are. If we're building AGI on computers, they can only use computable methods of predicting the world around them.

However, the assumption that one of your hypotheses is best is more questionable than you might realize. It's easy to set up circumstances in which no prior is best. My favorite example is a Bayesian who is observing coin-flips, and who has two hypotheses: that the coin is biased with 1/3rd probability of heads, and symmetrically, that it's biased with 1/3rd on tails. In truth, the coin is fair. We can show that the Bayesian will alternate between the two hypotheses forever: sometimes favoring one, sometimes the other.

The simple Bayesian believes that such non-convergence is possible. The reflective Bayesian thinks it is not possible -- one of the hypotheses has to be best, so beliefs cannot go back and forth forever.

The simple Bayesian therefore can reflectively prefer non-Bayesian updates -- for example, in the case of the fair coin, you'd be better off to converge to an even posterior over the two hypotheses, rather than continue updating via Bayes. (Or, even better, make an update which adds "fair coin" to the hypothesis set.)
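To make the coin example concrete, here is a minimal sketch of the two-hypothesis Bayesian. The names (`P_HEADS_H1`, `posterior_path`, and so on) are made up for illustration, and the uniform prior is an assumption. With that prior, the posterior odds of the "1/3 heads" hypothesis over the "1/3 tails" hypothesis work out to 2^(tails − heads), so the favored hypothesis is simply whichever outcome has been seen more often so far.

```python
from fractions import Fraction

# Hypothetical labels for this sketch: H1 says P(heads) = 1/3, H2 says P(heads) = 2/3.
P_HEADS_H1 = Fraction(1, 3)
P_HEADS_H2 = Fraction(2, 3)


def posterior_path(flips, prior_h1=Fraction(1, 2)):
    """Yield the posterior probability of H1 after each flip ('H' or 'T')."""
    w1, w2 = prior_h1, 1 - prior_h1
    for flip in flips:
        if flip not in ("H", "T"):
            raise ValueError(f"unexpected flip: {flip!r}")
        heads = flip == "H"
        # Bayes' rule: multiply each hypothesis by the likelihood of the flip.
        w1 *= P_HEADS_H1 if heads else 1 - P_HEADS_H1
        w2 *= P_HEADS_H2 if heads else 1 - P_HEADS_H2
        total = w1 + w2
        w1, w2 = w1 / total, w2 / total  # renormalize so the weights sum to 1
        yield w1


def posterior_h1(flips, prior_h1=Fraction(1, 2)):
    """Posterior probability of H1 after the whole sequence of flips."""
    post = prior_h1
    for post in posterior_path(flips, prior_h1):
        pass
    return post


def favored_hypothesis_changes(flips):
    """Count how many times the favored hypothesis switches between H1 and H2."""
    changes = 0
    last_favored = None
    for post in posterior_path(flips):
        if post == Fraction(1, 2):
            continue  # exact tie: neither hypothesis is favored
        favored = "H1" if post > Fraction(1, 2) else "H2"
        if last_favored is not None and favored != last_favored:
            changes += 1
        last_favored = favored
    return changes
```

For a truly fair coin, tails − heads is a symmetric random walk, which returns to zero infinitely often with probability 1. That is the sense in which the favored hypothesis keeps switching rather than settling down.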

## I am calibrated, or can easily become calibrated.

Calibration is the property that, of the cases where you estimate, say, an 80% probability, the long-run frequency of those things happening is actually 80%. (Formally: for any number ϵ greater than zero, for any probability p, considering the sequence of all cases where you assign probability within ϵ of p, the actual limiting frequency of those things turning out to be true is within ϵ of p.)

Calibration is a lesser substitute for saying that a probabilistic hypothesis is "true", much like "best hypothesis out of the space" is. Or, flipping it around: being uncalibrated is a particularly egregious way for a hypothesis to be false.
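One way to write the formal definition of calibration in symbols, as a sketch only: the notation is an assumption, not the author's. Here $q_t \in [0,1]$ is the probability assigned in case $t$, $x_t \in \{0,1\}$ records whether the thing turned out true, and the limit is assumed to exist.

$$
\forall \epsilon > 0,\ \forall p \in [0,1]:\quad
S_{p,\epsilon} = \{\, t : |q_t - p| < \epsilon \,\} \text{ infinite}
\;\Longrightarrow\;
\left| \lim_{n \to \infty} \frac{\sum_{t \le n,\ t \in S_{p,\epsilon}} x_t}{\left|\{\, t \le n : t \in S_{p,\epsilon} \,\}\right|} - p \right| \le \epsilon
$$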

To illustrate: if your sequence is 010101010101010101..., a fair coin is a calibrated model, even though there's a much better model. On the other hand, a biased coin is not a calibrated model. If we think the probability of "1" is 1/3, we will keep reporting a probability of 1/3, but the limiting frequency of the events will actually be 50%.

So, clearly, believing your probabilities to be calibrated is a way to reflectively endorse them, although not an extremely strong way.

I don't know that calibration implies any really strong defense of classical Bayesianism. However, it does provide somewhat stronger decision-theoretic guarantees. Namely, a calibrated estimate of the risks means that your strategy can't be outperformed by really simple adjustments. For example, if you're using a fair-coin model to make bets on the 010101010101... sequence, you will balance risks and rewards correctly (we can't make you do better by simply making you more/less risk-averse). The same cannot be said if you're using a biased-coin model.

I recently (in private correspondence) dealt with an example where calibration provided a stronger justification for Bayesian approaches over frequentist ones, but I feel the details would be a distraction here. In general I expect a calibration assumption helps justify Bayes in a lot of contexts.
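The 0101... illustration can be checked numerically. The following is a rough finite-sample sketch: `empirical_frequency` and `calibration_gap` are hypothetical helper names, and a finite prefix of the sequence stands in for the limiting frequency in the definition.

```python
def empirical_frequency(forecasts, outcomes, p, eps):
    """Frequency of outcome 1 among cases whose forecast is within eps of p.

    Returns None if no forecast falls within eps of p.
    """
    hits = [x for q, x in zip(forecasts, outcomes) if abs(q - p) < eps]
    if not hits:
        return None
    return sum(hits) / len(hits)


def calibration_gap(forecasts, outcomes, p, eps):
    """Absolute difference between p and the observed frequency near p."""
    freq = empirical_frequency(forecasts, outcomes, p, eps)
    return None if freq is None else abs(freq - p)


# The alternating sequence 0101... from the text, truncated to 1000 terms.
outcomes = [t % 2 for t in range(1000)]
fair_model = [0.5] * len(outcomes)      # always reports P(1) = 1/2
biased_model = [1 / 3] * len(outcomes)  # always reports P(1) = 1/3

fair_gap = calibration_gap(fair_model, outcomes, 0.5, 0.01)
biased_gap = calibration_gap(biased_model, outcomes, 1 / 3, 0.01)
```

On this prefix the fair-coin model has a gap of zero. The biased model's gap is about 1/6, far outside any small ϵ, which matches the claim that it is not calibrated.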

A naive argument in favor of calibration might be "if I thought I weren't calibrated, I would adjust my beliefs to become more calibrated. Therefore, I must be calibrated." This makes two mistakes:

My steelman of the calibration assumption is this: in general, it doesn't seem too hard to watch your calibration graph and adjust your reported probabilities in response. If you're an alien intelligence watching the 01010101... sequence, it might be hard to invent the "every other" hypothesis from scratch. However, it's easy to see that your P(1)=1/3 model is too low and should be adjusted upwards.

(OK, it's not hard at all to invent the "every other" pattern. But in more complicated cases, it's difficult to come up with a really new hypothesis, but it's relatively easy to improve the calibration on hypotheses.)
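As a toy version of "watch your calibration graph and adjust", here is one naive recalibration scheme. It is an assumption made for illustration, not a method the post specifies. Each raw forecast is replaced by the frequency observed so far among earlier cases with a similar raw forecast. The function name and the `eps` bin width are hypothetical.

```python
def recalibrated_forecasts(raw_forecasts, outcomes, eps=0.05):
    """Replace each raw forecast by the observed frequency of 1s among earlier
    cases with a similar raw forecast (within eps). Falls back to the raw
    forecast when there is no relevant history yet."""
    adjusted = []
    history = []  # (raw forecast, outcome) pairs seen so far
    for q, x in zip(raw_forecasts, outcomes):
        similar = [y for r, y in history if abs(r - q) < eps]
        adjusted.append(sum(similar) / len(similar) if similar else q)
        history.append((q, x))  # the outcome is only recorded after forecasting
    return adjusted


# The P(1) = 1/3 model on the alternating sequence 0101...
outcomes = [t % 2 for t in range(200)]
raw = [1 / 3] * len(outcomes)
adjusted = recalibrated_forecasts(raw, outcomes)
```

On this sequence the adjusted forecasts drift from 1/3 toward 1/2, without the forecaster ever inventing the "every other" hypothesis. The quadratic-time loop is only meant for small examples.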

Note that the formal definition of "calibration" I gave at the beginning of this subsection doesn't really distinguish between "already calibrated" vs "will become calibrated at some point"; it's all asymptotic. So if we can calibrate by looking at a calibration chart and compensating for our over-/under-confidence, then we "are already calibrated" from a technical standpoint. (Nonetheless, I think the intuitive distinction is meaningful and useful.)

A counterargument to my steelman: it's actually computationally quite difficult to be calibrated. Sure, it doesn't seem so hard for humans to improve their calibration in practice, but the computational difficulty should give you pause. It might not make sense to suppose that humans are even approximately calibrated in general.

## Conclusion

I think Bayesian philosophy before Radical Probabilism[4] over-estimated its self-consistency, underestimating the difference between simple Bayesianism and reflective Bayesianism (effectively making a map-territory error). It did so by implicitly making the mistakes above, as well as others. Sophisticated authors added technical assumptions such as calibration and realizability. These assumptions were then progressively forgotten through iterated summarization/popularization -- EG,

I think this happens all the time, with even the original authors possibly forgetting their own technical assumptions when they're not thinking hard about it.

Note that I'm not accusing anyone of literally believing the Reflective Bayesian positions I've outlined. (Actually, in particular, I want to avoid accusing you... some of my other readers, perhaps...) What I'm actually saying is that it was a belief operating in the background, heuristically influencing how people thought about things.[5] For example:

A naive argument for the reflective consistency of UDT: "UDT just takes the actions which are optimal according to its prior. It can evaluate the expected utility of alternate policies by forward-sampling interactions from its prior. The actions which it indeed selects are going to be optimal, by definition. So, other policies look at best equally good. Therefore, it should never want to self-modify to become anything else."

I think most of the people who thought about UDT probably believed something like this at some point.

There are several important mistakes in this line of reasoning.

even if it worsens its strategy in doing so, because it might prefer to know with certainty that it will make a halfway decent selection, rather than stay in the dark about what it will do.

My overall point, here, is just that we should be careful about these things. Simple belief and reflective belief are not identical. A Bayesian reasoner does not necessarily prefer to keep being a Bayesian reasoner. And a Bayesian reasoner can prefer a non-Bayesian update to become a different Bayesian reasoner.

The goal of a Radical Probabilist should be to understand these non-Bayesian updates, trimming the notion of "rationality" to include only that which is essential.

## Footnotes

## 1:

Truth and Paradox by Tim Maudlin is an extreme example of this; by the end of the book, Maudlin admits that what he is writing cannot be considered true on his own account. He proceeds to develop a theory of permissible assertions, which may not be true, but are normatively assertible. To top it off, he shows that no theory of permissibility can be satisfactory! He even refers to this as "defeat". Yet, he sees no better alternative, and so continues to justify his work as (mostly) permissible, though untrue.

## 2:

Note that although the simple/reflective distinction is inspired by rigorous formal ideas in logic, I'm not in fact taking a super formal approach here. Note the absence of a formal definition of "reflective belief". I think there are several different formal definitions one could give. I mean any of those. I consider my definition to include any reason why someone might argue for a position, perhaps even dishonestly (although dishonesty isn't relevant to the current discussion, and should probably be viewed as a borderline case).

## 3:

Aside: it's difficult to reliably maintain this distinction! When asserting things, are you asserting them simply or reflectively? Suppose I read Tim Maudlin's book (see footnote #1). What is "Tim Maudlin's position"? I can see good reasons to take it as (a) the explicit assertions, (b) the belief system which would endorse those explicit assertions, or (c) the belief system which the explicit assertions would themselves endorse.

In many circumstances, you'd say that what an author reflectively believes is their explicit assertions, and what they simply believe is the implicit belief system which leads them to make those assertions. Note what this implies: if you claim X, then your simple belief is the reflective belief in X, and your reflective position is simple belief in X! Headache-inducing, right?

But this often gets more confusing, not less, if (as in Tim Maudlin's case) the author starts explicitly dealing with these level distinctions. What should you think if I tell you I simply believe in X? I think it depends on how much you trust my introspective ability. If you don't trust it, then you'll conclude that I have belief-in-belief; I endorse simple belief in X (which is probably the same as endorsing X, ie, reflectively believing X). On the other hand, if you do trust my introspective ability, then you might take it to mean "I believe X, but I don't know why / I don't know whether I endorse my reasons for that belief". This is like the Leverage Research concept of "belief report". This means you can take my assertion at face value: I've given you one of my simple beliefs.

But what if someone makes a habit of giving you their simple beliefs, rather than their reflective beliefs? This might be an honesty thing, or possibly an unreflective habit. Philosophers, academics, and smart people generally might be stuck in a rut of only giving reflective positions, because they're expecting to have to defend their assertions (and they like making defensible assertions). This calls into question whether/when we should assume that someone is giving us their reflective beliefs rather than their simple beliefs.

And what if I tell you I reflectively believe in X? Do you take that at face value? Or do you think I reflectively reflectively believe X (so my simple belief is Z, a position which reflectively endorses the position Y -- where Y is a position which reflectively endorses X).... You can see where things get difficult.

## 4:

By "Bayesianism before radical probabilism" I don't mean a temporal/historic thing, EG, Bayesianism before the 1950s (when Jeffrey first began inventing Radical Probabilism). Rather, I mean "the version of Bayesianism which strongly weds itself to Bayesian updates." Most centrally, I'm referring to LessWrong before Logical Induction.

## 5:

Simply put, early LessWrong reflectively believed in (classical) Bayesianism, and thus simply believed the justifying assumptions associated with Bayesianism. But few, if any, reflectively believed those assumptions -- indeed, those assumptions have little justification when examined, and life gets more interesting when assuming their negation.

The only general advice I can think of to avoid this mistake is "don't lose track of your assumptions".