Curated.
Using Bayes-type epistemology is a core LessWrong topic, and I think this represents a bunch of progress on that front (whether the results are already real-world-ready or just real-world-inspired). I have only engaged with small parts of the thesis, but those parts seem pretty exciting; so far, I particularly like knowing about quasi-arithmetic pooling. It feels like I've become less confused about something that I didn't know I was confused about — the connection between the character of the proper scoring rule and the right ways to aggregate those probabilities.
I also appreciate Eric's work on blog posts that explain more of his thoughts in a friendly way. Hope to see a few more distillations come out of this thesis!
Very cool work! A couple of (perhaps-silly) questions:
Great questions!
Congratulations! I wish we could have collaborated while I was in school, but I don't think we were researching at the same time. I haven't read your actual papers, so feel free to answer "you should check out the paper" to my comments.
For chapter 4: From the high level summary here it sounds like you're offloading the task of aggregation to the forecasters themselves. It's odd to me that you're describing this as arbitrage. Also, I have frequently seen scoring rules used with some intermediary function to determine monetary rewards. For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters. One would imagine that this would eliminate the incentive to report the aggregate as your own answer, but there's a reason we (the researcher/platform/website) aggregate individual forecasts: it's actually just more accurate under typical conditions. In theory an individual forecaster could improve that aggregate by forming their own independent forecast before seeing the work of others, and then aggregating, but in practice the impact of an individual forecast is quite small. I'll have to read about QA pooling; it's surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
For chapter 7: It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts. For example, if Alice and Bob are forecasting "Will AI cause human extinction in the next 100 years?", they both might individually forecast ~0%, but for different reasons. Alice believes it is impossible for AI to get powerful enough to cause human extinction, but that if such an AI did exist, it would kill us all. Bob believes any agent smart enough to be that powerful would necessarily be morally upstanding, and believes it's extremely likely that such an agent will be built. Any reasonable aggregation strategy will put the aggregate at ~0% because each individual forecast is ~0%, but if they were to communicate with one another they would likely arrive at a much higher number. I suspect that you address this in the assumptions of the model in the actual paper.
Congrats again, I enjoyed your high level summary and might come back for a more detailed read of your papers.
Thanks! Here are some brief responses:
From the high level summary here it sounds like you're offloading the task of aggregation to the forecasters themselves. It's odd to me that you're describing this as arbitrage.
Here's what I say about this anticipated objection in the thesis:
For many reasons, the principal may wish to make arbitrage impossible. First, the principal may wish to know whether the experts are in agreement: if they are not, for instance, the principal may want to elicit opinions from more experts. If the experts collude to report an aggregate value (as in our example), the principal does not find out whether they originally agreed. Second, even if the principal only seeks to act based on some aggregate of the experts' opinions, their method of aggregation may be different from the one that the experts use to collude. For instance, the principal may have a private opinion on the trustworthiness of each expert and wish to average the experts' opinions with corresponding weights. Collusion among the experts denies the principal this opportunity. Third, a principal may wish to track the accuracy of each individual expert (to figure out which experts to trust more in the future, for instance), and collusion makes this impossible. Fourth, the space of collusion strategies that constitute arbitrage is large. In our example above, any report in [0.546, 0.637] would guarantee a profit; and this does not even mention strategies in which experts report different probabilities. As such, the principal may not even be able to recover basic information about the experts' beliefs from their reports.
For example, when I worked with IARPA on geopolitical forecasting, our forecasters would get financial rewards depending on what percentile they were in relative to other forecasters.
This would indeed be arbitrage-free, but likely not proper: it wouldn't necessarily incentivize each expert to report their true belief; instead, an expert's optimal report is going to be some sort of function of the expert's belief about the joint probability distribution over the experts' beliefs. (I'm not sure how much this matters in practice -- I defer to you on that.)
It's surprising to me that you could disincentivize forecasters from reporting the aggregate as their individual forecast.
In Chapter 4, we are thinking of experts as having immutable beliefs, rather than beliefs that change upon hearing other experts' beliefs. Is this a silly model? If you want, you can think of these beliefs as each expert's belief after talking to the other experts a bunch. In theory(?) the experts' beliefs should converge (though I'm not actually clear what happens if the experts are computationally bounded); but in practice, experts often don't converge (see e.g. the FRI adversarial collaboration on AI risk).
It seems to me that under sufficiently pessimistic conditions, there would be no good way to aggregate those two forecasts.
Yup -- in my summary I described "robust aggregation" as "finding an aggregation strategy that works as well as possible in the worst case over a broad class of possible information structures." In fact, you can't do anything interesting in the worst case over all information structures. The assumption I make in the chapter in order to get interesting results is, roughly, that experts' information is substitutable rather than complementary (on average over the information structure). The sort of scenario you describe is exactly the type of example where Alice and Bob's information might be complementary.
Hello, this looks very interesting to me, so I want to make a mark on it so that I can find it later. At the moment I can't do much, for personal reasons. I am a newbie here, barely knowing what this "Bayesian thing" is about.
But knowing a bit about building a sound axiom system, I have a feeling some work could be done to make the "table", the axiom system all of this stands on, a better one.
My WordPress blog (and I still have to learn how to write one) holds at the moment only my "resurrected" thesis and doctoral thesis from about 30 years ago, in German at that. I had to dig up the old LaTeX files and make PDF files out of them. Since LaTeX has changed a lot, I could not resurrect the correct pictures yet.
Just marking this for myself so that I can find it later; at the moment real, deep mathematical thinking is not possible, and actually I should not even be spending time on this forum. But this just looked too interesting.
Really fascinating stuff! I have a (possibly already answered) question about how experts updating on other experts' predictions might be valuable.
You discuss the negative impacts of allowing experts to aggregate themselves, or to view one another's forecasts before initially submitting their own. Might there be value in allowing experts to submit multiple times, each time seeing the submitted predictions of a previous round? The final aggregation scheme would then be able not only to assign a credence to each expert, but also to gain a proxy for what credence the experts give to one another. In a more practical scenario where experts will talk, if not collude, this might give better insight into how expert predictions are being created.
Thanks for taking the time to distill this work into a more approachable format - it certainly made the thesis more manageable!
Yeah, there's definitely value in experts being allowed to submit multiple times, allowing them to update on other experts' submissions. This is basically the frame taken in Chapter 8, where Alice and Bob update their estimate based on the other's estimate at each step. This is generally the way prediction markets work, and I think it's an understudied perspective (perhaps because it's more difficult to reason about than if you assume that each expert's estimate is static, i.e. does not depend on other experts' estimates).
One annoying thing in reading Chapter 3: it states that for l = 2, 4, 8, the optimal scoring rules can be written in terms of elementary functions. However, you only actually give the full formula for the case l = 8 (for l = 2 you give it on half the interval). What are the formulas for the other cases?
(But also, this is really cool, thanks for posting this!)
Ha! OK, that is indeed nasty. Yeah, I guess CASes can solve this kind of problem these days, can't they? Well -- I say "these days" as if this hasn't been the case for, like, my entire life; I've just never gotten used to making routine use of them...
"I noticed a strong commonality among the questions that I had found particularly fascinating: most of them involved reasoning about knowledge, information, or uncertainty under constraints"
This is also true for me, and I loved reading this post for this reason!
Back in the day I applied to study with Joe Halpern because of his work on epistemic logic, and ended up studying Logic in Amsterdam. At some point I got tired of Logic and its contrived puzzles (Muddy Children, etc) and decided to focus on Probability instead.
Has anyone studied the idea of rewarding people according to how much their input improves the aggregate (whatever algorithm is being used), rather than for their individual accuracy?
In January, I defended my PhD thesis, which I called Algorithmic Bayesian Epistemology. From the preface:
Although my interest in mathematical reasoning about uncertainty dates back to before I had heard of the rationalist community, the community has no doubt influenced and strengthened this interest.
The most striking example of this influence is Scott Aaronson's blog post Common Knowledge and Aumann's Agreement Theorem, which I ran into during my freshman year of college.[1] The post made things click together for me in a way that made me more intellectually honest and humble, and generally a better person. I also found the post incredibly intellectually interesting -- and indeed, Chapter 8 of my thesis is a follow-up to Scott Aaronson's academic paper on Aumann's agreement theorem.
My interest in forecast elicitation and aggregation, while pre-existing, was no doubt influenced by the EA/rationalist-adjacent forecasting community.
And Chapter 9 of the thesis (work I did at the Alignment Research Center) is no doubt causally downstream of the rationalist community.
Which is all to say: thank you! Y'all have had a substantial positive impact on my intellectual journey.
Chapter descriptions
The thesis contains two background chapters followed by seven technical chapters (Chapters 3-9).
In Chapter 1 (Introduction), I try to convey what exactly I mean by "algorithmic Bayesian epistemology" and why I'm excited about it.
In Chapter 2 (Preliminaries), I give some technical background that's necessary for understanding the subsequent technical chapters. It's intended to be accessible to readers with a general college-level math background. While the nominal purpose of Chapter 2 is to introduce the mathematical tools used in later chapters, the topics covered there are interesting in their own right.
Different readers will of course have different opinions about which technical chapters are the most interesting. Naturally, I have my own opinions: I think the most interesting chapters are Chapters 5, 7, and 9, so if you are looking for direction, you may want to tiebreak toward reading those. Here are some brief summaries:
Chapter 3: Incentivizing precise forecasts. You might be familiar with proper scoring rules, which are mechanisms for paying experts for forecasts in a way that incentivizes the experts to report their true beliefs. But there are many proper scoring rules (most famously, the quadratic score and the log score), so which one should you use? There are many perspectives on this question, but the one I take in this chapter is: which proper scoring rule incentivizes experts to do the most research before reporting their forecast? (See also this blog post I wrote explaining the research.)
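As a quick illustration of what "proper" means (a numerical sanity check of my own, not code from the thesis): if the true probability of the event is p, an expert paid with the quadratic or log score maximizes their expected score by honestly reporting q = p.

```python
import numpy as np

# Sanity check that the quadratic and log scores are proper: if the event's
# true probability is p, the expected score p*S(q, 1) + (1-p)*S(q, 0) should
# be maximized at the honest report q = p.

def quadratic_score(q, y):
    # Brier-style score: S(q, y) = 1 - (y - q)^2
    return 1 - (y - q) ** 2

def log_score(q, y):
    # S(q, y) = log(q) if y = 1, log(1 - q) if y = 0
    return np.log(q if y == 1 else 1 - q)

def expected_score(score, p, q):
    return p * score(q, 1) + (1 - p) * score(q, 0)

p = 0.7
grid = np.linspace(0.01, 0.99, 981)
for score in (quadratic_score, log_score):
    best = grid[np.argmax([expected_score(score, p, q) for q in grid])]
    print(f"{score.__name__}: expected score maximized near q = {best:.2f}")  # ~0.70
```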
Chapter 4: Arbitrage-free contract functions. Now, what if you're trying to elicit forecasts from multiple experts? If you're worried about the experts colluding, your problem is now harder. It turns out that if you use the same proper scoring rule to pay every expert, then the experts can collude to all report the same forecast -- and then redistribute their rewards -- in a way that leaves every expert better off, no matter the outcome, than if they hadn't colluded. (The term for this sort of collusion is "arbitrage".) On the other hand, you now have more flexibility, because you can pay each expert in a way that depends on the other experts' reports. In this chapter, I resolve an open problem by showing how to pay experts in a way that makes such arbitrage impossible.
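Here is a small numerical illustration of the collusion problem, with toy numbers of my own rather than the thesis's example, using the quadratic score for both experts:

```python
# Toy illustration of arbitrage when both experts are paid with the same
# proper scoring rule -- here the quadratic score S(q, y) = 1 - (y - q)^2.

def S(q, y):
    return 1 - (y - q) ** 2

p1, p2 = 0.4, 0.8      # the experts' honest beliefs
r = (p1 + p2) / 2      # collusion: both report the midpoint, 0.6

for y in (0, 1):
    honest_total = S(p1, y) + S(p2, y)
    collusive_total = 2 * S(r, y)
    print(f"outcome {y}: honest total {honest_total:.2f}, "
          f"collusive total {collusive_total:.2f}, "
          f"surplus {collusive_total - honest_total:.2f}")

# Because the quadratic score is strictly concave in the report, the collusive
# total strictly exceeds the honest total for *both* outcomes (surplus 0.08
# either way). Splitting that surplus leaves each expert strictly better off
# no matter what happens -- that is the arbitrage the chapter rules out.
```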
Chapter 5: Quasi-arithmetic pooling. (One of my favorites.) Let's say that you've used a proper scoring rule to elicit forecasts from multiple experts, and now you want to aggregate them. This chapter's basic take is that your method of aggregation ought to depend on the scoring rule you used. For instance, the log scoring rule incentivizes experts to "be careful" around extreme probabilities: an expert incentivized with the log scoring rule cares substantially about the difference between a 1% chance and a 0.01% chance, while an expert incentivized with the quadratic scoring rule basically doesn't care. So if you're using the log scoring rule for eliciting forecasts, it makes sense to "take low probabilities more seriously" when aggregating the forecasts.
In the chapter, I define quasi-arithmetic (QA) pooling with respect to a proper scoring rule to be a particular method of forecast aggregation that depends on the scoring rule. For example, QA pooling with respect to the quadratic score means averaging the forecasts, while QA pooling with respect to the log score means averaging the log odds. I show that QA pooling has a bunch of nice properties and argue that the connection it establishes between forecast elicitation and forecast aggregation is pretty natural and fundamental.
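A minimal sketch of the two special cases just mentioned (the general definition of QA pooling is in the thesis; this only shows how the choice of scoring rule changes the aggregate):

```python
import numpy as np

def qa_pool_quadratic(probs, weights=None):
    # QA pooling w.r.t. the quadratic score: weighted arithmetic mean.
    w = np.ones(len(probs)) if weights is None else np.asarray(weights, float)
    return np.average(probs, weights=w)

def qa_pool_log(probs, weights=None):
    # QA pooling w.r.t. the log score: weighted average of log odds.
    w = np.ones(len(probs)) if weights is None else np.asarray(weights, float)
    logits = np.log(np.asarray(probs) / (1 - np.asarray(probs)))
    return 1 / (1 + np.exp(-np.average(logits, weights=w)))

forecasts = [0.01, 0.5]
print(qa_pool_quadratic(forecasts))  # 0.255: the 1% forecast barely moves the mean
print(qa_pool_log(forecasts))        # ~0.09: log-odds pooling takes the 1% seriously
```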
Chapter 6: Learning weights for logarithmic pooling. QA pooling allows you to put weights on experts, depending on their past performance/how much you trust them. In Chapter 5, I showed that if the proper scoring rule is bounded, then you can efficiently learn weights for experts, by updating the weights based on the experts' performance, in a way that lets your aggregates be almost as good as if you had known the right set of weights from the beginning. But the log scoring rule isn't bounded! This chapter extends this result to the log scoring rule, under the assumption that the experts' forecasts are calibrated.
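To give a flavor of the setup (this is a generic exponential-weights heuristic of my own, not the algorithm or guarantee from the chapter): maintain a weight per expert, aggregate each round by weighted log-odds averaging, and bump each expert's weight up or down according to their log score on the realized outcome.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def expit(z):
    return 1 / (1 + np.exp(-z))

def log_pool(probs, weights):
    w = weights / weights.sum()
    return expit(np.dot(w, logit(probs)))   # weighted log-odds average

rng = np.random.default_rng(0)
n_rounds, eta = 200, 0.1
weights = np.ones(3)                        # three experts, equal initial weight

for _ in range(n_rounds):
    truth = rng.uniform(0.05, 0.95)                      # today's true probability
    noise = rng.normal(0.0, [0.2, 0.5, 1.0])             # expert 3 is noisiest
    probs = np.clip(expit(logit(truth) + noise), 0.01, 0.99)
    outcome = rng.random() < truth
    aggregate = log_pool(probs, weights)                 # forecast with current weights
    log_scores = np.log(probs) if outcome else np.log(1 - probs)
    weights *= np.exp(eta * log_scores)                  # reward accurate experts

print("learned weights:", np.round(weights / weights.sum(), 2))
# The noisier experts end up with smaller weights.
```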
Chapter 7: Robust aggregation of substitutable signals. (One of my favorites.) Let's say that Alice says there's a 60% chance that it'll rain tomorrow and Bob says there's a 70% chance. How should you aggregate these forecasts into a single, all-things-considered number? The obvious answer is that it depends: if Alice knows strictly more information than Bob, you should say 60%. If Bob knows strictly more than Alice, you should say 70%. If their forecasts are based on disjoint pieces of evidence, your answer should be different from if their forecasts are based on heavily overlapping evidence.
But suppose you just don't know. After all, in practice, you may not have this information. This chapter explores a solution concept called robust aggregation: finding an aggregation strategy that works as well as possible in the worst case over a broad class of possible information structures. The broad class I study here is, roughly speaking, information structures that satisfy informational substitutes, meaning that learning an additional expert's information is less valuable, the more information you already know (i.e., diminishing marginal returns). One highlight of this chapter is a theoretical justification for the practice of extremization: pushing the aggregate forecast away from the prior (see Jaime Sevilla's writeup here).
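For concreteness, here is one simple form of extremization, written as my own illustration rather than the chapter's exact aggregator: average the forecasts' deviations from the prior in log-odds space, then scale that deviation up by a factor d > 1.

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))

def expit(z):
    return 1 / (1 + np.exp(-z))

def extremized_pool(probs, prior=0.5, d=1.5):
    # Average the forecasts' log-odds deviations from the prior, then push the
    # aggregate further away from the prior by the factor d.
    shift = np.mean(logit(np.asarray(probs, float)) - logit(prior))
    return expit(logit(prior) + d * shift)

forecasts = [0.6, 0.7]
print(np.mean(forecasts))          # 0.65: plain averaging never leaves [0.6, 0.7]
print(extremized_pool(forecasts))  # ~0.72: if Alice and Bob hold partly independent
                                   # evidence beyond the prior, a more extreme
                                   # aggregate can be justified
```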
Chapter 8: When does agreement imply accuracy? Suppose that Alice and Bob disagree about the value of some quantity, because they have different private information. How can they reach agreement? They can of course do so by exchanging all of their information, but that can take a really large amount of communication. Scott Aaronson's 2005 paper shows that Alice and Bob can quickly reach agreement simply by repeatedly exchanging their estimates (Alice tells Bob her estimate; Bob tells Alice his estimate after updating on what Alice just said; Alice tells Bob her estimate after updating on what Bob just said; and so on). However, this agreement could be very superficial: once Alice and Bob agree, the number they agree on might differ substantially from the number they would have agreed on, had they actually exchanged all of their information. This chapter establishes a sufficient condition under which, if Alice and Bob agree, their agreed-upon estimate does reflect the estimate they would reach if they exchanged all of their information.
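A toy example of how superficial such agreement can be (my own example, not one from the thesis): the state is a pair of independent fair coin flips, Alice sees the first, Bob sees the second, and both estimate the XOR of the two. They agree at 0.5 from the very first exchange, yet pooling their information would pin the answer down exactly.

```python
from itertools import product

# State: a pair (a, b) of independent fair coin flips; Alice observes a,
# Bob observes b, and both estimate f(a, b) = a XOR b.
states = list(product([0, 1], repeat=2))   # uniform prior

def f(a, b):
    return a ^ b

def alice_estimate(a_obs):
    pool = [(a, b) for a, b in states if a == a_obs]
    return sum(f(a, b) for a, b in pool) / len(pool)

def bob_estimate(b_obs):
    pool = [(a, b) for a, b in states if b == b_obs]
    return sum(f(a, b) for a, b in pool) / len(pool)

for a, b in states:
    print(f"state ({a}, {b}): Alice says {alice_estimate(a)}, "
          f"Bob says {bob_estimate(b)}, shared-information answer is {f(a, b)}")
# In every state both say 0.5 -- they "agree" from the start -- but the answer
# they would reach by sharing everything is 0 or 1. Agreement alone need not
# imply accuracy; Chapter 8 gives a sufficient condition under which it does.
```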
Chapter 9: Deductive circuit estimation. (One of my favorites.) This chapter represents work I did at the Alignment Research Center around April-May of last year. In Formalizing the presumption of independence, we asked whether the process of deductive reasoning can be formalized. In this chapter, I explore deductive reasoning in the special case of boolean circuits. Informally, a deductive estimation algorithm takes as input a boolean circuit and an explanation of the circuit's behavior (in some formal language) and outputs an estimate of the fraction of inputs on which the circuit outputs 1. I explore how deductive estimation algorithms ought to behave, and give both positive results ("here's an efficient deductive estimation algorithm that satisfies linearity and respect for proofs") and negative results ("but no efficient deductive estimation algorithm additionally satisfies 0-1 boundedness").
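To give a flavor of the kind of estimation involved, here is a toy heuristic estimator that presumes independence between the inputs of every gate. This is my own illustrative sketch of the setup, not one of the chapter's deductive estimation algorithms.

```python
# Each gate is (name, op, args); gates are listed in evaluation order and the
# last gate is the circuit's output. Inputs 'x0', 'x1', ... are uniformly random.

def presume_independence_estimate(circuit, n_inputs):
    p = {f"x{i}": 0.5 for i in range(n_inputs)}   # P[wire = 1] for each wire
    for name, op, args in circuit:
        if op == "NOT":
            p[name] = 1 - p[args[0]]
        elif op == "AND":
            p[name] = p[args[0]] * p[args[1]]          # presumes independence
        elif op == "OR":
            a, b = p[args[0]], p[args[1]]
            p[name] = a + b - a * b                    # presumes independence
    return p[circuit[-1][0]]

print(presume_independence_estimate([("g0", "AND", ["x0", "x1"])], n_inputs=2))  # 0.25, correct
print(presume_independence_estimate([("g0", "AND", ["x0", "x0"])], n_inputs=1))  # 0.25, but the
# true fraction is 0.5 -- the kind of error that a good *explanation* of the
# circuit's behavior should let a deductive estimator correct.
```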
A note on the title
The thesis is called "Algorithmic Bayesian Epistemology". I think most people here know roughly what I mean by "Bayesian epistemology".
The word "algorithmic" is more of a term of art, referring to the "algorithmic lens" of theoretical computer science. Quoting from the introduction:
[1] I ran into the post because it was linked from this wonderful Slate Star Codex post, which was also one of the first SSC posts I had run into. I had the classic "oh man, I've found my people!" experience that day.
The term "algorithmic lens" was coined at Berkeley by members of the Theory of Computing research group around the year 2000.
See Chapter 20 here for a detailed discussion.