I've written up a 2-page explanation and proof of Aumann's agreement theorem. Here is a direct link to the pdf via Dropbox.
The proof in Aumann's original paper is already very short and accessible. (Wei Dai gave an exposition closely following Aumann's in this post.) My intention here was to make the proof even more accessible by putting it in elementary Bayesian terms, stripping out the talk of meets and joins in partition posets. (Just to be clear, the proof is just a reformulation of Aumann's and not in any way original.)
I would appreciate any suggestions for improvement.
Update: I've added an abstract and made one of the conditions in the formal description of "common knowledge" explicit in the informal description.
Update: Here is a direct link to the pdf via Dropbox (ht to Vladimir Nesov).
Update: In this comment, I explain why the definition of "common knowledge" in the write-up is the same as Aumann's.
Update 2020-05-23: I fixed the Dropbox link and removed the Scribd link.
Maybe this is a good place to ask something I wonder about: does Aumann's agreement theorem really have practical significance for disputes between people?
It assumes that the agents involved are Bayesian reasoners, have the same priors, and have common knowledge of each other's posteriors. The last condition might hold for people who disagree about something (although arguers routinely misinterpret each other, so maybe even that's too optimistic), but I'd expect people in a serious argument to have different priors most of the time, and nobody is a perfect Bayesian reasoner. As far as I can tell, that means two of the theorem's prerequisites are routinely violated when people disagree, and the one that's left over is often arguable too.
This makes me sceptical when I see people refer to "Aumanning" or the irrationality of agreeing to disagree. Still, there are two obvious ways I could be going wrong here:
The theorem's Wikipedia page references papers by Scott Aaronson & Robin Hanson. Aaronson's doesn't sound relevant (it seems to be about the rate of agreement, not whether eventual agreement is assured), but Hanson's looks like it might drain the force out of the common priors assumption by arguing that rational Bayesians should always have the same priors.
I haven't read Hanson's paper, but even if I assume that I don't have to worry about the equal priors assumption, I still have to contend with the assumption that the arguers are Bayesian. I can only think of one way for someone in an argument to be sure that the others calculated their posteriors Bayesianly: by sitting down and explicitly re-deriving them from everybody's likelihoods. But that defeats the point of the theorem! I feel like I'm missing something here but can't see what.
There's a discussion of practical implications of AAT in my post.
Thanks! It's interesting that you focus on the common knowledge assumption as the really strict assumption, rather than Bayesian-ness.
The common-knowledge condition really is surprisingly strong. I think that this is especially clear from the definition that I gave in my write-up. The common knowledge C is a piece of information so strong that, once you know it, your posterior probability for the proposition A is totally fixed — no additional information of any kind can make you more or less confident in A.
I thought that "common knowledge" meant something that everybody knows, and everybody knows that everybody knows it, and everybody knows that everybody knows that everybody knows it, ad infinitum. However, I can't see the isomorphism between that and the definition you used. Are they the same, or is it only a confusing coincidence of terminology?
Here is a proof of the equivalence between my definition and Aumann's for "common knowledge". I'm assuming some familiarity with set partitions.
Aumann's definition is in terms of the Kolmogorov model of probability. In particular, a proposition is identified with the set of possible worlds in which the proposition is true.
Let P₁ be that partition of the possible worlds such that two worlds share the same block in P₁ if and only if I condition on the same body of knowledge when computing posterior probabilities in the two worlds*. Let P₂ be the analogous partition of the possible worlds for you. For each world w, let P₁(w) denote the block in my partition containing w, and let P₂(w) be the block in your partition containing w. Let P denote the finest common coarsening of our respective partitions**, and let P(w) be the block of P containing w.
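The finest common coarsening can be computed mechanically: two worlds land in the same block of P exactly when they are linked by a chain of worlds passing alternately through P₁-blocks and P₂-blocks, so union-find over the blocks does the job. A minimal Python sketch (the worlds and partitions below are invented for illustration):

```python
# Sketch: compute the finest common coarsening of two partitions of the
# same set of worlds (the "meet" in Aumann's convention, the "join" in
# the opposite convention). Two worlds share a block of the result iff
# they are connected by a chain running through P1-blocks and P2-blocks.

def common_coarsening(worlds, p1, p2):
    # p1, p2: lists of blocks, each block a frozenset of worlds
    parent = {w: w for w in worlds}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path compression
            w = parent[w]
        return w

    def union(a, b):
        parent[find(a)] = find(b)

    # Glue together every block of both partitions.
    for block in list(p1) + list(p2):
        members = list(block)
        for w in members[1:]:
            union(members[0], w)

    blocks = {}
    for w in worlds:
        blocks.setdefault(find(w), set()).add(w)
    return [frozenset(b) for b in blocks.values()]

# Hypothetical example: my information distinguishes {1,2} from {3,4};
# yours distinguishes {1} from {2,3} from {4}.
P1 = [frozenset({1, 2}), frozenset({3, 4})]
P2 = [frozenset({1}), frozenset({2, 3}), frozenset({4})]
meet = common_coarsening({1, 2, 3, 4}, P1, P2)
# The blocks {1,2} and {3,4} get glued through your block {2,3},
# so the coarsening collapses to the single block {1,2,3,4}.
```

Here P is the trivial partition, which matches the intuition that the only event both of us are guaranteed to know occurred is "something happened."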
Fix a proposition A. Let p be my posterior probability for A, and let q be yours (in the actual world). Let E be the set of worlds in which I assign posterior probability p to A, while you assign posterior probability q. Formally:

E = { w : prob(A | P₁(w)) = p and prob(A | P₂(w)) = q }.
Let w₀ be the actual world. Aumann's definition says that our respective posterior probabilities are common knowledge (in the actual world) if P(w₀) ⊆ E.
On the other hand, I said that our posterior probabilities are common knowledge when there is a proposition C that is true in the actual world and satisfies the following three conditions:
1. For each world w in C, P₁(w) ⊆ C and P₂(w) ⊆ C.
2. For each world w in C, p = prob(A | P₁(w)).
3. For each world w in C, q = prob(A | P₂(w)).
These definitions are logically equivalent.
For, suppose that our posteriors are common knowledge in Aumann's sense, i.e., P(w₀) ⊆ E. Set C = P(w₀). Since P is a coarsening of both P₁ and P₂, the block C is a union of P₁-blocks and a union of P₂-blocks, which gives Condition 1; and since C ⊆ E, every world in C is one where I assign posterior p and you assign posterior q, which gives Conditions 2 and 3. So the posteriors are also common knowledge in my sense.
On the other hand, suppose that the posteriors are common knowledge in my sense. It is given that C happened in the actual world, meaning that w₀ ∈ C. By Condition 1, C is a disjoint union of P₁-blocks and a disjoint union of P₂-blocks. This means that C is a block containing w₀ in some common coarsening of P₁ and P₂. Hence, C contains the block P(w₀) containing w₀ in the finest common coarsening of P₁ and P₂. That is, P(w₀) ⊆ C. Conditions 2 and 3 together imply that C ⊆ E. Thus we get that P(w₀) ⊆ E, so our posteriors are also common knowledge in Aumann's sense.
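The agreement the theorem forces can be checked numerically on a toy space. In this hypothetical example (the worlds, prior, proposition, and partitions are all invented), the three conditions in the definition hold with C equal to the whole space, and the posteriors p and q are indeed forced to be equal:

```python
from fractions import Fraction

# Toy sample space with a uniform common prior; A is the proposition.
worlds = {1, 2, 3, 4}
prior = {w: Fraction(1, 4) for w in worlds}
A = {1, 3}

# My information partition and yours (hypothetical).
P1 = [frozenset({1, 2}), frozenset({3, 4})]
P2 = [frozenset({1, 4}), frozenset({2, 3})]

def block_of(partition, w):
    # The block of the partition containing world w.
    return next(b for b in partition if w in b)

def cond_prob(event, given):
    # prob(event | given) under the common prior.
    return sum(prior[w] for w in event & given) / sum(prior[w] for w in given)

# Candidate common-knowledge event: the whole space.
C = frozenset(worlds)

# Condition 1: C is a union of P1-blocks and a union of P2-blocks.
assert all(block_of(P1, w) <= C and block_of(P2, w) <= C for w in C)

# Conditions 2 and 3: each agent's posterior for A is constant across C.
posteriors_1 = {cond_prob(A, block_of(P1, w)) for w in C}
posteriors_2 = {cond_prob(A, block_of(P2, w)) for w in C}
assert len(posteriors_1) == 1 and len(posteriors_2) == 1

p, = posteriors_1
q, = posteriors_2
print(p, q)  # both 1/2 -- the posteriors agree, as the theorem demands
assert p == q
```

The reason the assertion at the end can never fail (given the earlier ones) is exactly the proof of the theorem: Condition 1 makes C a disjoint union of P₁-blocks on each of which the posterior for A is p, so prob(A | C) = p, and symmetrically prob(A | C) = q.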
* That this relation induces a partition assumes, in effect, that I know what I don't know — i.e., that there are no unknown unknowns.
** Aumann calls this the meet of P₁ and P₂, because he considers coarsenings to be lower in the partial order of partitions. However, people often use the opposite convention, in which case P would be called the join.
The conditions are logically equivalent. Unfortunately, I don't see a way to show this without using the partition poset terminology that I intended to avoid. Nonetheless, if you unpack the definition of "common knowledge" in Aumann's paper, it is equivalent to what I gave. (ETA: I give the unpacking in this comment.)
dropbox.com currently gives you 2GB of free storage, and you can share files via direct public links.
Awesome. Thank you. The write-up can now be downloaded here.
Update 2020-05-23: Updated Dropbox link.
That is not a link to a pdf. It is a link to a page from which you can download a pdf if you log in to Scribd.com. (I would've preferred a link to a pdf.)
I didn't realize that you needed a Scribd login to download the pdf. That is a deal breaker. I will find another place for the document.
I was able to read the file without logging in.
Great job, by the way.
It looks like you can read the document "in-line", but you can't download it. Or were you able to download it somehow?
Thanks! I would like an abstract.
Also, I hate the term "common knowledge". Is there a good reason to perpetuate this terminology? It just doesn't mean what it says.
IIRC, the standard term in philosophy for the same thing is "mutual belief".
It seems from your presentation like the first agent can only hold one of E₁ or E₂, etc., where E₁, E₂, etc. are disjoint subsets of the sample space. But the same result holds if an agent can hold a state of knowledge that is a union of such sets, so the first agent could believe E₁ ∪ E₂, etc. There seems to be no reason not to make this generalization. IIRC, that's how Aumann's paper described it.
No, not as I read it.
My C & E_i correspond to blocks in the partition that Aumann assigns to agent 1. Aumann has agent 1 evaluating the posterior probability of A in world ω by computing p(A | P₁(ω)), where P₁(ω) is the element of 1's partition containing ω. So the agent will never condition on a union P₁(ω₁) ∪ P₁(ω₂) of distinct blocks. (Here I'm following Aumann's notation as closely as possible.) Correspondingly, my agent never conditions on a disjunction [(C & E₁) ∨ (C & E₂)].
But I think that you're right about the generalization being trivial. It should just involve the standard procedure for turning a union A ∪ B into a disjoint union A ∪ (B∖A).
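That disjointification step is the usual one; a one-line sketch in Python (sets chosen arbitrarily for illustration):

```python
# Turn a union of possibly-overlapping sets into a disjoint union
# covering the same elements: A ∪ B = A ∪ (B \ A).
A = {1, 2, 3}
B = {3, 4}
parts = [A, B - A]  # [{1, 2, 3}, {4}] -- pairwise disjoint
assert set().union(*parts) == A | B
assert parts[0] & parts[1] == set()
```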