Great work!
Stuart Armstrong gave one more example of a heuristic argument based on the presumption of independence here.
There are a huge number of examples like that floating around in the literature; we link to some of them in the writeup. I think Terence Tao's blog is the easiest place to get an overview of these arguments. See this post in particular, though he discusses this kind of reasoning often.
I think it's easy to give probabilistic heuristic arguments for about 80 of the ~100 conjectures in the Wikipedia category "Unsolved problems in number theory".
About 30 of those (including the Goldbach conjecture) follow from the Cramér random model of the primes. Another 9 are slightly non-trivial applications of random models for the primes. About 8 of them are simple heuristics for Diophantine equations (like Fermat's Last Theorem). I estimate that another ~30 have arguments that are more diverse and interesting (I estimated the size of this set by randomly sampling some conjectures, stratified by difficulty, and seeing how often we could find an argument in an hour).
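To make the Cramér-model style of argument concrete, here is a minimal sketch for Goldbach. It assumes the simplest version of the model, in which each integer k ≥ 3 is "prime" independently with probability 1/ln k, and the function names are made up for this illustration. The model ignores divisibility by small primes, so it is only expected to match true counts up to a constant factor; the point is that the expected number of representations grows with n, which makes "no representation at all" look very unlikely.

```python
import math


def cramer_goldbach_expectation(n):
    """Expected number of pairs a < n - a with a + (n - a) = n and both
    terms 'prime' under the simple Cramér model (hypothetical helper)."""
    # Presumption of independence: P(a prime and n - a prime)
    # is taken to be (1 / ln a) * (1 / ln(n - a)).
    return sum(
        1.0 / (math.log(a) * math.log(n - a))
        for a in range(3, n // 2)
    )


def is_prime(k):
    """Trial division; fine for the small inputs used here."""
    if k < 2:
        return False
    for d in range(2, math.isqrt(k) + 1):
        if k % d == 0:
            return False
    return True


def goldbach_pairs(n):
    """Actual number of pairs of distinct primes a < n - a summing to n."""
    return sum(1 for a in range(3, n // 2) if is_prime(a) and is_prime(n - a))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(cramer_goldbach_expectation(n), 1), goldbach_pairs(n))
```

The heuristic count and the true count will not agree exactly, since the true primes are correlated in ways this model ignores, but both grow with n.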
We'd guess that it's also possible to give arguments for the remaining ~20; it would just be too hard for us to do within an hour. Random representative examples for which we don't know heuristic arguments, sorted by apparent difficulty for a layperson:
This category is interesting to us because:
That said, I think that many people believe that number theory is an unusual domain that is particularly amenable to probabilistic heuristic arguments, and so it's likely not the best place to search for counterexamples.
(I did not have anything to do with this paper and these are just my own takes.)
The Alignment Research Center recently published their second report, Formalizing the presumption of independence. While it's not explicitly about AI alignment, it's probably still interesting for some people here.
Summary
The paper is about "heuristic arguments". These are similar to proofs, except that their conclusions are not guaranteed to be correct and can be overturned by counterarguments. Mathematicians often use these kinds of arguments, but in contrast to proofs, they haven't been formalized. The paper mainly describes the open problem of finding a good formalization of heuristic arguments. They do describe one attempt, "cumulant propagation", in Appendix D, but point out it can behave pathologically.
So what's the "presumption of independence" from the title? Lots of heuristic arguments work by assuming that some quantities are independent to simplify things, and that's what the paper focuses on. Such an argument can be overturned by showing that there’s actually some correlation we initially ignored, which should then lead to a more sophisticated heuristic argument with a potentially different conclusion.
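As an illustration of both halves of this (my own sketch, not code from the paper), consider counting twin primes. The naive heuristic treats "n is prime" and "n + 2 is prime" as independent events, each with probability about 1/ln n. The function names below are hypothetical, and the 1/ln n density is the usual prime number theorem approximation.

```python
import math


def prime_sieve(n):
    """Return a bytearray s where s[k] == 1 iff k is prime, for 0 <= k <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out multiples of p starting at p * p.
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sieve


def twin_prime_counts(limit):
    """Return (actual, naive) counts of n <= limit with n and n + 2 prime."""
    is_prime = prime_sieve(limit + 2)
    actual = sum(
        1 for n in range(3, limit + 1) if is_prime[n] and is_prime[n + 2]
    )
    # Presumption of independence: multiply the two marginal probabilities.
    naive = sum(
        1.0 / (math.log(n) * math.log(n + 2)) for n in range(3, limit + 1)
    )
    return actual, naive


if __name__ == "__main__":
    actual, naive = twin_prime_counts(100_000)
    print(actual, round(naive), round(actual / naive, 2))
```

The naive estimate comes out noticeably too low, by a factor that should land near 1.3. That gap is the overlooked correlation: if n is odd then n + 2 is automatically odd too, and similar (weaker) dependencies hold modulo every other small prime. Accounting for them gives the more sophisticated Hardy-Littlewood estimate, whose correction factor is about 1.32.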
What does this have to do with alignment?
The paper only very briefly mentions alignment (in Appendix F); more detailed discussion is planned for the future. But roughly:
Heuristic arguments can be seen as somewhere between interpretability and formal verification: unlike interpretability, heuristic arguments are meant to be machine-checkable and don't have to be human-understandable. But unlike formal proofs, they don't require perfect certainty and might be much easier to find.
Readers here might also be reminded of Logical Induction. This paper is trying to do something somewhat different though:
So should you read the paper?
Given it's a 60-page report (though most of that's appendices) with basically no explicit discussion of alignment, I don't think this is a "must-read" for everyone. For example, if you haven't read the ELK report, I would strongly recommend that over this new paper.
On the other hand, if you work on something related, such as formal verification, ELK, or conceptual interpretability research, I think it makes a lot of sense to at least look at the main paper and Appendix F (16 pages and quite readable).
Personally, I also think this is just really interesting independently of alignment. Appendices B and C were my favorite parts from that perspective (though also the most speculative ones).