Principle of Explosion
The principle of explosion goes as follows: if we assume contradictory axioms then we can derive everything. Numbers prefixed with A are axioms/assumptions, those prefixed with D are derived from those.
A1 P
A2 ¬P
D3 Q from A1 and A2
In English: we assume that P is both true and not true for some statement P. Then for any statement Q, since we know P, we know that (P or Q) is true. But since we also know that not-P is true, Q must be true.
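The two-step argument in the English version can be checked mechanically. Below is a minimal Lean 4 sketch, writing P for the contradictory statement and Q for the arbitrary one; the theorem name and hypothesis names are illustrative choices and not part of the original argument.

```lean
-- Hypothetical name `explosion`; P and Q are arbitrary propositions.
theorem explosion (P Q : Prop) (a1 : P) (a2 : ¬P) : Q :=
  -- Step 1: from P we get (P or Q) by disjunction introduction.
  have d : P ∨ Q := Or.inl a1
  -- Step 2: not-P rules out the left branch, so only Q remains.
  d.elim (fun hp => absurd hp a2) (fun hq => hq)
```

Nothing about Q is used anywhere in the proof, which is exactly why the conclusion holds for every Q.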
This can cause issues in counterfactual reasoning. In the five-and-ten problem, an agent considering the counterfactual A3 below can fall prey to exactly this issue:
A1 I am an agent which will take the greater of $5 and $10
A2 $10 > $5
A3 I take $5
D4 Either ($5 > $10) or Q, for any statement Q, from A1 and A3
D5 Q from A2 and D4
This makes it impossible for the agent to reason about worlds in which its actions contradict its knowledge of itself. This can be thought of as a logical collision: axioms 1 and 2 collide with counterfactual 3. I think this is related to issues around Löb's theorem but is not quite the same. In Löbian problems the agent must reason about itself as a proof-searching machine. Here the agent only needs to reason about itself as a rational agent. Obviously humans do not normally make mistakes this egregious.
In mathematics, the Riemann Hypothesis has yet to be solved, that is, proved or disproved. A few different theorems have been proven by an ingenious method: prove them true in the case that the Riemann Hypothesis is true, then prove them true in the case that the Riemann Hypothesis is false! This does the job of proving the theorem in question, but it introduces a wrinkle: one of the two proofs might well be explosion-like, since it starts from a false assumption. This illustrates that there is no good reason for the contradiction to be obvious. In fact, if the Riemann Hypothesis is (as some people have suggested) undecidable, then there will be no logical contradiction involved in the derivation of either one!
What does it even mean for a proof to be explosion-like? We can make logical-looking proofs based on incorrect assumptions. Even applying a general theorem to a special case does this. The proof that every integer n greater than 1 has a prime factorization splits into two starting cases: n is prime, or n is not prime. Applying it to 345 is still valid, even though one branch of the proof takes the (contradictory) assumption that 345 is prime.
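As a rough illustration of that case split, here is a small Python sketch that mirrors the structure of the factorization proof and records which branch is actually used for each number. The function names and the trace list are hypothetical helpers introduced only for this example.

```python
def smallest_divisor(n):
    """Return the smallest divisor of n greater than 1 (n itself if n is prime)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def prime_factors(n, trace):
    """Factor n by the same case split as the proof, logging the branch taken."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    d = smallest_divisor(n)
    if d == n:
        # Case 1: n is prime, so n is its own factorization.
        trace.append((n, "prime"))
        return [n]
    # Case 2: n is composite, so factor both parts and combine them.
    trace.append((n, "composite"))
    return prime_factors(d, trace) + prime_factors(n // d, trace)


trace = []
factors = prime_factors(345, trace)
print(factors)
print(trace)
```

The general procedure contains a "345 is prime" branch, but for this input it is never taken, so its contradictory assumption does no harm.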
Agents and Uncertainty
An agent able to think probabilistically might use beliefs rather than axioms. In the five-and-ten example, 1 and 2 become beliefs and 3 is taken as a counterfactual. I think it helps to think of counterfactuals as distinct from beliefs: they are objects which are used to reason about the world, and do not have probabilities assigned to them. Each counterfactual does, however, correspond to a belief, which has its own probability assigned.
If the agent notices that 1 and 2 are inconsistent with the counterfactual under consideration, it can use the uncertainty of beliefs to make sense of the counterfactual: ignore the proportion of probability mass assigned to worlds where 1 and 2 are both true. This defuses the logical collision. As long as it is sensible enough to not assign probability 1 to its beliefs it can reason sensibly again. This might look like the following:
B1 I am an agent which will take the greater of $5 and $10 (probability 0.99)
B2 $10 > $5 (probability 0.9999)
C3 I take $5
The defusing does two things. First of all, it stops the explosion: we can no longer derive everything, so we can work successfully from counterfactuals.
It also allows the agent to (outside of the line of reasoning where C3 is a counterfactual) derive that the belief corresponding to C3 is unlikely, by working backwards. The agent now has information about its own future decisions, but in a way which does not cause the logical explosion from earlier.
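To make both effects concrete, here is a minimal Python sketch using the credences 0.99 and 0.9999 from the example. It assumes the two beliefs are independent, which the text does not state, and the function names are hypothetical.

```python
from itertools import product

# Credences from the example. Independence of B1 and B2 is an assumption
# of this sketch, not something the original argument specifies.
P_B1 = 0.99    # "I am an agent which will take the greater of $5 and $10"
P_B2 = 0.9999  # "$10 > $5"


def world_probabilities(p_b1, p_b2):
    """Return {(b1, b2): probability} over the four truth assignments."""
    worlds = {}
    for b1, b2 in product([True, False], repeat=2):
        worlds[(b1, b2)] = (p_b1 if b1 else 1 - p_b1) * (p_b2 if b2 else 1 - p_b2)
    return worlds


def defuse(worlds):
    """Drop the world that collides with C3 (B1 and B2 both true), then renormalise."""
    kept = {w: p for w, p in worlds.items() if w != (True, True)}
    remaining_mass = sum(kept.values())
    sandbox = {w: p / remaining_mass for w, p in kept.items()}
    return sandbox, remaining_mass


worlds = world_probabilities(P_B1, P_B2)
sandbox, c3_upper_bound = defuse(worlds)

# Inside the sandbox: which belief is more likely to be the one that failed?
p_b1_false = sum(p for (b1, _), p in sandbox.items() if not b1)
p_b2_false = sum(p for (_, b2), p in sandbox.items() if not b2)
print(f"P(not B1 | C3) = {p_b1_false:.4f}")
print(f"P(not B2 | C3) = {p_b2_false:.4f}")

# Working backwards: C3 can only hold in the worlds that were kept.
print(f"P(C3) <= {c3_upper_bound:.6f}")
```

Under these assumptions the sandbox places roughly 99% of its weight on B1 being false and about 1% on B2 being false, and the belief corresponding to C3 can have probability of at most about 0.01.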
If we are reasoning about maths and we take the counterfactual "345 is prime", we can construct a line of reasoning going back to our beliefs about maths which creates a contradiction.
It seems reasonable for the agent to then do two things. First, in the counterfactual sandbox where 345 is prime, it must assign some probability to each step in its reasoning being incorrect, and some probability to each of its axioms being incorrect (say 5% each if there are twenty steps including the axioms, since within the sandbox at least one of them must have failed). Second, it can go back and assign a very small probability to the belief corresponding to the counterfactual "345 is prime" (as in belief-space each step in the reasoning has a much smaller than 5% chance of being incorrect). If it uses some sort of beam search over possible mathematical proofs, then it can avoid allocating resources to the case that 345 is prime in future. This seems more like how humans reason.
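The arithmetic behind these two numbers can be sketched as follows. This is one possible reading, assuming the twenty steps fail independently and share the same small prior error rate; the value 1e-6 and the function names are illustrative only.

```python
def p_any_step_wrong(n_steps, eps):
    """Prior probability that at least one of n independent steps is wrong."""
    return 1.0 - (1.0 - eps) ** n_steps


def sandbox_blame_per_step(n_steps, eps):
    """P(a given step is wrong | at least one step is wrong).

    Inside the sandbox a contradiction has been reached, so some step must
    have failed. With equal priors this comes out close to 1 / n_steps.
    """
    return eps / p_any_step_wrong(n_steps, eps)


N_STEPS = 20   # steps in the refutation, including the axioms
EPS = 1e-6     # hypothetical prior chance that any single step is wrong

blame = sandbox_blame_per_step(N_STEPS, EPS)
bound = p_any_step_wrong(N_STEPS, EPS)
print(f"blame per step inside the sandbox: {blame:.4f}")
print(f"upper bound on P(345 is prime):    {bound:.2e}")
```

With these numbers each step carries about 5% of the blame inside the sandbox, while the belief that 345 is prime is bounded by roughly twenty times the per-step error rate.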
When reasoning about its own behaviour, it seems like an agent should be much more uncertain about its own behaviour than about its own reasoning capabilities. The trick applied to 345 being prime earlier works with arbitrarily small chances of reasoning being incorrect.