Kurt Gödel showed that we could write within a system of arithmetic the statement, "This statement has no proof within the system," in such a way that we couldn't dismiss it as meaningless. This proved that if the system (or part of it) could prove the logical consistency of the whole, it would thereby contradict itself. We nevertheless think arithmetic does not contradict itself because it never has.
From what I understand we could write a version of the Gödel statement for the axioms of probability theory, or even for the system that consists of those axioms plus our current best guess at P(axioms' self-consistency). Edit: or not. According to the comments the Incompleteness Theorem does not apply until you have a stronger set of assumptions than the minimum you need for probability theory. So let's say you possess the current source code of an AGI running on known hardware. It's just now reached the point where it could pass the test of commenting extensively on LW without detection. (Though I guess we shouldn't yet assume this will continue once the AI encounters certain questions.) For some reason it tries to truthfully answer any meaningful question. (So nobody mention the Riemann hypothesis. We may want the AI to stay in this form for a while.) Whenever an internal process ends in a Jaynes-style error message that indicates a contradiction, the AI takes this as strong evidence of a contradiction in the relevant assumptions. Now according to my understanding we can take the source code and ask about a statement which says, "The program will never call this statement true." Happily the AI can respond by calling the statement "likely enough given the evidence." So far so good.
So, can we or can't we write a mathematically meaningful statement Q saying, "The program will never say 'P(Q)≥0.5'"? What about, "The program will never call 'P(Q)≥0.5' true (or logically imply it)"? How does the AI respond to questions about variations of these statements?
It seems as if we could form a similar question by modifying the Halting-Oracle Killer program to refute more possible responses to the question of its run-time, assuming the AI will know this simpler program's source code. Though it feels like a slightly different problem because we'd have to address a lot of possible responses directly – with the previous examples, if the AI doesn't kill us in one sense or another, we can go on to ask for clarification. Or we can say the AI wants to clarify any response that evades the question.