It seems that in general, the less certain any counterfactual oracle is about its prediction, the more self-confirming it is. This is because the possible counterfactual worlds in which we have or acquire self-confirming beliefs regarding the prediction will have a high expected score.
This is actually only true in certain cases, since in general many other counterfactual worlds could also have high expected scores. Specifically, it is true to the extent that the oracle is uncertain mostly about aspects of the world that would be affected by the prediction, and to the extent that self-confirming predictions lead to higher scores than any alternative.
Submission (LB). The post's team-choosing example suggests a method for turning any low-bandwidth oracle O into a counterfactual oracle O′: have O′ output o from the same set of possible outputs L; in case of erasure, calculate R(l) for a randomly chosen l ∈ L and set R′(o) = R(l) if o = l and to −∞ otherwise. Although the counterfactual low-bandwidth oracle is not any safer[1], it has the advantage of almost never requiring us to evaluate its score. Thus, by running multiple oracles in sequence (stopping the process after the first erasure event) we can (with high probability) receive the full series of answers as if from a high-bandwidth oracle.
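To make the construction concrete, here is a minimal sketch of the reward R′ and of running several such oracles in sequence. The names `counterfactual_reward`, `run_in_sequence`, `prob_no_erasure`, and the parameter `erasure_prob` are hypothetical, and each oracle is modelled as a plain callable that sees the answers revealed so far; none of this is specified in the post, and it also assumes erasure events are independent across oracles.

```python
import math
import random


def counterfactual_reward(o, outputs, R, rng=random):
    """Reward R'(o) for O' in an erasure event.

    One l is drawn uniformly from the output set L (here `outputs`);
    only R(l) is evaluated. The oracle gets R(l) if it guessed l,
    and -infinity otherwise.
    """
    l = rng.choice(outputs)
    return R(l) if o == l else -math.inf


def run_in_sequence(oracles, erasure_prob, rng=random):
    """Query oracles one by one, stopping at the first erasure event.

    Each oracle is a hypothetical callable taking the list of answers
    revealed so far. Returns (answers, erased).
    """
    answers = []
    for oracle in oracles:
        o = oracle(answers)
        if rng.random() < erasure_prob:
            # Erasure: o is never revealed; scoring happens instead.
            return answers, True
        answers.append(o)
    return answers, False


def prob_no_erasure(n, erasure_prob):
    """Chance that all n oracles answer, assuming independent erasures."""
    return (1.0 - erasure_prob) ** n
```

Under these assumptions, the chance of receiving the full series of n answers is (1 − ε)^n for per-oracle erasure probability ε, which is what "with high probability" amounts to when ε is small relative to 1/n.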
For example, we can ask each oracle in turn for advice on how to make a more effective processor. If erasure occurs, we attempt to make a processor with the advice obtained up to that point and use some measure of its performance as the score. If there is no erasure event, the final concatenation of answers forms a much safer guide to processor building than an equally large answer from a single oracle.
1. It seems that in general, the less certain any counterfactual oracle is about its prediction, the more self-confirming it is. This is because the possible counterfactual worlds in which we have or acquire self-confirming beliefs regarding the prediction will have a high expected score. Hence:
Submission (CF). Given a high-bandwidth counterfactual oracle, use a second counterfactual oracle with a shared erasure event to predict its score. If the predicted score's distance from its upper bound is greater than some chosen limit, discard the high-bandwidth prediction.
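The discard rule in this submission is a simple threshold test. A minimal sketch, assuming the score has a known finite upper bound; the function name `keep_prediction` and its parameters are hypothetical:

```python
def keep_prediction(predicted_score, upper_bound, limit):
    """Decide whether to keep the high-bandwidth oracle's prediction.

    predicted_score: the second oracle's prediction of the first oracle's score.
    upper_bound:     the maximum attainable score (assumed known and finite).
    limit:           the chosen tolerance on the distance from the upper bound.
    """
    if predicted_score > upper_bound:
        raise ValueError("predicted score exceeds the stated upper bound")
    # Discard when the distance from the upper bound exceeds the limit.
    return (upper_bound - predicted_score) <= limit
```

For instance, with an upper bound of 1.0 and a limit of 0.1, a predicted score of 0.95 is kept and a predicted score of 0.5 is discarded.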
Submission (CF). Use a counterfactual oracle to send a message to ourselves with a time delay. We choose an episode length T and set of possible messages M. The oracle outputs a time t_o < T and message m_o ∈ M. We are automatically sent m_o at time t_o unless erasure occurs, in which case at the end of the episode we input a time t* and message m* which we wish we had received at t*. The oracle's reward is then −(t* − t_o)² if m_o = m* and −∞ otherwise. We do not find out that erasure has occurred until the end of the episode, ensuring that events before t_o are the same as in the counterfactual world.
Here are some examples, which can work with a single possible message, used as an alert. EDIT: I made the examples slightly more concrete.
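The reward rule for the delayed-message oracle described above can be written out directly. This is a minimal sketch; the function name `delayed_message_reward` is hypothetical, and times are assumed to be plain numbers in whatever unit the episode length T uses.

```python
import math


def delayed_message_reward(t_o, m_o, t_star, m_star):
    """Reward in an erasure event for the delayed-message oracle.

    t_o, m_o:       the time and message the oracle output.
    t_star, m_star: the time and message we wish we had received,
                    entered at the end of the episode.
    """
    if m_o != m_star:
        return -math.inf
    # Quadratic penalty on how far the chosen time was from the wished-for time.
    return -((t_star - t_o) ** 2)
```

With a single possible message used as an alert, the message check always passes, so the reward reduces to the squared timing error alone.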
EDIT: Here are some potentially useful modifications we can make to the oracle: