Imagine an ordinary-looking text editor. It has the standard bold, italics, and underline styles, and the option to undo your latest editing command. The right arrow next to the undo icon, however, is far from the typical redo feature. It can be clicked at any time, whether or not you have just undone a command, and its distinctive quality is that it can predict your next action.

In a sense, this Writing Assistant is an oracle that anticipates you with unerring accuracy. It is an exceptionally powerful autocompletion tool, forecasting the word or sentence you are about to type.

But then, isn’t this Writing Assistant always wrong? If it were smart enough, it would know that you never actually type the text it says you will type; what you actually do is just click the right arrow and let the assistant autocomplete the text for you. This assistant is doomed to eternal falsity.

Or not.

What this Writing Assistant does is to model a world where you hadn’t called it up, and predict what you would have typed in that world. It is a Counterfactual Oracle: it returns predictions that would have been true under slightly different circumstances (circumstances specified by the programmer: for example, “as if this Writing Assistant wasn’t available” or “as if the writer chose not to use the Writing Assistant”).
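For concreteness, here is a minimal sketch, in Python, of what conditioning a prediction on such a counterfactual might look like. The names (world_model, predict_text, assistant_available) are purely hypothetical placeholders for whatever model a real Counterfactual Oracle would use.

```python
# Minimal sketch of a Counterfactual Oracle (hypothetical interface, not a real API).
# It predicts what the writer would have typed in a world where it was never invoked.

def counterfactual_prediction(world_model, writer_state):
    # The programmer specifies the counterfactual circumstances,
    # e.g. "as if this Writing Assistant wasn't available".
    counterfactual_state = {**writer_state, "assistant_available": False}
    # Predict the text the writer would have produced under those circumstances.
    return world_model.predict_text(counterfactual_state)
```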

The strength of this oracle is not to calculate every minute detail of every quark’s trajectory, because that would make its model perpetually wrong: in its counterfactual world, the quarks would have had to break the laws of physics in order not to follow the deterministic path they did in fact follow.

The strength of the oracle is to strike a balance of rightness: not to model the world at the exceedingly low level of quarks, where it would always be wrong because those quarks would have had to disobey the laws of physics; and not to model it at an exceedingly high level, where it would be wrong because it had omitted important details of the person’s character and dispositions.

This would be the level of the superintelligent Counterfactual Oracle: the level needed to predict a different, human intelligence.


Now consider a different oracle: an Ordering Assistant. This oracle shows up in the digital menu of your favourite Italian restaurant, and when asked it prognosticates — correctly — what dish you will pick. "Pineapple pizza", it responds with an expression of disgust.

The obvious loophole in this thought experiment is: does the oracle make predictions according to a counterfactual world in which it doesn't exist, or according to the real world in which it does exist and you know it exists?

The former would be a Counterfactual Oracle like the Writing Assistant. The latter, though, seems paradoxical, because when the Ordering Assistant gave its answer, free will would always have an ace up its sleeve: you could reply “okay assistant, you say I will order pineapple pizza? Guess what, now I’m going to order spaghetti carbonara”, and the oracle would no longer be a perfect oracle.

To be perfect, the oracle would obviously have to be more intelligent and take this into account. It would have to operate on multiple levels, modelling not only your tastes, but also your possible responses to any of the oracle’s answers, and so forth. This would be a Fully Deterministic Oracle, similar to Comed-Tea in HPMOR, where, no matter your state of preparedness, the outcome would be inescapable, because the artifact’s configuration had already allowed for the possibility of you trying to contradict it.

Drawing from the space of dishes, and the space of your potential responses, the oracle would select the dish where its own output and your choice converged. This would be a sort of Schelling point where consciousness and logic, free will and determinism, met.

Could there be multiple stable points where the oracle’s predictions and your later action were aligned? Maybe if the oracle had said you would order pineapple pizza, you would have ordered spaghetti carbonara instead. But if the oracle said you would order carbonara, you would indeed order carbonara; and if the oracle said you would order lasagna, you would indeed order lasagna. The first case is unstable and unrealisable for a perfect oracle, because “pineapple pizza” ⇎ carbonara; the second and third cases are both stable and coherent, because “carbonara” ⇔ carbonara, and “lasagna” ⇔ lasagna.
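As a toy illustration, the Fully Deterministic Oracle can be read as searching for a fixed point of your response function: a dish it can announce such that you then order exactly that dish. The sketch below (Python, with a hypothetical response table standing in for the oracle’s model of you) encodes the three cases above.

```python
# Toy sketch: the Fully Deterministic Oracle as a fixed-point search.
# `your_response` is a hypothetical stand-in for the oracle's model of how you
# would react to each possible announcement.

MENU = ["pineapple pizza", "spaghetti carbonara", "lasagna"]

your_response = {
    "pineapple pizza": "spaghetti carbonara",      # you contradict this prediction
    "spaghetti carbonara": "spaghetti carbonara",  # stable
    "lasagna": "lasagna",                          # stable
}

def stable_predictions(menu, response):
    # A prediction is stable when announcing it leads you to order exactly that dish.
    return [dish for dish in menu if response[dish] == dish]

fixed_points = stable_predictions(MENU, your_response)
# ['spaghetti carbonara', 'lasagna']: two stable points, and the oracle then picks
# one of them according to some given, arbitrary criterion (as discussed below).
```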

The second and third cases would both be logically possible from the moderately high level at which the Counterfactual Oracle operates. But this does not mean that the universe would branch into the two alternate realities of carbonara and lasagna; only one would obey the deterministic low level of quarks.

Logical possibility at a high level does not imply necessary realisation at the low level. It is logically possible that mother Alice gives birth to a son Bob or to a daughter Carol, and logically impossible that Carol gives birth to her mother Alice. And yet the laws of physics and the machinery of biology would generate only one reality, not a universe containing son Bob and a parallel universe containing daughter Carol.

Faced with two equally possible alternatives, the Fully Deterministic Oracle would choose the one that satisfied some given, arbitrary criterion.

However, what if there were no stable equilibrium for the Fully Deterministic Oracle? What if there were no dish D on the menu such that, if the oracle told you that you would order dish D, you would indeed order dish D? You could have written down the inviolable rule that, whatever dish D the oracle said, you would go for the next dish E on the menu.

Then the oracle would have taken this into account as well. If the Ordering Assistant were to maintain its status as a perfect superforecaster, and there really were no point of convergence, it would have no choice but to give no answer, shrug in helplessness, and perhaps shut itself down gracefully.
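Continuing the toy sketch from above, the diagonal rule “whatever dish the oracle announces, order the next one on the menu” leaves the response function with no fixed point, so the only consistent move left to the oracle is to decline to answer.

```python
# Toy sketch: the diagonal rule leaves the oracle with no stable prediction.
MENU = ["pineapple pizza", "spaghetti carbonara", "lasagna"]

def diagonal_response(announced_dish):
    # Inviolable rule: always order the dish right after the one the oracle announced.
    i = MENU.index(announced_dish)
    return MENU[(i + 1) % len(MENU)]

stable = [dish for dish in MENU if diagonal_response(dish) == dish]
# stable == []: no point of convergence, so a perfect oracle can only give no answer.
prediction = stable[0] if stable else None
```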
