Edit: This article has major flaws. See my comment below.
This idea was informed by discussions with Abram Demski, Scott Garrabrant, and the MIRIchi discussion group.
For a logical inductor , define logical counterfactuals by
for a suitable and a random variable independent of with respect to . Using this definition, one can construct agents that perform well in ASP-like problems.
Recall the Agent Simulates Predictor problem:
Naively, we want to solve this by argmaxing:
Hopefully, , , and . Also, two-boxing should be less attractive than one-boxing:
However, if we make this well-defined with -exploration, we'll get
and then the agent will two-box, contradiction. Instead we'd like to use predictable exploration and set
for small enough that the right-hand side is sensible. Let's see how.
Choose so that . Our agent decides whether to explore at stage , and uses its beliefs at stage as a substitute for counterfactuals:
Here are small positive numbers. It's easy to see that, under reasonable assumptions, this agent 1-boxes on Agent Simulates Predictor. But it can't use the full strength of in its counterfactual reasoning, and this is a problem.
To illustrate the problem, add a term to the utility function that sometimes rewards two-boxing:
The agent should two-box if and only if . Assuming that's the case, and knows this, we have:
So if , two-boxing is the more attractive option, which is a contradiction. (I'm rounding to zero for simplicity.)
The problem is that the counterfactual has to rely on 's imperfect knowledge of . We want to combine 's ignorance of with 's knowledge of .
If is independent of conditioned on with respect to , then we can do this:
Then replace with :
This is more accurate than , and unbiased.
If is not independent of conditional on , we can introduce an auxilliary variable and construct a version of that is independent. This construction is a solution to the following differential privacy problem: Make a random variable that is a function of and independent randomness, maximizing the mutual conditional information , subject to the constraint that is independent of . Using the identity
we see that the maximum is attained when , which means that is a function of and .
Now here's the construction of :
Let be the finite set of possible values of , and let be the finite set of possible values of . We'll iteratively construct a set and define a random variable taking values in . To start with, let .
and for each , choose some such that . Then make a random binary variable such that
Then let be the event defined by
and add to . After repeating this process times, we are done.
We can do this with a logical inductor as well. In general, to get a sentence such that , take .
Now given random variables and , and some informative sentences , let be the random variable encoding the values of . The above construction works approximately and conditional on to give us a random variable that is approximately independent of conditional on with respect to . Now we define
This succeeds on the problem at the beginning of this section: Assume , and assume that knows this. Then:
which does not lead to contradiction. In fact, there are agents like this that do at least as well as any constant agent:
Let be a utility function defined with metasyntactic variables , , and . It must be computable in polynomial time as a function of , , and , where can be any polytime functions that doesn't grow too slowly and such that . Then there exists a logical inductor such that for every , there exists , , and a pseudorandom variable such that the agent defined below performs at least as well on as any constant agent, up to a margin of error that approaches as :
Choose smaller than the strength parameter of the weakest predictor in . If is the best constant policy for , assume . Since can compute , our agent's factual estimate is accurate, and the counterfactual estimate for is an accurate estimate of the utility assigned to the constant policy , as long as we make rich enough. So the agent will choose . Thus we have an implication of the form "if believes , then is true", and so we can create a logical inductor that always believes that for every by adding a trader with a large budget that bids up the price of .
Isn't this just UDTv2?
This is much less general than UDTv2. If you like, you can think of this as an agent that at time chooses a program to run, and then runs that program at time , except the program always happens to be "argmax over this kind of counterfactual".
Also, it doesn't do policy selection.
Instead of handing the agent a pseudorandom variable that captures everything important, I'd like to have traders inside a logical inductor figure out what should be on their own.
Also, I'd rather not have to hand the agent an optimal value of .
Also, I hope that these counterfactuals can be used to do policy selection and win at counterfactual mugging.