# Ω 8

Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Since beliefs about Evidential Correlations don't track any direct ground truth, it's not obvious how to resolve disagreements about them, which is very relevant to acausal trade.
Here I present what seems like the only natural method (Third solution below).
Ideas partly generated with Johannes Treutlein.

Say two agents (algorithms A and B), who follow EDT, form a coalition. They are jointly deciding whether to pursue action a. Also, they would like an algorithm C to take action c. As part of their assessment of a, they’re trying to estimate how much evidence (their coalition taking) a would provide for C taking c. If it gave a lot of evidence, they'd have more reason to take a. But they disagree: A thinks the correlation is very strong, and B thinks it’s very weak.

This is exactly the situation in which researchers in acausal trade have many times found themselves: they are considering whether to take a slightly undesirable action a (spending a few resources on paperclips), which could provide evidence for another agent C (a paperclip-maximizing AI in another lightcone) taking an action c (the AI spending a few resources on human happiness) that we'd like to happen. But different researchers A and B (within the coalition of "humans trying to maximize human happiness") have different intuitions about the strength of the correlation.

A priori, there could exist the danger that, by thinking more, they would unexpectedly learn the actual output of C. This would make the trade no longer possible, since then taking a would give them no additional evidence about whether c happens. But, for simplicity, assume that C is so much more complex and chaotic than what A and B can compute, that they are very certain this won’t happen.

First solution: They could dodge the question by just looking for different actions to take that they don't disagree on. But that’s boring.

Second solution: They could aggregate their numeric credences somehow. They could get fancy on how to do this. They could even get into more detail, and aggregate parts of their deliberation that are more detailed and informative than a mere number (and that are upstream of this probability), like different heuristics or reference-class estimates they've used to come up with them. They might face some credit assignment problems (which of my heuristics were most important in setting this probability?). This is not boring, but it’s not yet what I want to discuss.
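To make "aggregate their numeric credences somehow" concrete, here is a minimal sketch of two standard pooling rules: a linear pool (arithmetic mean) and a log-odds pool (mean in log-odds space). The post does not say which rule A and B would use, and the function names, equal weights and numbers are assumptions chosen for illustration.

```python
import math

def linear_pool(credences, weights=None):
    """Weighted arithmetic mean of probabilities."""
    n = len(credences)
    weights = weights or [1.0 / n] * n
    return sum(w * p for w, p in zip(weights, credences))

def log_odds_pool(credences, weights=None):
    """Weighted mean taken in log-odds space, then mapped back to a probability."""
    n = len(credences)
    weights = weights or [1.0 / n] * n
    pooled = sum(w * math.log(p / (1.0 - p)) for w, p in zip(weights, credences))
    return 1.0 / (1.0 + math.exp(-pooled))

# Hypothetical credences that C takes c, given that the coalition takes a.
p_A, p_B = 0.9, 0.4
print(linear_pool([p_A, p_B]))    # arithmetic mean of the two credences
print(log_odds_pool([p_A, p_B]))  # pulled further toward the more extreme credence
```

The two rules already disagree on these inputs, which is one way of seeing why a bare number is a thin thing to aggregate.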

Let’s think about what these correlations actually are and where they come from. These are actually probabilistic beliefs about logical worlds. For example, A might think that in the world where they play a (that is, conditioning A’s distribution on this fact), the likelihood of C playing c is 0.9. While if they don’t, it’s 0.3. Unfortunately, only one of the two logical worlds will be actual. And so, one of these two beliefs will never be checked against any ground truth. If they end up taking a, there won’t be any mathematical fact of the matter as to what would have happened if they had not.
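As a rough illustration of why the disagreement matters for the decision, the sketch below computes the EDT expected-utility gain from taking a under each agent's conditional credences. A's numbers (0.9 and 0.3) come from the example above. B's credences, the benefit of c and the cost of a are hypothetical values, not from the post.

```python
def edt_gain_from_a(p_c_given_a, p_c_given_not_a, benefit_c, cost_a):
    """EU(a) - EU(not a) when utility is benefit_c if C plays c, minus cost_a if a is taken."""
    eu_a = p_c_given_a * benefit_c - cost_a
    eu_not_a = p_c_given_not_a * benefit_c
    return eu_a - eu_not_a

# Hypothetical payoffs: c is worth 10 to the coalition, a costs 3.
BENEFIT_C, COST_A = 10.0, 3.0

gain_A = edt_gain_from_a(0.9, 0.3, BENEFIT_C, COST_A)   # strong correlation (from the text)
gain_B = edt_gain_from_a(0.35, 0.3, BENEFIT_C, COST_A)  # weak correlation (assumed numbers)

print("A recommends a:", gain_A > 0)
print("B recommends a:", gain_B > 0)
```

Only the difference between the two conditional credences enters the comparison, so A and B can agree on every payoff and still recommend opposite actions.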

But nonetheless, it’s not as if “real math always gets feedback, and counterfactuals never do”: after all, the still-uncertain agent doesn’t know which counterfactual will be real, and so they use the same general heuristics to think about all of them. When reality hits back on the single counterfactual that becomes actual, it is this heuristic that will be chiseled.

I think that’s the correct picture of bounded logical learning: a pile of heuristics learning through time. This is what Logical Inductors formalize.[1]

It thus becomes clear that correlations are the “running-time by-product” of using these heuristics to approximate real math. Who cares that only one of the counterfactuals will come about? We are hedging our bets by applying the heuristics that were successful in the past to all counterfactuals, and hopefully something good comes out the other end!
That is, using correlations is fundamentally about generalization of past heuristics (like everything, really). This involves trusting that generalization will converge on good things. But that’s okay, we do that all the time. This also involves accepting that, in any one particular data point, the heuristic might be very wrong (but hopefully this will happen less with time).

Third solution: So it’s natural to embrace correlations as the outputs of hacky selected-for heuristics, and it’s looking like the natural way to compare correlations is by comparing these heuristics directly. This is taking the Second solution to its logical conclusion: aggregating the atomic parts of deliberation.
While A and B cannot just investigate what C does directly, they can continue running their respective heuristics on more mathematical observations (that they don’t care about). Hopefully one of the two will prove more useful: it will have a “lower loss” when its predictions are tested against many counterfactual questions. And hopefully this is a good sign that the winning heuristic will probably also do well when thinking about C (that is, we are trusting generalization).
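A toy version of "testing heuristics against mathematical observations" might look like the following. The question domain (primality of small integers), the two heuristics and the choice of log loss are all assumptions made for illustration; the post does not fix a loss function or a test set.

```python
import math

def log_loss(predict, resolved):
    """Average negative log-likelihood of a heuristic over resolved (question, outcome) pairs."""
    total = 0.0
    for question, outcome in resolved:
        p = min(max(predict(question), 1e-9), 1.0 - 1e-9)  # clip to avoid log(0)
        total += -math.log(p if outcome else 1.0 - p)
    return total / len(resolved)

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# "Mathematical observations that they don't care about": is n prime, for n in 2..199?
resolved = [(n, is_prime(n)) for n in range(2, 200)]

def heuristic_A(n):
    """Hypothetical heuristic: even numbers are almost never prime, odd ones are a coin flip."""
    return 0.05 if n % 2 == 0 else 0.5

def heuristic_B(n):
    """Hypothetical heuristic: ignore all structure and always guess the same base rate."""
    return 0.25

loss_A = log_loss(heuristic_A, resolved)
loss_B = log_loss(heuristic_B, resolved)
winner = "A" if loss_A < loss_B else "B"
print(winner, loss_A, loss_B)
```

Whether the lower-loss heuristic is also better at questions about C is exactly the generalization bet described above; the score itself cannot settle that.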

In fact, a natural way to implement this (as opposed to running through a lot of irrelevant mathematical observations every time we need a new decision) is to run our heuristics continuously (also in decisions we care about), and keep track of which work better.

Put in terms of Logical Inductors, this amounts to taking all the traders from two Inductors, selecting those that have done best (each tested on their own Inductor), and computing their aggregate bet.
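The low-bandwidth aggregation can be sketched as follows. This is a deliberately crude stand-in: real Logical Inductors combine traders through market prices, not through a score-weighted average, and the `Trader` class, the scores and the top-`k` rule are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Trader:
    name: str
    score: float  # track record inside its own Inductor (positive, higher is better)
    bet: float    # credence the trader assigns to the sentence under dispute

def aggregate_best(inductor_1, inductor_2, k=2):
    """Keep the k best-scoring traders from each pool and average their bets, weighted by score."""
    best = []
    for pool in (inductor_1, inductor_2):
        best += sorted(pool, key=lambda t: t.score, reverse=True)[:k]
    total = sum(t.score for t in best)
    return sum(t.score * t.bet for t in best) / total

# Hypothetical trader pools for agents A and B.
inductor_A = [Trader("a1", 3.0, 0.9), Trader("a2", 1.0, 0.8), Trader("a3", 0.2, 0.1)]
inductor_B = [Trader("b1", 2.0, 0.4), Trader("b2", 2.0, 0.3), Trader("b3", 0.1, 0.95)]

print(aggregate_best(inductor_A, inductor_B, k=2))
```

Each score here was earned only against traders from the same pool, so scores from different pools are not directly comparable.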

This still leaves something to improve, because the scores of each trader don't include how they would interact with those in the other Inductor. Maybe it would become clear that some traders only have a high score because all the other traders in their Inductor are even dumber.

So it would be even better (and this is another, more expensive way of "scoring the different heuristics") to just run a single Logical Inductor, with all of those heuristics together (and, let's say, a prior over them which is the average of the priors from both Inductors), and seeing all the logical observations that any of the two Inductors had seen.
That is, instead of having both agents learn independently and then compare intuitions with a low bandwidth, you merge them from the start, and ensure the different intuitions have had high bandwidth with all other ones.
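For contrast, here is a minimal sketch of the high-bandwidth merge: one pool of heuristics, a prior that averages the two agents' priors, and an update on the union of what either agent observed. It uses a plain Bayesian mixture in place of an actual Logical Inductor, and all names and numbers are hypothetical.

```python
def merge_priors(prior_1, prior_2):
    """Average two priors over heuristics; a heuristic missing on one side gets weight 0 there."""
    names = set(prior_1) | set(prior_2)
    return {n: 0.5 * prior_1.get(n, 0.0) + 0.5 * prior_2.get(n, 0.0) for n in names}

def update(weights, heuristics, observations):
    """Reweight each heuristic by the probability it gave to each resolved observation."""
    w = dict(weights)
    for question, outcome in observations:
        for name in w:
            p = heuristics[name](question)
            w[name] *= p if outcome else 1.0 - p
        z = sum(w.values())
        w = {n: v / z for n, v in w.items()}
    return w

def merged_credence(weights, heuristics, question):
    """Credence of the merged pool: weighted average of the heuristics' predictions."""
    return sum(weights[n] * heuristics[n](question) for n in weights)

# Hypothetical constant heuristics, one originally held by each agent.
heuristics = {"h_A": lambda q: 0.8, "h_B": lambda q: 0.3}
prior = merge_priors({"h_A": 1.0}, {"h_B": 1.0})

# Union of the logical observations seen by either agent, without duplicates.
obs_A = [("s1", True), ("s2", True)]
obs_B = [("s2", True), ("s3", False)]
union = list(dict.fromkeys(obs_A + obs_B))

posterior = update(prior, heuristics, union)
print(posterior, merged_credence(posterior, heuristics, "new sentence"))
```

Every heuristic is now scored on the same observations and competes against all the others, which is the extra information the low-bandwidth version lacks.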

The latter is more exhaustive, but way more expensive. And the former might be in some instances a natural way to cut out a lot of computation, without losing out too much expected performance. For example, maybe each Inductor (agent) specializes in a different part of Logic (that you expect to not interact too much with what the other Inductor is doing). Then, what is lost in performance by aggregating them with low bandwidth (instead of merging them from the start) should be less.

Probably this is all pragmatically hard to do in reality, but I think philosophically it’s the best we can hope for.[2] Which amounts to trusting generalization.

It also runs into some Updateful problems already experienced by Logical Inductors: when you’ve run your heuristics for longer, they might “overfit” to some observed knowledge (that is, they update on it). And so it might seem impossible to find the sweet spot in some situations, where you still don't want to update on some basic information (c), but already want sensible-looking opinions on pretty complex correlations (a). For example, when you would like to use a very advanced heuristic to consider counterfactuals 1 and 2, but the only way to have learned this heuristic is by also having noticed that 1 is always false.[3] This is usually presented as a problem of Updatefulness, but it might also be understandable as a failure of generalization due to overfitting.

[1] And, unsurprisingly, when not only learning is involved, but also exploiting, what we seem to do is Updateful Policy Selection, which is nothing more than an "Action Inductor".

[2] Of course I have some small credence on an objective criterion existing, similarly to how I have some small credence on an objective metric for decision theories existing that we've overlooked. I just think it’s pretty obvious that’s not how philosophy has shaped up.

[3] Vacuously, there does always exist some Inductor with a prior weird enough to learn the useful heuristic (or have any opinions about the counterfactual that you want it to have) without learning 1 is false. But this amounts to "already knowing what you're looking for" (and you'd have to go over a lot of Inductors to find this one, thus updating on a lot of math yourself, etc.), which is not really what you wanted the Inductor for in the first place. You wanted it (with its arbitrary simplicity prior over traders) as a reliable way of noticing patterns in reality that seem like your best chance at prediction.

# Comments

I think it would be helpful to have a worked example here -- say, the twin PD in which both players are close but not identical copies, and they are initially unsure about whether or not they are correlated (one thinks they probably are, another thinks they probably aren't) but they want to think and reflect more before making up their minds. (Case 2: As above except that they both begin thinking that they probably are.) Is this the sort of thing you are imagining?

In fact, a natural way to implement this (as opposed to running through a lot of irrelevant mathematical observations every time we need a new decision) is to run our heuristics continuously (also in decisions we care about), and keep track of which work better.

Put in terms of Logical Inductors, this amounts to taking all the traders from two Inductors, selecting those that have done best (each tested on their own Inductor), and computing their aggregate bet.

Uh oh, this is starting to sound like Oesterheld's Decision Markets stuff.

I think it would be helpful to have a worked example here -- say, the twin PD

As in my A B C example, I was thinking of the simpler case in which two agents disagree about their joint correlation to a third. If the disagreement happens between two sides of a twin PD, then they care about slightly different questions (how likely A is to Cooperate if B Cooperates, and how likely B is to Cooperate if A Cooperates), instead of the same question. And this presents complications in exposition. Although, if they wanted to, they could still share their heuristics, etc.

To be clear, I didn't provide a complete specification of "what action a and action c are" (which game they are playing), just because it seemed to distract from the topic. That is, the relevant part is their having different beliefs on any correlation, not its contents.

Uh oh, this is starting to sound like Oesterheld's Decision Markets stuff.

Yes! But only because I'm directly thinking of Logical Inductors, which are the same for epistemics. Better said, Caspar throws everything (epistemics and decision-making) into the traders, and here I am still using Inductors, which only throw epistemics into the traders.

My point is:
"In our heads, we do logical learning by a process similar to Inductors. To resolve disagreements about correlations, we can merge our Inductors in different ways. Some are lower-bandwidth and frugal, while others are higher-bandwidth and expensive."
Exactly analogous points could be made about our decision-making (instead of beliefs), thus the analogy would be to Decision Markets instead of Logical Inductors.

Choosing an action is not a good way of exerting acausal influence on computations that aren't already paying attention to you in particular. When agent A wants to influence computation C, there is some other computation D that C might be paying attention to, and A is free to also start paying attention to it by allowing D to influence A's actions. This lets A create an incentive for D to act in particular ways, by channeling D's decisions into the consequences of A's actions that were arranged to depend on D's decisions in a way visible to D. As a result, D gains influence over both A and C, and A becomes coordinated with C through both of them being influenced by D (here D plays the role of an adjudicator/contract between them). So correlations are not set a priori; setting them up should be part of how acausal influence is routed by decisions.

A priori, there could exist the danger that, by thinking more, they would unexpectedly learn the actual output of C. This would make the trade no longer possible, since then taking a would give them no additional evidence about whether c happens.

If A's instrumental aim is to influence some D (a contract between A and C), what matters is D's state of logical uncertainty about A and C (and about the way they depend on D), which is the basis for D's decisions that affect C. A's state of logical uncertainty about C is less directly relevant. So even if A gets to learn C's outcome, that shouldn't be a problem. Merely observing some fact doesn't rule out that the observation took place in an impossible situation, so observing some outcome of C (from a situation of unclear actuality) doesn't mean that the actual outcome is as observed. And if D is uncertain about actuality of that situation, it might be paying attention to what A does there, and how what A does there depends on D's decisions. So A shouldn't give up just because according to its state of knowledge, the influence of its actions is gone, since it still has influence over the way its actions depend on others' decisions, according to others' states of knowledge.