Bruno De Finetti once argued that probabilities are subjective representations of forecasts of real events. In particular, he rejected the idea that statements like "there is a random probability $\theta$ of this coin landing on heads when it is tossed, distributed according to $\mu$" are meaningful. For De Finetti, probabilities were subjective and had to refer to real events. He famously proved a representation theorem: given a subjective probability over a sequence of events that is invariant under permutation (exchangeable), we can prove the existence of a random variable that represents "the probability of each event". This theorem justified a common but metaphysically mysterious construction (random variables that represent "the probability of something") by showing that such variables can be derived from random variables that represent actual events, provided those random variables are exchangeable. This concern is similar (but not identical) to Jaynes' concern with the mind projection fallacy: roughly, if probabilities are subjective things, we should be careful not to start treating them as objective things.

Counterfactuals, I think, have some similarities to "probabilities of coins". In particular, I have about as much difficulty understanding what "the probability this coin lands on heads" actually means as I do understanding what "the counterfactual outcome of xxx is yyy" means.

Chris says counterfactuals "only make sense from within the counterfactual perspective". One interpretation of this is: we want a theory of what makes a counterfactual statement true or useful or predictive, and such a theory can only be constructed starting with the assumption that some other counterfactual statements are true.

I think one disanalogy between probability models and counterfactuals is that there are many reasonable attempts to propose abstract and widely applicable principles for constructing good probability models. Examples include coherence principles, which imply that subjective uncertainty should be represented by probability models, and Solomonoff or maximum entropy priors.

On the other hand, there are few abstract and widely applicable theories of what makes a counterfactual hypothesis appropriate. Commonly, we rely on specific counterfactual intuitions: for example, randomization guarantees that potential outcomes are independent of treatment assignment. As I understand it, David Lewis' theory is one exception to this rule - it posits that we can evaluate the appropriateness of a counterfactual hypothesis by reference to a measure of "how similar" different "possible worlds" are. I am not an expert in Lewis' work, but this attempt does not appear to me to have borne a lot of fruit.

I'm going to propose a theory of counterfactuals that says counterfactual hypotheses should be evaluated by reference to a set of similar agents. Like Lewis' theory, this requires that we have some means of determining which agents are similar, and my theory has some serious gaps in this regard. It is somewhat similar to De Finetti's strategy in that we take something which seems impossible to define for a single event (counterfactual hypotheses and probabilities of probabilities respectively) and instead define them with regard to sequences of events that are symmetric in an appropriate sense.

I'm going to say that a "counterfactual hypothesis" is a function $f: A \to C$ from a set of actions $A$ to a set of consequences $C$. An agent equipped with such a function, where $A$ matches the actions they actually have available, could use it to evaluate the prospects of each action (which may or may not be a wise choice). If we translate the evaluation process to English, it might consist of statements like "if I do $a$, the result will be $f(a)$". This is "counterfactual" in the sense that the antecedent is in fact true only for one $a$ (whether I know it or not) and false otherwise. Despite this, we want it to somehow say something meaningful for every $a$, so I can use it to choose good actions instead of bad ones.

If you have read a lot of Pearl's work, you might object that these things shouldn't actually be called counterfactual, because we reserve that term for things like $f(a)$ evaluated in a situation where we already know which action $a'$ we actually took and which consequence $c'$ we actually observed. My response is: the two things are obviously quite similar, and I think it's reasonable to start with a simpler problem.

I avoid probability as much as I can, because it makes things more complicated. The result is I make a bunch of completely unreasonable definitions about things being deterministically equal. I'm sorry.

Proposition: counterfactuals can be defined with respect to ensembles of peers

An agent is something that takes one action and experiences one consequence. We won't dwell too much on what actions and consequences are; we'll just consider them to be random variables. So we can think of an agent as just a pair (action, consequence).

A set of peers is a set of agents such that all agents that take the same action experience the same consequence.

We can then define a counterfactual function $f$:

(D1): a counterfactual $f$ appropriate for agent $i$ is the map from actions to consequences defined by a set of peers of $i$

An agent reasoning using this counterfactual function might say in English: "if I or an agent essentially identical to me does $a$, the result will be $f(a)$."

D1 is vague and nonunique. This seems problematic, but maybe it can still be useful. Even with the nonuniqueness, it may in some circumstances have nontrivial consequences. Consider the assumption

(A1): $i$ has a set of peers $P_i$

D1 + A1 implies that agent $i$ must expect consequences that some other agent has experienced or will experience. D1 + A1 is unreasonably strong, but we might imagine more reasonable versions if we can come up with good notions of approximate peers and approximately the same consequences.

Furthermore, if the agents in $P_i$ between them have experienced the consequences of each of $i$'s available actions, then whatever action $i$ takes, we can say which consequence it will experience.
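To make D1, A1 and this prediction step concrete, here is a minimal Python sketch. It assumes deterministic consequences and hashable actions and consequences; the names `is_peer_set` and `counterfactual_from_peers`, and the example peer set, are hypothetical.

```python
from typing import Hashable, Iterable

# An agent is modelled as an (action, consequence) pair, as in the text.
Agent = tuple[Hashable, Hashable]


def is_peer_set(agents: Iterable[Agent]) -> bool:
    """True if all agents that took the same action experienced the same consequence."""
    first_seen: dict = {}
    for action, consequence in agents:
        if first_seen.setdefault(action, consequence) != consequence:
            return False
    return True


def counterfactual_from_peers(peers: Iterable[Agent]) -> dict:
    """D1: the action -> consequence map defined by a set of peers.

    The map only covers actions that some peer actually took."""
    peers = list(peers)
    if not is_peer_set(peers):
        raise ValueError("same action, different consequences: not a set of peers")
    return dict(peers)


# Hypothetical peer set P_i for some agent i.
peers_of_i = [("left", "win"), ("right", "lose"), ("left", "win")]
f = counterfactual_from_peers(peers_of_i)
print(f["left"], f["right"])  # win lose
```

Note that `f` is only defined on actions some peer has taken, which is exactly the restriction discussed next.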

So far, this doesn't seem too far from a dumb way of defining arbitrary functions $f: A \to C$ which, for whatever reason, is restricted to functions that have already been realised by some collection of agents.

It becomes non-trivial if we suppose that it is possible to know that some set of agents is a set of peers before it is possible to know $f$.

Here's a sketch of how this could proceed:

We have a small set $S_i$ of peers of $i$ known a priori, and we observe that on all the actions taken by agents in $S_i$, the consequences are the same as those experienced by the larger set of agents $J$, with each $j \in J$ having its own small set $S_j$ of a priori peers. Then we take the union of all the $S_j$s and posit that $\bigcup_{j \in J} S_j$ is also a peer set for $i$. Importantly, $\bigcup_{j \in J} S_j$ includes some actions that aren't in the original $S_i$, so we get some extrapolation to unseen actions.

$S_i$ could be, for example, a collection of actions all taken by agent $i$, where we have unusually strong reasons to believe the consequences are consistent.
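The extrapolation step can be sketched in Python as below. This is only an illustration under the same deterministic assumptions as D1: `own_prior` stands for $S_i$, `other_priors` for the sets $S_j$, and the function names and travel example are hypothetical. The check is all-or-nothing; it has none of the noise tolerance or regularisation a realistic version would need.

```python
from typing import Hashable, Optional, Sequence

Agent = tuple[Hashable, Hashable]  # (action, consequence)


def peer_map(agents: Sequence[Agent]) -> Optional[dict]:
    """Action -> consequence map if `agents` form a peer set, otherwise None."""
    mapping: dict = {}
    for action, consequence in agents:
        if mapping.setdefault(action, consequence) != consequence:
            return None
    return mapping


def extrapolate_peers(own_prior: Sequence[Agent],
                      other_priors: Sequence[Sequence[Agent]]) -> Optional[dict]:
    """Posit the union of the S_j as a peer set for i if it agrees with S_i.

    Returns the extrapolated counterfactual map, or None if the check fails."""
    own = peer_map(own_prior)
    pooled = peer_map([agent for group in other_priors for agent in group])
    if not own or pooled is None:
        return None
    # Every action taken in S_i must also appear in the pooled set, with the same consequence.
    if not own.keys() <= pooled.keys() or any(own[a] != pooled[a] for a in own):
        return None
    return pooled  # may cover actions that nobody in S_i has tried


# Hypothetical example: i has only ever walked; the pooled agents also drove.
own = [("walk", "late"), ("walk", "late")]
others = [[("walk", "late"), ("drive", "on time")], [("drive", "on time")]]
print(extrapolate_peers(own, others))  # {'walk': 'late', 'drive': 'on time'}
```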

In reality, we would want to deal with stochastic consequences, and some kind of regularisation on allowable candidates for $P_i$ (i.e. if we iterate through the power set of agents, we'll end up over-fitting). This is just a toy theory, and I don't know if it works or not.

Aside: evolution and counterfactuals

Chris makes the somewhat offhanded remark

However, it would be inaccurate to present [counterfactuals] as purely a human invention as we were shaped by evolution in such a way as to ground these conceptions in reality.

Suppose there is some weaker notion of peer which corresponds roughly to an agent's competitors in an evolutionary environment. Divide agents into generations, and let the consequence an agent in generation $t$ experiences be the number of agents in generation $t+1$ that make choices the same way it does. Then agents that make choices that maximise this number according to D1-like counterfactuals will be represented more in the next generation.

If there exists a unique procedure that, in all generations, learns a D1-like counterfactual and makes choices maximising descendants according to the learned rule, then agents implementing this procedure will become the most numerous.
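A toy simulation of this claim, with every number hypothetical: two fixed-action agent types and one "learner" that treats the previous generation as its peer set (a D1-like rule) and picks whichever action produced the most descendants. The environment is assumed to flip which action is better every five generations, so neither fixed policy is always right; this is an illustration, not a model of real evolutionary dynamics.

```python
def offspring_per_agent(generation: int) -> dict[str, int]:
    """Hypothetical environment: descendants per agent for each action.

    The better action flips every 5 generations."""
    return {"a": 2, "b": 1} if (generation // 5) % 2 == 0 else {"a": 1, "b": 2}


def run(generations: int) -> dict[str, int]:
    population = {"always_a": 1, "always_b": 1, "learner": 1}
    observed: dict[str, int] = {}  # previous generation's action -> consequence map
    for t in range(generations):
        env = offspring_per_agent(t)
        # The learner uses last generation's peers and maximises observed descendants;
        # with no observations yet it defaults to "a".
        learner_action = max(observed, key=observed.get) if observed else "a"
        actions = {"always_a": "a", "always_b": "b", "learner": learner_action}
        # Each agent's consequence is its number of same-type descendants.
        population = {kind: n * env[actions[kind]] for kind, n in population.items()}
        # Both fixed types are always present, so every action is observed.
        observed = {action: env[action] for action in actions.values()}
    return population


print(run(10))  # {'always_a': 32, 'always_b': 32, 'learner': 512}
```

The learner lags one generation behind each environmental flip, but still ends up far more numerous than either fixed policy.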

Decision theoretic paradoxes

Decision theoretic paradoxes can be understood to be asking: which evaluative counterfactuals are appropriate in this situation? A1 and D1 don't really tell us this in the classic paradoxes. Consider the following version of Newcomb's problem:

  • We've just watched big crowds of people choose: some one-boxed and some two-boxed
  • One-boxers always got $1m, two-boxers always got $100
  • We know the predictions were made before seeing the actions, and the rules were followed (both boxes filled if one box predicted, $1m box empty otherwise)

If we accept D1, then the problem is one of identifying a class of peers whose choices and consequences we can use to evaluate our own prospects. Some options are:

  1. None of them are peers (reject A1)
  2. If we one box then one-boxers are peers, if we two box then some two-boxers are
  3. One-boxers and two-boxers are peers
  4. Two-boxers are peers but not one-boxers
  5. Option 4 with one-box and two-box reversed

Option 1 leaves the appropriate counterfactual underdetermined - we do not see the consequences for one or both actions for any class of essentially identical agents. Option 3 implies one-boxing, since our one-boxing peers got $1m and our two-boxing peers got $100.
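A small Python sketch of how the candidate peer classes play out under D1. The crowd sizes are hypothetical (the text only says "big mobs"), and `choose` returns `None` whenever some available action has no peer evidence, which is one way of encoding "underdetermined".

```python
# Hypothetical crowd matching the story: every one-boxer got $1m, every two-boxer $100.
crowd = [("one_box", 1_000_000)] * 50 + [("two_box", 100)] * 50
ACTIONS = ("one_box", "two_box")


def peer_counterfactual(peers):
    """D1 map from a candidate peer set; raises if the set is not actually a peer set."""
    mapping = {}
    for action, payoff in peers:
        if mapping.setdefault(action, payoff) != payoff:
            raise ValueError("not a set of peers")
    return mapping


def choose(counterfactual):
    """Best available action under the counterfactual, or None if some action is unseen."""
    if any(action not in counterfactual for action in ACTIONS):
        return None  # underdetermined: no peer evidence for some available action
    return max(ACTIONS, key=lambda action: counterfactual[action])


candidate_peer_sets = {
    "option 1 (no peers)": [],
    "option 3 (everyone)": crowd,
    "option 4 (two-boxers only)": [agent for agent in crowd if agent[0] == "two_box"],
}
for name, peers in candidate_peer_sets.items():
    print(name, "->", choose(peer_counterfactual(peers)))
```

In this sketch option 4, like option 1, leaves the value of one-boxing unseen, so only option 3 yields a recommendation.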

Can we say anything about which options are more reasonable, and is thinking about it at all enlightening?

Option 1 seems excessively skeptical to me. Appealing to the idea of learning peers: usually whatever consequences I experience for taking an action, there's someone else who's tried it first and had the same experience.

Option 2 is equivalent to option 3. Why? Option 2 says: if we one-box, we get the same consequence as the one-box subset ($1m), and if we two-box, we get the same consequence as the two-box subset ($100). Since every agent that took the same action got the same consequence, the one-boxers and two-boxers together form a set of peers, which is exactly option 3.

What about accepting option 4 and then going on to choose one box? It would be quite surprising if we learned that agents that tend to take the opposite action to us are the ones that get the same results.

So it seems like the reasonable advice, accepting D1, is to one-box. Does this mean D1 is equivalent to EDT? I think EDT broadly defined could be many things - who even knows what "the conditional probability" is, anyway? However, maybe this could be associated with a kind of frequentist EDT that defines probabilities via relative frequencies in populations of peers (maybe).

Mundane causation vs correlation

Does this definition handle mundane cases of causation and correlation? It seems like it might. Suppose we're a doctor in a clinic seeing a patient who is sick, and we're deciding whether to make a prescription. We've seen a large collection of other doctor-patient data from regular consultations (i.e. non-randomised).

The reasoning for Newcomb's problem can be applied analogously to suggest that if every patient who received the prescription recovered and none of those who did not receive it recovered, we should probably prescribe. I suspect that many would accept this in practice, too. The cases where it goes wrong seem to be some flavour of "everyone is trying to trick me", and if that is plausible then the excessively skeptical option 1 from the Newcomb's example seems somewhat reasonable in response.

What if, say, 75% of the treated recover and 25% of the untreated do, with 50% of patients treated and 50% not? One way you can approach this is to notice that a class of peers is formally identical to a class of agents with the same potential outcome function, so you can basically do whatever potential outcomes practitioners do.
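One standard thing potential outcomes practitioners do is estimate the average treatment effect as a difference in recovery rates. That estimate is only justified if treated and untreated patients are exchangeable (roughly, peers in the sense above), which non-randomised data does not guarantee. A minimal sketch with a hypothetical 200 patients:

```python
from fractions import Fraction


def recovery_rate(outcomes):
    """Fraction of True values in a list of recovery outcomes."""
    return Fraction(sum(outcomes), len(outcomes))


def difference_in_means(records):
    """records: (treated, recovered) boolean pairs.

    Estimates the average treatment effect, assuming no confounding."""
    treated = [recovered for was_treated, recovered in records if was_treated]
    untreated = [recovered for was_treated, recovered in records if not was_treated]
    return recovery_rate(treated) - recovery_rate(untreated)


# Hypothetical 200 patients: half treated; 75% of treated and 25% of untreated recover.
records = ([(True, True)] * 75 + [(True, False)] * 25
           + [(False, True)] * 25 + [(False, False)] * 75)
print(difference_in_means(records))  # 1/2
```

In the deterministic case from the previous paragraph (all treated recover, no untreated recover) the same estimate is 1.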

If peers are learnable, we might be able to say something positive about which bits of data form an appropriate comparison set. Perhaps this gets us to some special case of data fusion.


The notion of defining counterfactuals via "actually existing ensembles of peers" has a lot of the same problems regular counterfactuals have. In our toy theory, there's a peer set that contains any two agents that took different actions, which is similar to the problem with naive logical counterfactuals that, when given a false premise, tell us that any consequence is possible. I think this is the central difficulty: coming up with a notion of "peer" that allows some agents that take different actions to be peers, but doesn't allow just any agents that take different actions to be peers.

The difficulties with peer counterfactuals aren't quite identical to those of naive logical counterfactuals, though: peer counterfactuals are restricted to consequences that have already been realised. I suppose the obvious strengthening of this idea is $p$-peer counterfactuals, where a consequence is permissible only if it is experienced in at least a fraction $p$ of all cases. This translates to the idea "I'll get the same results as everyone else who tried the same thing". This is actually a more precise statement of what people are usually rejecting when they say "causation $\neq$ correlation".
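A possible reading of $p$-peer counterfactuals, sketched in Python. It assumes the fraction is computed among agents who took the same action; the function name and the records are hypothetical.

```python
from collections import Counter, defaultdict


def p_peer_counterfactual(agents, p):
    """Map each action to the consequences seen in at least a fraction p of the
    agents who took that action. With p = 1 an action gets a non-empty set only
    if everyone who took it had the same consequence (the strict peer condition)."""
    counts = defaultdict(Counter)
    for action, consequence in agents:
        counts[action][consequence] += 1
    permissible = {}
    for action, counter in counts.items():
        total = sum(counter.values())
        permissible[action] = {c for c, n in counter.items() if n >= p * total}
    return permissible


# Hypothetical records: 3 of 4 treated patients recovered, all 4 untreated stayed sick.
agents = [("treat", "recover")] * 3 + [("treat", "sick")] + [("wait", "sick")] * 4
print(p_peer_counterfactual(agents, 0.75))  # {'treat': {'recover'}, 'wait': {'sick'}}
print(p_peer_counterfactual(agents, 1.0))   # {'treat': set(), 'wait': {'sick'}}
```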

Comments

Replacing talk of counterfactuals with talk of peers seems like a natural way to attempt to eliminate circularity.

I guess the question that immediately springs to mind for me is how to respond to the temptation to shift from utilising actual peers to potential peers, which then seems to raise the specter of circularity again. For example, peers are generally not going to be facing exactly the same situation. Let's suppose I face Newcomb's problem in a yellow shirt and you face it in a red shirt. They ought to be comparable because my shirt color doesn't matter. And I guess the temptation is to say, "Alright, the situations aren't exactly the same, but let's just pretend that I was wearing a red shirt instead; it doesn't make a difference". However, this approach would produce circularity.

I actually think it's quite plausible that this objection can be overcome, so I don't want to say this is the strongest criticism I could make, when it's just the first thing that jumped to mind.

how to respond to the temptation to shift from utilising actual peers to potential peers which then seems to reraise the specter of circularity.

I think you might be able to say something like "actual peers is why the rule was learned, virtual peers is because the rule was learned".

(Just to be clear: I'm far from convinced that this is an actually good theory of counterfactuals, it's just that it also doesn't seem to be obviously terrible)

Let's suppose I face Newcomb's problem in a yellow shirt and you face it in a red shirt. They ought to be comparable because my shirt color doesn't matter.

The definition of peers given is in terms of what actually happens, and the question of whether you think someone is your peer is put down to "induction somehow". I think this approach has serious problems, but it does answer your question: whether we are peers depends on the results we get. How we should include the colour of our shirts in our decision making on the other hand is a question of what inductive assumptions we're willing to make when assessing likely peers.

Actually, there's an important asymmetry in terms of inductive assumptions in just about every situation that involves "I see you do x, then I do x". The thing is, I almost always know more about why I am doing x than I do about why you are doing x. You might be two-boxing because you signed a contract with the predictor beforehand where you agreed to two-box in exchange for an under-the-table payment of $2m, while I am two-boxing because I decided the dominance theory is compelling. I don't know why you act and you don't know why I act, but I know why I act and you know why you act.

Thus the case for epistemic indifference between "me about to choose, from my perspective" and "you about to choose, from my perspective" seems to be quite compromised, before we even consider what shirts we are wearing. And this is as it should be! Inferring causation from correlation is usually a bad move. Nonetheless, it seems unreasonable to think that I am so different to everyone else that I should two-box in Newcomb's problem with a predictor that has a perfect track record.

The notion that "some of these people were in the same position I am in, though I don't know who they are" seems pretty plausible, though I feel like it's resting on some more fundamental assumptions that I'm not quite sure of right now.

I think this approach has serious problems, but it does answer your question: whether we are peers depends on the results we get.

Just to see I'm following correctly, you're a peer if you obtain the same result in the same situation?

My point about yellow shirts and red shirts is that it isn't immediately obvious what counts as the same situation. For example, if the problem involved Omega treating you differently by shirt color, then it would seem like we were in different situations. Maybe your response would be: you got a different result so you're not a peer, no need to say it's a different situation. I guess I would then question whether someone one-boxing in Newcomb's and someone opening a random box and finding $1 million would be peers just because they obtained the same result. I guess completely ignoring the situation would make the class of peers too wide.

Peers get the same results from the same actions. It's not exactly clear what "same action" or "same result" means -- is "one boxing on the 100th run" the same as "one boxing on the 101st run" or "box 100 with $1m in it" the same as "box 101 with $1m in it"? I think we should think of peers as being defined with respect to a particular choice of variables representing actions and results.

I think the definitions of these things aren't immediately obvious, but it seems like we might be able to figure them out sometimes. Given a decision problem, it seems to me that "the things that I can do" and "the things that I care about" might often be known to me. It seems to be the case that I can also define some variables that represent copies of these things from your point of view, although it's a bit less obvious how to do that.

If we think about predictions vs outcomes, we judge good predictions to be ones that have a good match to outcomes. Similarly, a "peer inference" is a bit like a prediction -- I think this group of action-outcome pairs will be similar to my own -- and the outcome that can be assessed at the end is whether they actually are similar to my own action and outcome. I can't assess whether they "would have been" peers "had I taken a different action", but maybe I don't need to. For example: if I assess some group of people to be my peers relative to a particular decision problem, and all the people who take action 1 end up winners while all the people who take action 2 end up losers, and I take action 1 and end up a winner, then I have done well relative to the group of people I assessed to be my peers, regardless of "what would have happened had I taken action 2".

There is a sense in which I feel like peers ought to also be a group that it is relevant to judge myself against - I want to take actions that do better than my peers, in some sense. Maybe defining actions and outcomes addresses this concern? I'm not sure.

I think a substantial problem with this theory is the fact that I may often find that the group of peers for some problem contains only me, which leaves us without a useful definition for a counterfactual.