Followup to: Causality: The Fabric of Real Things

Previous meditation:

"You say that a universe is a connected fabric of causes and effects. Well, that's a very Western viewpoint - that it's all about mechanistic, deterministic stuff. I agree that anything else is outside the realm of science, but it can still be real, you know. My cousin is psychic - if you draw a card from his deck of cards, he can tell you the name of your card before he looks at it. There's no mechanism for it - it's not a causal thing that scientists could study - he just does it. Same thing when I commune on a deep level with the entire universe in order to realize that my partner truly loves me. I agree that purely spiritual phenomena are outside the realm of causal processes that can be studied by experiments, but I don't agree that they can't be real."


Fundamentally, a causal model is a way of factorizing our uncertainty about the universe.  One way of viewing a causal model is as a structure of deterministic functions plus uncorrelated sources of background uncertainty.

Let's use the Obesity-Exercise-Internet model (reminder: which is totally made up) as an example again:

We can also view this as a set of deterministic functions Fi, plus uncorrelated background sources of uncertainty Ui:

What this says is that the value x3 - how much someone exercises - is a function of how obese they are (x1), how much time they spend on the Internet (x2), and some other background factors U3 which don't correlate with anything else in the diagram; all of these, combined by the mechanism F3, collectively determine how much time someone spends exercising.

There might be any number of different real factors involved in the possible states of U3 - like whether someone has a personal taste for jogging, whether they've ever been to a trampoline park and liked it, whether they have some gene that affects exercise endorphins. These are all different unknown background facts about a person, which might affect whether or not they exercise, above and beyond obesity and Internet use.

But from the perspective of somebody building a causal model, so long as we don't have anything else in our causal graph that correlates with these factors, we can sum them up into a single factor of subjective uncertainty, our uncertainty U3 about all the other things that might add up to a force for or against exercising. Once we know that someone isn't overweight and that they spend a lot of time on the Internet, all our uncertainty about those other background factors gets summed up with those two known factors and turned into a 38% conditional probability that the person exercises frequently.
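To make the factorization concrete, here's a minimal sketch of such a model: each variable is a deterministic function of its parents plus an independent noise term. The mechanisms and all numbers below are illustrative assumptions, tuned only so that the conditional probability of exercising, given not-obese and heavy Internet use, comes out near the 38% figure above.

```python
import random

# A minimal sketch of the Obesity-Exercise-Internet model as deterministic
# mechanisms F_i plus independent background noise U_i. All numbers and
# function shapes here are made-up illustrative assumptions.

def sample_person(rng):
    # Independent background uncertainty: one U_i per variable
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    x1 = u1 < 0.3                        # X1: obese (background factors only)
    x2 = u2 < 0.5                        # X2: heavy Internet use
    # F3: mechanism combining obesity, Internet use, and background noise U3
    x3 = u3 < (0.63 - 0.20 * x1 - 0.25 * x2)   # X3: exercises frequently
    return x1, x2, x3

rng = random.Random(0)
samples = [sample_person(rng) for _ in range(100_000)]
subset = [x3 for x1, x2, x3 in samples if not x1 and x2]
print(round(sum(subset) / len(subset), 2))  # close to 0.38 by construction
```

All the information linking the observables lives in the mechanisms; the U_i themselves are drawn independently, which is exactly the condition discussed below.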

And the key condition on a causal graph is that if you've properly described your beliefs about the connective mechanisms Fi, all your remaining uncertainty Ui should be conditionally independent:

p(u1, u2, u3) = p(u1)·p(u2)·p(u3)

or more generally:

p(u1, ..., un) = ∏i p(ui)

And then plugging those probable Ui into the strictly deterministic Fi should give us back our whole causal model - the same joint probability table over the observable Xi.

Hence the idea that a causal model factorizes uncertainty. It factorizes out all the mechanisms that we believe connect variables, and all remaining uncertainty should be uncorrelated so far as we know.

To put it another way, if we ourselves knew about a correlation between two Ui that wasn't in the causal model, our own expectations for the joint probability table couldn't match the model's product

p(x1, ..., xn) = ∏i p(xi | pa(xi))

and all the theorems about causal inference would go out the window. Technically, the idea that the Ui are uncorrelated is known as the causal Markov condition.
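As a toy illustration of the Markov condition in action, the following sketch (with an arbitrary made-up mechanism, not the post's numbers) draws independent Ui for the graph X1 → X3 ← X2 and checks numerically that the resulting joint distribution factorizes as p(x1, x2, x3) = p(x1) p(x2) p(x3 | x1, x2):

```python
import itertools
import random

# Toy check of the causal Markov factorization for X1 -> X3 <- X2: with
# independent noise terms U_i, the sampled joint distribution should match
# p(x1) p(x2) p(x3 | x1, x2). The mechanism is arbitrary and made-up.

rng = random.Random(1)

def sample():
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    x1 = u1 < 0.3
    x2 = u2 < 0.5
    x3 = u3 < (0.2 + 0.3 * x1 + 0.4 * x2)   # arbitrary mechanism F3
    return x1, x2, x3

n = 200_000
counts = {}
for _ in range(n):
    s = sample()
    counts[s] = counts.get(s, 0) + 1

for x1, x2, x3 in itertools.product([False, True], repeat=3):
    joint = counts.get((x1, x2, x3), 0) / n
    p1 = sum(v for k, v in counts.items() if k[0] == x1) / n
    p2 = sum(v for k, v in counts.items() if k[1] == x2) / n
    n12 = sum(v for k, v in counts.items() if k[0] == x1 and k[1] == x2)
    p3_given_12 = counts.get((x1, x2, x3), 0) / n12
    assert abs(joint - p1 * p2 * p3_given_12) < 0.01  # factorization holds
print("factorization check passed")
```

If the Ui were secretly correlated, the assertion would fail: the sampled joint would no longer equal the model's product of conditionals.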

What if you realize that two variables actually are correlated more than you thought?  What if, to make the diagram correspond to reality, you'd have to hack it to make some Ua and Ub correlated?

Then you draw another arrow from Xa to Xb, or from Xb to Xa; or you make a new node Xc representing the correlated part of Ua and Ub, and draw arrows from Xc to both Xa and Xb.

[Diagrams: Xa → Xb  vs.  Xb → Xa  vs.  Xa ← Xc → Xb]

(Or you might have to draw some extra causal arrows somewhere else; but those three changes are the ones that would solve the problem most directly.)

There was apparently at one point - I'm not sure if it's still going on or not - this big debate about the true meaning of randomization in experiments, and what counts as 'truly random'. Is your randomized experiment invalidated, if you use a merely pseudo-random algorithm instead of a thermal noise generator? Is it okay to use pseudo-random algorithms? Is it okay to use shoddy pseudo-randomness that a professional cryptographer would sneer at? Clearly, using 1-0-1-0-1-0 on a list of patients in alphabetical order isn't random enough... or is it? What if you pair off patients in alphabetical order, and flip a coin to assign one member of each pair to the experimental group and the control? How random is random?

Understanding that causal models factorize uncertainty leads to the realization that "randomizing" an experimental variable means using a source of randomness - a Ux for the assignment - which doesn't correlate with your uncertainty about any other Ui. Our uncertainty about a thermal noise generator seems strongly guaranteed to be uncorrelated with our uncertainty about a subject's economic status, their upbringing, or anything else in the universe that might affect how they react to Drug A...

...unless somebody wrote down the output of the thermal noise generator, and then used it in another experiment on the same group of subjects to test Drug B. It doesn't matter how "intrinsically random" that output was - whether it was the XOR of a thermal noise source, a quantum noise source, a human being's so-called free will, and the world's strongest cryptographic algorithm - once it ends up correlated to any other uncertain background factor, any other Ui, you've invalidated the randomization.  That's the implicit problem in the XKCD cartoon above.

But picking a strong randomness source, and using the output only once, is a pretty solid guarantee this won't happen.
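A sketch of why reuse is the failure mode: however "intrinsically random" the recorded stream was, feeding it into a second experiment on the same subjects makes the two assignments perfectly correlated, so the second "randomization" carries information about the first. Fresh draws restore independence. All names and numbers below are illustrative.

```python
import random

# Reusing a recorded randomness stream across two trials on the same
# subjects makes the Drug A and Drug B assignments perfectly correlated,
# invalidating the second randomization. A fresh draw does not.

rng = random.Random(2)
n_subjects = 20
noise = [rng.random() < 0.5 for _ in range(n_subjects)]  # recorded output

drug_a = list(noise)                    # first trial: fine
drug_b_reused = list(noise)             # second trial reuses the record: bad
drug_b_fresh = [rng.random() < 0.5 for _ in range(n_subjects)]  # fine again

def agreement(xs, ys):
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

print(agreement(drug_a, drug_b_reused))  # 1.0 - assignments fully correlated
print(agreement(drug_a, drug_b_fresh))   # near 0.5 in expectation
```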

Unless, ya know, you start out with a list of subjects sorted by income, and the randomness source randomly happens to put out 111111000000. Whereupon, as soon as you look at the output and are no longer uncertain about it, you might expect correlation and trouble. But that's a different and much thornier issue in Bayesianism vs. frequentism.

If we take frequentist ideas about randomization at face value, then the key requirement for theorems about experimental randomization to be applicable is for your uncertainty about patient assignment not to correlate with any other background facts about the patients. A double-blinded study (where the doctors don't know patient status) ensures that patient status can't correlate with the doctors' beliefs about a patient, which might otherwise lead them to treat patients differently. Even plugging in the fixed string "1010101010" would be sufficiently random if that pattern weren't correlated with anything important; the trouble is that such a simple pattern could very easily correlate with some background effect, and we can believe in this possible correlation even if we're not sure what the exact correlation would be.

(It's worth noting that the Center for Applied Rationality ran the June minicamp experiment using a standard but unusual statistical method of sorting applicants into pairs that seemed of roughly matched prior ability / prior expected outcome, and then flipping a coin to pick one member of each pair to be admitted or not admitted that year.  This procedure means you never randomly improbably get an experimental group that would, once you actually looked at the random numbers, seem much more promising or much worse than the control group in advance - where the frequentist guarantee that you used an experimental procedure where this usually doesn't happen 'in the long run', might be cold comfort if it obviously had happened this time once you looked at the random numbers.  Roughly, this choice reflects a difference between frequentist ideas about procedures that make it hard for scientists to obtain results unless their theories are true, and then not caring about the actual random numbers so long as it's still hard to get fake results on average; versus a Bayesian goal of trying to get the maximum evidence out of the update we'll actually have to perform after looking at the results, including how the random numbers turned out on this particular occasion.  Note that frequentist ethics are still being obeyed - you can't game the expected statistical significance of experimental vs. control results by picking bad pairs, so long as the coinflips themselves are fair!)
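The pair-then-coinflip procedure can be sketched as follows. The applicant names and prior-expected-outcome scores are hypothetical; the point is that the two groups come out balanced on the score by construction, while each individual assignment remains an unriggable fair coinflip.

```python
import random

# Sketch of matched-pair randomization: rank applicants by a (hypothetical)
# prior-expected-outcome score, pair adjacent applicants, and flip a fair
# coin within each pair to decide who is admitted.

rng = random.Random(3)
applicants = [("a1", 72), ("a2", 88), ("a3", 65), ("a4", 91),
              ("a5", 70), ("a6", 85), ("a7", 60), ("a8", 95)]

ranked = sorted(applicants, key=lambda a: a[1])
admitted, control = [], []
for i in range(0, len(ranked), 2):
    first, second = ranked[i], ranked[i + 1]
    if rng.random() < 0.5:              # fair coinflip within the matched pair
        first, second = second, first
    admitted.append(first)
    control.append(second)

def avg(group):
    return sum(score for _, score in group) / len(group)

print(avg(admitted), avg(control))      # close by construction: pairs matched
```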

Okay, let's look at that meditation again:

"You say that a universe is a connected fabric of causes and effects. Well, that's a very Western viewpoint - that it's all about mechanistic, deterministic stuff. I agree that anything else is outside the realm of science, but it can still be real, you know. My cousin is psychic - if you draw a card from his deck of cards, he can tell you the name of your card before he looks at it. There's no mechanism for it - it's not a causal thing that scientists could study - he just does it. Same thing when I commune on a deep level with the entire universe in order to realize that my partner truly loves me. I agree that purely spiritual phenomena are outside the realm of causal processes that can be studied by experiments, but I don't agree that they can't be real."

Well, you know, you can stand there all day, shouting all you like about how something is outside the realm of science, but if a picture of the world has this...

...then we're either going to draw an arrow from the top card to the prediction; an arrow from the prediction to the top card (the prediction makes it happen!); or arrows from a third source to both of them (aliens are picking the top card and using telepathy on your cousin... or something; there's no rule you have to label your nodes).

More generally, for me to expect your beliefs to correlate with reality, I have to either think that reality is the cause of your beliefs, expect your beliefs to alter reality, or believe that some third factor is influencing both of them.

This is the more general argument that "To draw an accurate map of a city, you have to open the blinds and look out the window and draw lines on paper corresponding to what you see; sitting in your living-room with the blinds closed, making stuff up, isn't going to work."

Correlation requires causal interaction; and expecting beliefs to be true means expecting the map to correlate with the territory. To open your eyes and look at your shoelaces is to let those shoelaces have a causal effect on your brain - in general, looking at something, gaining information about it, requires letting it causally affect you. Learning about X means letting your brain's state be causally determined by X's state. The first thing that happens is that your shoelace is untied; the next thing that happens is that the shoelace interacts with your brain, via light and eyes and the visual cortex, in a way that makes your brain believe your shoelace is untied.

p(Shoelace=tied,   Belief="tied")   = 0.931
p(Shoelace=tied,   Belief="untied") = 0.003
p(Shoelace=untied, Belief="untied") = 0.053
p(Shoelace=untied, Belief="tied")   = 0.012
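Treating those (made-up) numbers as a joint distribution over (Shoelace, Belief), the point can be read off directly: conditioning on the shoelace's actual state sharply shifts the probability of the belief, which is exactly what letting the shoelace causally affect your brain buys you.

```python
# The (made-up) joint distribution over (Shoelace, Belief) from the table
# above. Conditioning on the shoelace's state shifts the belief strongly,
# reflecting the causal path from shoelace to brain.
joint = {
    ("tied",   "tied"):   0.931,
    ("tied",   "untied"): 0.003,
    ("untied", "untied"): 0.053,
    ("untied", "tied"):   0.012,
}

def p_belief_given(shoelace, belief):
    p_shoelace = sum(v for (s, _), v in joint.items() if s == shoelace)
    return joint[(shoelace, belief)] / p_shoelace

print(round(p_belief_given("tied", "tied"), 3))      # 0.997
print(round(p_belief_given("untied", "untied"), 3))  # 0.815
```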

This is related in spirit to the idea seen earlier on LW that having knowledge materialize from nowhere directly violates the second law of thermodynamics because mutual information counts as thermodynamic negentropy. But the causal form of the proof is much deeper and more general. It applies even in universes like Conway's Game of Life where there's no equivalent of the second law of thermodynamics. It applies even if we're in the Matrix and the aliens can violate physics at will. Even when entropy can go down, you still can't learn about things without being causally connected to them.

The fundamental question of rationality, "What do you think you know and how do you think you know it?", is on its strictest level a request for a causal model of how you think your brain ended up mirroring reality - the causal process which accounts for this supposed correlation.

You might not think that this would be a useful question to ask - you might expect that when your brain holds an irrational belief, it would automatically hold equally irrational beliefs about the process that produced it.

But "the human brain is not illogically omniscient", we might say. When our brain undergoes motivated cognition or other fallacies, it often ends up strongly believing in X, without the unconscious rationalization process having been sophisticated enough to also invent a causal story explaining how we know X. "How could you possibly know that, even if it was true?" is a more skeptical form of the same question. If you can successfully stop your brain from rationalizing-on-the-spot, there actually is this useful thing you can sometimes catch yourself in, wherein you go, "Oh, wait, even if I'm in a world where AI does get developed on March 4th, 2029, there's no lawful story which could account for me knowing that in advance - there must've been some other pressure on my brain to produce that belief."

Since it illustrates an important general point, I shall now take a moment to remark on the idea that science is merely one magisterium, and there's other magisteria which can't be subjected to standards of mere evidence, because they are special. That seeing a ghost, or knowing something because God spoke to you in your heart, is an exception to the ordinary laws of epistemology.

That exception would be convenient for the speaker, perhaps. But causality is more general than that; it is not excepted by such hypotheses. "I saw a ghost", "I mysteriously sensed a ghost", "God spoke to me in my heart" - there's no difficulty drawing those causal diagrams.

The methods of science - even sophisticated methods like the conditions for randomizing a trial - aren't just about atoms, or quantum fields.

They're about stuff that makes stuff happen, and happens because of other stuff.

In this world there are well-paid professional marketers, including philosophical and theological marketers, who have thousands of hours of practice convincing customers that their beliefs are beyond the reach of science. But those marketers don't know about causal models. They may know about - know how to lie persuasively relative to - the epistemology used by a Traditional Rationalist, but that's crude by the standards of today's rationality-with-math. Highly Advanced Epistemology hasn't diffused far enough for there to be explicit anti-epistemology against it.

And so we shouldn't expect to find anyone with a background story which would justify evading science's skeptical gaze. As a matter of cognitive science, it seems extremely likely that the human brain natively represents something like causal structure - that this native representation is how your own brain knows that "If the radio says there was an earthquake, it's less likely that your burglar alarm going off implies a burglar." People who want to evade the gaze of science haven't read Judea Pearl's book; they don't know enough about formal causality to not automatically reason this way about things they claim are in separate magisteria. They can say words like "It's not mechanistic", but they don't have the mathematical fluency it would take to deliberately design a system outside Judea Pearl's box.

So in all probability, when somebody says, "I communed holistically and in a purely spiritual fashion with the entire universe - that's how I know my partner loves me, not because of any mechanism", their brain is just representing something like this:

Partner loves   Universe knows   I hear universe       %
p               u                h                 0.44
p               u                ¬h                0.023
p               ¬u               h                 0.01
p               ¬u               ¬h                0.025
¬p              u                h                 0.43
¬p              u                ¬h                0.023
¬p              ¬u               h                 0.015
¬p              ¬u               ¬h                0.035
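Reading that table as a joint distribution makes the diagnosis quantitative: under this person's own numbers, the "hearing" experience barely correlates with the partner's love, so observing it moves the probability of love almost nowhere.

```python
# The joint distribution over (Partner loves, Universe knows, I hear
# universe) from the table above. The "hearing" carries almost no evidence
# about the love under these numbers.
joint = {
    ("p", "u", "h"): 0.44,    ("p", "u", "¬h"): 0.023,
    ("p", "¬u", "h"): 0.01,   ("p", "¬u", "¬h"): 0.025,
    ("¬p", "u", "h"): 0.43,   ("¬p", "u", "¬h"): 0.023,
    ("¬p", "¬u", "h"): 0.015, ("¬p", "¬u", "¬h"): 0.035,
}
prior_love = sum(v for (p, _, _), v in joint.items() if p == "p")
p_hear = sum(v for (_, _, h), v in joint.items() if h == "h")
posterior_love = sum(v for (p, _, h), v in joint.items()
                     if p == "p" and h == "h") / p_hear
print(round(prior_love, 3), round(posterior_love, 3))  # 0.498 vs 0.503
```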

True, false, or meaningless, this belief isn't beyond investigation by standard rationality.

Because causality isn't a word for a special, restricted domain that scientists study. 'Causal process' sounds like an impressive formal word that would be used by people in lab coats with doctorates, but that's not what it means.

'Cause and effect' just means "stuff that makes stuff happen and happens because of other stuff". Any time there's a noun, a verb, and a subject, there's causality. If the universe spoke to you in your heart - then the universe would be making stuff happen inside your heart! All the standard theorems would still apply.

Whatever people try to imagine that science supposedly can't analyze, it just ends up as more "stuff that makes stuff happen and happens because of other stuff".

 Mainstream status.

Part of the sequence Highly Advanced Epistemology 101 for Beginners

Next post: "Causal Reference"

Previous post: "Causal Diagrams and Causal Models"

128 comments

p(a,b,c) = p(a)p(b)p(c) isn't a statement of uncorrelatedness but of independence. Using the term "uncorrelated" with that meaning might be defensible but probably merits mention as something not-mainstream.

It's helpful to go a bit further for these corrections. What's the reason not to use "uncorrelated" here? In ordinary English, "uncorrelated" is indeed used for this (and a host of other things, because ordinary English is very vague). The problem is that it means something else in probability theory, namely the much weaker statement E(a-E(a)) E(b-E(b)) = E((a-E(a))(b-E(b))), which is implied by independence (p(a,b) = p(a)p(b)) but does not imply independence. If we want to speak to those who know some probability theory, this clash of meaning is a problem. If we want to educate those who don't know probability theory to understand the literature and be able to talk with those who do know probability theory, this is also a problem. (Note too that uncorrelatedness is only invariant under affine remappings: X and Y chosen as the coordinates of a random point on the unit circle are uncorrelated, yet X^2 and Y^2 are perfectly correlated. Nor does "correlated" directly make any sense for non-numerical variables, though you could probably lift to the simplex and use homogeneous coordinates to get a reasonable meaning.)
I know that Eliezer knows quite a lot of mathematics. His article was clearly written for people who are at least a bit comfortable with mathematics. So it's reasonable to suppose (1) that a substantial fraction of readers will have encountered something like the mathematical notion of "uncorrelated" and might therefore be confused by having the word used to denote something else, and (2) that in notifying Eliezer of this it's OK to be pretty terse about it. For the avoidance of doubt, I'm not disagreeing with anything you said, just explaining why I just made the brief statement I did rather than offering more explanation.
E(a-E(a)) and E(b-E(b)) are both identically zero, so this would be more simply put (and restoring some missing parentheses) as E((a-E(a))(b-E(b))) = 0. Or after shifting the means of both variables to zero, E(ab) = 0.
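The commenters' unit-circle example can be checked numerically: X and Y, the coordinates of a uniformly random point on the unit circle, have (near-)zero sample covariance, i.e. they are uncorrelated, yet they are completely dependent, and X², Y² are perfectly negatively correlated since X² + Y² = 1 (their covariance is -Var(X²) = -1/8).

```python
import math
import random

# Uncorrelated does not imply independent: coordinates of a uniformly
# random point on the unit circle have zero covariance, but X^2 and Y^2
# are perfectly negatively correlated (cov = -1/8).

rng = random.Random(4)
pts = [(math.cos(t), math.sin(t))
       for t in (rng.uniform(0.0, 2.0 * math.pi) for _ in range(200_000))]
xs, ys = zip(*pts)

def cov(us, vs):
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / len(us)

print(round(cov(xs, ys), 3))                                    # ~0.0
print(round(cov([x * x for x in xs], [y * y for y in ys]), 3))  # ~-0.125
```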
Don't bother, he's "write-only."

Edit: There is stuff in the original 'causal diagrams' post from nearly two weeks ago that is factually wrong (not a minor nitpick either), was pointed out as such, and is still uncorrected. "Write-only."

It's worth noting that the Center for Applied Rationality ran the June minicamp experiment using a standard but unusual statistical method of sorting applicants into pairs that seemed of roughly matched prior ability / prior expected outcome, and then flipping a coin to pick one member of each pair to be admitted or not

As an aside, if you're interested in looking up more about this nifty experimental design trick, the magic keyword is "blocking". The idea of randomized block designs dates back to Fisher.

I've found blocking to be really useful for my small-scale experiments, for 2 different reasons:

1. Often, I'm worried about simple randomization leading to an imbalance of control vs experimental; if I'm only getting 20 total datapoints on something, then randomization could easily lead to something like 14 control and 6 experimental datapoints - throwing out a lot of statistical power compared to 10 control and 10 experimental. If I pair days, then I know I will get 10/10, without worrying about breaking blinding.
2. Blocking is the natural way to handle multiple-day effects or trends: if I think lithium operates slowly, I will pair entire weeks or months, rather than pairing days and hoping that enough experimental and control days form runs which reveal any trend instead of washing it out in averaging.
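The first worry can be quantified with a quick binomial calculation: with simple 50/50 randomization of 20 days, how often is the split at least as lopsided as 14 vs 6? Blocking (one coinflip per matched pair of days) makes this probability zero by construction, since every pair sends one day to each arm.

```python
from math import comb

# Probability that simple 50/50 randomization of 20 days produces a split
# at least as lopsided as 14 vs 6, i.e. |k - 10| >= 4 for k ~ Bin(20, 0.5).

n = 20
p_lopsided = sum(comb(n, k) for k in range(n + 1)
                 if abs(k - n // 2) >= 4) / 2 ** n
print(round(p_lopsided, 3))  # 0.115 - roughly one experiment in nine
```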

Mainstream status:

As previously stated, the take on causality's math is meant to be academically standard; this includes the idea of decomposing the X(i) into deterministic F(i) and uncorrelated U(i).

I haven't particularly seen anyone else observe that claiming you know about X without X affecting you, you affecting X, or X and your belief having a common cause, violates the Markov condition on causal graphs.

I haven't actually seen anyone cite the Markov condition as a reply to the old "What constitutes randomization?" debates I've glimpsed, but I would be genuinely surprised if Pearl & co. hadn't pointed it out by now - my understanding is that he's spending most of his time evangelizing causality to experimental statisticians these days. It seems pretty obvious once you have causal models as a background.

The concept of "separate magisteria" is as old as scientific critique of religion, but the actual phrase was coined by Stephen Jay Gould (speaking favorably of the separation, natch). So far as I know, the concept of anti-epistemology is an LW original; likewise the view that causality is more general than anyone trying to separate their magisterium would …

Isn't this essentially implied by the well-known ideas of "natural experiments" and "instrumental variables"? Pearl does deal with these ideas in Causality.

Look at it as an exercise for the actively disbelieving mini-skill. :)

Mini-trick for the mini-skill: Pretend he's talking about a fictional universe where anything explicitly mentioned is arbitrary.

Whatever people try to imagine that science supposedly can't analyze, it just ends up as more "stuff that makes stuff happen and happens because of other stuff".

I think said people would object to this - e.g. "God certainly isn't stuff, God is metaphysical!" This, of course, is no problem for causal diagrams. The math allows you to have arrows from metaphysical stuff to physical stuff, which allows you to see Occam's razor visually. But it's interesting to think about how best to counter this argument when you're trying to convince your opponent and not just yourself.

Upvoted for: :)
Fun gamble: Make a huge causal diagram as part of the discussion, and once people bring up the metaphysical God argument, point at the whole diagram and say "Okay, if God is metaphysical, then he's the rules by which the diagrams operate. There, look at this diagram. You're looking at God." I doubt it'd work, but the thought made me chuckle.
I suspect the best counter would have been to have seen more steps ahead and given them some abstract causal diagram practice.

Clearly, using 1-0-1-0-1-0 on a list of patients in alphabetical order isn't random enough... or is it?

It's not, if only because the people implementing it can guess it: a textbook I read on doing medical trials mentioned that this procedure was done in medical trials, and it led to tampering where doctors would send the patients they liked better or were sicker or whatever to the 'right' trial arm.

So they changed the person's name, or what?

Something like that. There are a lot of ways to tamper with this: participation is voluntary, of course, so if a patient would 'benefit' from being randomized to the 'right' arm, you'd encourage them to do it, while if they weren't, you'd encourage them to drop out (and maybe get the tested treatment themselves!). You'd filter the list in the first place, or use alternative names (My legal first name starts with one letter but I always go by a version of my middle name which starts with a different letter: which version does the doctor write down?). And so on.

One interesting example, from a retrospective:

It took only a few months to accumulate the required experience in the two hospitals (Reese et al. 1952). Allocation to ACTH or no ACTH was decided by drawing marbles from a jar containing an equal number of white and blue marbles: one morning, when a new infant became eligible for enrollment, I noticed that our head nurse shook the jar vigorously, turned her head away, pulled a marble out (just as she had been instructed); but because she didn't like the 'assignment', she put the marble back, shook the jar again, and pulled out the color that agreed with her bias! The importance of Bradford Hill's precaution in Britain's famous streptomycin trial to conceal the order of assignment in sealed envelopes was immediately obvious!


Twice in this article, there are tables of numbers. They're clearly made-up, not measured from experiment, but I don't really understand exactly how made-up they are - are they carefully or casually cooked?

Could people instead use letters (variables), with relations like 'a > b', 'a >> b', 'a/b > c' and so on in the future? Then I could understand better what properties of the table are intentional.

In my experience, using variables instead of numbers when it's not absolutely necessary makes things ridiculously harder to understand for someone not comfortable with abstract math.
We are talking about the mathematics of causality. I would expect people to be familiar with free variables and algebra. I for one would find explicit algebraic expressions much clearer than a bunch of meaningless numbers.
Depends what you mean by "familiar". I'd imagine anyone reading the essay can do algebra, but that they're still likely to be more comfortable when presented with specific numbers. People are weird like that - we can learn general principles from examples more easily than from having the general principles explained to us explicitly. Exceptions abound, obviously.
There's nothing about the tables that was not explained in the previous installment of this series; click the links if you're still confused. I came to this knowing nothing about that type of notation, but the tables told me even more than the bubble diagrams--and here's the secret. Looking at the table tells you next to nothing. It's only when you think about the situations that the probabilities quantify, then they make sense. Although, as an additional step, he could have explained each of the situations in sentence form in a paragraph, but probably felt the table spoke for itself. The second table, for instance, (if I am interpreting correctly) can be paraphrased as: I believe that my partner loves me, and that the universe knows it, and I can get this answer from the universe. I would also know that if my partner didn't love me, because the universe would know it and I would hear that. It's probably one of those two. Of course it could be that I don't hear the universe, or the universe is lying to me, or that the universe doesn't magically pick up our thoughts (how unromantic!), but I really don't believe that to be true, I only admit that it's possible. I am rational, after all.
I agree that if you don't look at the numbers, but at the surrounding text, you get the sense that the numbers could be paraphrased in that way. So does h, labeled "I hear universe" mean "I hear the universe tell me something at all", or "I hear the universe tell me that they love me" or "I hear the universe tell me what it knows, which (tacitly according to the meaning of knows) is accurate"? I thought it meant "I have a sensation as if the universe were telling me that they love me", but the highest probability scenarios are p&u&h, and ¬p&u&h, which would suggest that regardless of their love, I'm likely to experience a sensation as if the universe were telling me that they love me. That seems reasonable from a skeptical viewpoint, but not from a believer's viewpoint.
Congratulations, you've cleared the hidden test of making sure that this isn't all just a password in your head! IMO, which one it was intended to be is irrelevant as long as you understand both cases. Understanding these things enough to be able to untangle them like this sounds like it's really the whole point of the article.
I took h to mean "I accurately receive the information that the universe conveys", which in this case regarding the state of my partner loving me or not, I would still accurately hear the universe, otherwise it would be not-h. Since I am considering possible states, partner-not-loving-me/universe-tells-me/me-hearing-that would be the second most likely possibility, because the other two variables are less in doubt (for the person in the example). If this person were in real life, they probably are frustrated, wondering why on earth it feels like their partner is trying to drive a wedge in the relationship, when obviously they are in love, because the universe can magically read their minds and the crystal auras never lie.

Commenter HistoricalLing does have a point. Katsuki Sekida explains:

"Now, 'Mu' means 'nothing' and is the first koan in Zen. You might suppose that, as you sit saying 'Mu', you are investigating the meaning of nothingness. But that is quite incorrect. It is true that your teacher, who has instructed you to work on Mu, may repeatedly say to you, 'What is Mu?' 'Show me Mu,' and so on, but he is not asking you to indulge in conceptual speculation. He wants you to experience Mu. And in order to do this, technically speaking, you have to take Mu simply as ... (read more)

[This comment is no longer endorsed by its author]
Eliezer Yudkowsky:
Suggest a better word? Keep in mind that words which are not better will be rejected (people often seem to forget this while making alternate suggestions).

I think the division into problems and exercises usually seen in mathematics texts would be useful: A task is considered an exercise if it's routine application of previous material, it's a problem if it requires some kind of insight or originality. So far most of the Koans have seemed more like problems than like exercises, but depending on content both may be useful. I might be slightly biased towards this as I greatly enjoy mathematics texts and am used to that style.

"Problem" suggests something different in philosophy than in math. A philosophy "problem" is a seeming dilemma, e.g. Gettier, Newcomb's, or Trolley. So I'd suggest "exercise" here.

"Exercise" dominates "kōan" in that both have the sense of something to stop and think about and try to solve, but ① "exercise" avoids the misconstrual of Zen practice (the purpose of a Zen kōan is not to come up with a solution, nor to set up for an explanation), ② the Orientalism (the dubiosity of saying something in Japanese to make it sound 20% cooler), and ③ the distraction of having to explain what a kōan is to those who don't know the word.

EDIT: The claim that a purpose of a Zen kōan is not to come up with a solution appears to be a matter of disagreement, so discount ①. I think ② and ③ stand, though.

The account in the Wikipedia article says differently: According to the history of the word given there, it originally meant accounts of legal decisions (and literally, a magistrate's bench). In Chinese Buddhism it came to refer to snippets of dialogue between masters. From there it mutated to the contemplation of mysterious sayings, and eventually to what looks very like an exercise in guessing the teacher's password, with authorised answers that were specifically taught and had to be given to acquire promotion in the Japanese monastery system. (I have this book, which is subtitled "281 Zen Koans with Answers".) The modern meaning of "koan" dealt with in the section "Koan-practice" describes what looks very like Eliezer's intention in using the word here: a problem that cannot be answered by merely applying known rules to new examples, but requires new thoughts and ideas: a problem that begins by seeming impossible: a problem that cannot be solved without in the process learning something that one has not been taught. Perhaps there is, somewhere, a better word, but I think "koan" will be hard to beat.
Eliezer Yudkowsky
So... the main thing I want to convey over and above "exercise" is that rather than there being a straightforward task-to-solve, you're supposed to ponder the statement and say, "What do I think of this?" A word other than "koan" which conveys this intent-to-ponder would indeed be appreciated.

What about "riddle" or "puzzle"?

The only trouble I see is that "koan" makes it totally okay to think about it for a while without finding the answer, while "puzzle" might cause people to propose solutions.
Given that most people seem likely to look at the koan and think "yeah, I could solve that if I thought about it for a while" and then move on without actually thinking about it, anything that actually gets people to think about it seems like a good thing.
The only trouble is if people then have to unthink things, which humans are notoriously bad at :P
People have already been proposing solutions to the "koans", and I don't understand why that's a bad thing.
The goal is to apply those algorithms we call "rationality" towards solving the koan, one of which involves withholding even just mentally formulating solutions as much as possible, and instead just thinking about the elements and properties of the problem properly without subjecting oneself to hack heuristics. The word puzzle is, for most people, loaded with a trained impulse to shoot the first solution-sounding thing that pops to mind so that you can see whether you get a hedon / tribal status coin for a good answer or not.
Alright. I see where you're coming from, though I doubt that "puzzle" and "koan" have as many deep connotations as you claim. Maybe the right thing to do is to actually write something to the effect of "Here is how you should be approaching these puzzles/koans"?
"Puzzle" is good because it suggests that there is a solution, whereas some "problems" don't have solutions, because they are simply confused.
However, the trained behavior of most people when facing a puzzle is to look at it for a few seconds and then throw out the first good-sounding solution they can think of.
Which isn't necessarily a bad thing. Either they'll get a right answer despite throwing the first possible solution at it, or they'll widely miss the mark, in which case they might actually realize that they've learned something by the time that the right answer is demonstrated.
You have a point. My (subconscious) priors on that end are skewed towards "Never, ever throw out solutions before you've laid things out properly" because of lots and lots of little personal experiences with complete failure modes due to stopping with the first solution I found.

I don't think "noodle" is taken, yet.

Eliezer Yudkowsky
Hm. I like the direction this points. Any similar suggestions?


Eliezer Yudkowsky
Your suggestion has been... accepted!
You could use "udon" instead of "noodle" to make it sound foreign and mysterious ;)
The word "pabulum" (from Latin for "fodder") was once used in English to mean "food for thought". However, it (or "pablum") is now more likely to denote insipid fare. We could reclaim the original meaning—in which case these statements-to-be-pondered are "pabula".
Consider adding "straightforward" exercises for the lesser mortals, and marking the harder ones (koans?) with stars, as standard textbooks do.
"Pondering exercise" maybe? Interesting that "pondering" is a cognitive skill that needs to be exercised. The term derives from a Latin word for "weight". Perhaps this can be thought of as something analogous to barbells or dumbbells for epistemological strength-training.
I like the way you think. Care to elaborate?
Pondering means thinking about something in a way that makes it "heavy" or difficult for the mind to process (just as heavy objects are difficult to lift). Like the metaphorical "burden of proof," it relates the mental difficulty of processing ideas to the physical difficulty of lifting objects. The way this happens involves increasing the complexity of your mental instantiation of an idea, thereby bringing more cognitive algorithms to bear on it. The strength-training metaphor only works if it can be contrasted with endurance-training; otherwise it would just be a generic kind of training. Strength training involves shorter bursts of focused effort followed by a recovery period. These koans are short and intended for 5-15 minutes of focused thought, so they are probably more on that end of the spectrum than lengthier articles that describe complex concepts. Epistemological endurance training (assuming there is such a thing) would be where you use longer periods of time thinking about a problem that requires a fair degree of mental effort but is not overwhelming. That would analogize to running, biking, and so forth, where rather than doing the hardest thing you can do, you are doing something rather hard for a longer time.
Oops, I miscommunicated. I think the surface analogy isn't the most interesting part of this. I was more interested in what ideas you had for training epistemological ability. The burst vs. endurance thing could be interesting if it could be detailed on its own terms (i.e., inside view instead of analogizing). I've been thinking a lot about rationality training recently, so anything that looks like a possible exercise really catches my attention.
So it must have been "pondering as a rationality skill" which got your attention. Sorry for misinterpreting. :) For me it's not hard to ponder. I do that naturally. But I don't always ponder exactly what I'm told to ponder, even when I have every reason to think the person who told me to ponder something knows what they're talking about and this is something that if I ponder it I will benefit from the resulting enlightenment. It's like there is something in the nature of pondering that is perverse and rebellious (at least for the way my mind works, some of the time). Perhaps a good exercise would be to deliberately ponder specific things that you aren't (yet) naturally curious about. Maybe set a timer and commit to only focus on that particular topic until the timer goes off. I wonder what an optimal time length would be? Also, what kinds of topics could/should be used for the exercise?
Whether it's better or not for your purposes is of course your call, but as I said to chaosmosis above, I resolve this tension in my own mind by understanding "koan" as you use it to mean "exercise." Then again, I also replace all of your Japanese phrases in my head with their corresponding English. I suspect this just reflects my not valuing a particular kind of myth-building very much in this context, so I just experience it as a mildly annoying distraction. If you find it valuable, by all means continue with it.
I do the same. I could find no deeper meaning in EY's use of "koan". Maybe I'm missing something. Same here, except I have to look up this annoying pseudo-Japanese in-group slang almost every time. Is using it intended as some kind of status signaling?
I don't think repurposing the word 'koan' is that terrible. We are not going to do Zen koans in this context, and I would not be surprised to find that many here are more familiar with things such as Ruby koans. Also, there is some disagreement about the meaning and use of koans - Zen (and Chan, Seon) buddhism has many flavors. Notably, historically koans (and the Chinese sayings they were based on) did not necessarily have the character you attribute to them above; they were originally just teachings passed down in the form of sayings.
The origins of the word aren't very relevant to its current meaning; almost no one on this site would have known those origins before now and so those origins don't have much influence on the way we think about the word now. The standard understanding of koans that dominates pretty much everywhere is in line with what Doriana quotes. Using the word koan is inaccurate. I think Yudkowsky is either trying to do it to associate feelings of mystic power with rationality, or to attack feelings of mystic power by setting up expectations and then destroying those; I don't have any idea which. But it somewhat annoys me. It's not a huge deal, but it's annoying. I'm all for repurposing words, but only if there's a decent justification to do so and I don't see one here.
The first of those two hypotheses but yes, it's annoying and jarring. I had kind of hoped Eliezer got the mystic zen martial arts nonsense out of his system years ago and could start talking plain sense now.
I like the mystic Zen martial arts nonsense. Looks like it's the time for a poll. Eliezer's mystic Zen martial arts nonsense is... [pollid:182]
I voted "Don't care", whereas in reality it's more that I like the things like the cult koans and Tsuyoku Naritai, but find the current use of "Koan" so-so (I like the questions, the term "koan" is a bit jarring, but I can get used to it)
I find it super obnoxious, in exactly the same way I felt when my martial arts teachers talked about using my dantian to focus my chi instead of breathing with my diaphragm or whatever is actually useful.
In general the "mystic Zen martial arts nonsense" is a nice antidote to the Straw Vulcan stereotype. That's no excuse for misusing a word in this specific instance, though.
The problem with regular theory exposition is that we don't have a good theoretical framework for discussing how to put theory to practice, so the difficult to express parts about applying the theory just get omitted. I like the martial arts nonsense so far as it connotes an intention that you are supposed to actually put the subject matter to use and win with it, in addition to just appreciating the theory. Since we don't know how to express general instructions for putting theory to practice very well in plain speech, some evocative mysticism may be the best we can do.
I don't always dislike it. "I must become stronger" benefited from the approach. I dislike this specific instance because it's jarring and doesn't fit with the context and it's a misuse of the word "koan".
If you'll allow me to take this a bit out of context, please think of typical Zen usage as "origins of the word" and usage in this sequence of posts as "its current meaning." The difference is obvious, of course - you know what the word means, and anything else is wrong. Which is totally fine. I just wanted to point out that if you try to make your conclusions universal or absolute here, you will in fact create more relativism - the solution is to claim the non-universal knowledge of how words should be used if you're the audience.
I disagree. I would predict that most people have no idea what "koan" means, those that have seriously studied Buddhism are aware of the controversy, and a significant mass of people (especially represented in this demographic) are more familiar with the use of "koan" in programming, as with Ruby koans. The concern seems to be that those who haven't actually studied varieties of Buddhism but are somehow aware of the word "koan" might be confused - but the word is clearly defined before its first use in this sequence:

When I google "koan", the first result is Wikipedia, which says a koan is "a story, dialogue, question, or statement, which is used in Zen practice to provoke the 'great doubt' and test the student's progress in Zen practice". Very Zen, that supports my side. The second result is Merriam-Webster's dictionary, which says a koan is "a paradox to be meditated upon that is used to train Zen Buddhist monks to abandon ultimate dependence on reason". My side. The third result is for a page titled "101 Zen Koans", which again supports my belief.

Eliezer has a history of associating mysticism with rationality, as well.

My personal concern is that using words wrong is annoying because I don't like people mucking up my conceptual spaces. I can't disassociate koans from mysticism and riddles, which makes it awkward and aesthetically unpleasing for me to approach problems of rationality from a "koan".

That said, it's probably too late to change the format of the problems in this current sequence. But I'd like it to never happen again after this gets done.

I suspect it will continue to happen. Invoking the cultural trappings of a certain kind of mysticism while discussing traditionally "rational" topics is, as you note, a popular practice... and not only of Eliezer's. I recommend treating the word "koan" as used here as a fancy way of saying "exercise".
And then we realize that the use of the word 'koan' was not entirely serious, and get on with the sequence. Also, note the side-effect of that karma penalty - responding to things without organizing the post appropriately. Whee. (note to self: check when I loaded the page before commenting)

You seem to be exaggerating the generality of the causal Markov condition (CMC) when you say it is deeper and more general than the second law of thermodynamics. In a big world, failures of the CMC abound. Let's say the correlation between the psychic cousin's predictions and the top card of the deck is explained by the person performing the test being a stooge, who is giving some non-verbal indication to the purported psychic about the top card. So here we have a causal explanation of the correlation, as the CMC would lead us to expect. But since we are i...

I have doubts about how meaningful it is to talk of correlating things that are outside each other's light cones. Besides that, suppose there really are an astronomical number of Boltzmann Brains that you could say are non-causally correlated with the top card of a particular deck of cards. Calling this a failure of the Causal Markov Condition is begging the question because the only thing identifying this set is selection based on the correlation itself. The set you should consider, of all Boltzmann Brains that you could test for correspondence with the top card, will not be correlated with it at all. Follows from it causally, like? :)
I don't see why you would have these doubts. Whether or not two variables are correlated is a purely mathematical condition. Why do you think it matters where in space-time the physical properties those variables describe are instantiated? Wait, why is the relevant reference class the class of all and only Boltzmann brains? It seems more natural to pick a reference class that includes all brains (or brain-states). But in that case, the probabilities of the Boltzmann brain being in the states that it is in will be exactly the same as the probabilities of the psychic cousin being in the states that he is in (since the states are the same by hypothesis), so if the psychic's brain states are correlated with the top card the BB's will be as well. Sure, if you want. I'm not denying here that causality is prior to the second law. I'm denying that the causal Markov condition is prior to the second law.
OK. wrt the light cones, I was posting without my brain switched on. Obviously two events can be outside each other's light cones and yet a correlation between them can still be observed where their light cones overlap in the future. I was thinking fairly unclearly about whether you could be in an epistemic state to consider correlation between things outside your own light cone, but this is kind of irrelevant, so please disregard. Just because the states are the same doesn't mean the probability of being in that state is the same. It's only meaningful to discuss the probability of an outcome in terms of a probability distribution over possible outcomes. If you pick a set of conditions such as "Boltzmann brains in the same state as that of the psychic cousin" you are creating the hypothetical correlation yourself by the way you define it. To my mind, that's not a thought experiment that can tell you anything.
In my example, I specified that the BB is in a reference class with all other brains, including the psychic cousin's. Given that they are both in the reference class, the fact that the BB and the cousin share the same cognitive history implies that the probabilities of their cognitive histories relative to this reference class are the same. The reference class is what fixes the probability distribution over possible outcomes if you're determining probabilities by relative frequencies, and if they are in the same reference class, they will have the same probability distribution. I suspect Eliezer was thinking of a different probability distribution over brain states when he said the psychic's brain state is correlated with the deck of cards. The probabilities he is referring to are something like the relative frequencies of brain states (or brain state types) in a single observer's cognitive history (ETA: Or perhaps more accurately for a Bayesian, the probabilities you get when you conditionalize some reasonable prior on the sequence of instantiated brain states). Even using this distribution, the BB's brain state will be correlated with the top card.
Even if the BB and the psychic are in causally disconnected parts of your model, them having the same probability of being correlated with the card doesn't imply that the Causal Markov Condition is broken. In order to show that, you would need to specify all of the parent nodes to the BB in your model, calculate the probability of it being correlated with the card, and then see whether having knowledge of the psychic would change your probability for the BB. Since all physics currently is local in nature, I can't think of anything that would imply this is the case if the psychic is outside of the past light cone of the BB. Larger boundary conditions on the universe as a whole that may or may not make them correlate have no effect on whether the CMC holds.
I'm having trouble parsing this comment. You seem to be granting that the BB's state is correlated with the top card (I'm assuming this is what you mean by "having the same probability"), that there is no direct causal link between the BB and the psychic, and that there are no common causes, but saying that this still doesn't necessarily violate the CMC. Am I interpreting you right? If I'm not, could you tell me which one of those premises does not hold in my example? If I am interpreting you correctly, then you are wrong. The CMC entails that if X and Y are correlated, and X is not a cause of Y and Y is not a cause of X, then there are common causes of X and Y such that the variables are independent conditional on those common causes.
The CMC is not strictly violated in physics as far as we know. If you specify the state of the universe for the entire past light cone of some event, then you uniquely specify the event. The example that you gave of the rock shooting out of the pond indeed does not violate the laws of physics- you simply shoved the causality under the rug by claiming that the edge of the pond fluctuated "spontaneously". This is not true. The edge of the pond fluctuating was completely specified by the past light cone of that event. This is the sense in which the CMC runs deeper than the 2nd law of thermodynamics- because the 2nd "law" is probabilistic, you can find counterexamples to it in an infinite universe. If you actually found a counterexample to the CMC, it would make physics essentially impossible.
I meant "spontaneous" in the ordinary thermodynamic sense of spontaneity (like when we say systems spontaneously equilibrate, or that spontaneous fluctuations occur in thermodynamic systems), so no violation of microphysical law was intended. Spontaneous here just means there is no discernible macroscopic cause of the event. Now it is true that everything that happened in the scenario I described was microscopically determined by physical law, but this is not enough to satisfy the CMC. What we need is some common cause account of the macroscopic correlation that leads to a coherent inward-directed wave, and simply specifying that the process is law-governed does not provide such an account. I guess you could just say that the common cause is the initial conditions of the universe, or something like that. If that kind of move is allowed, then the CMC is trivially satisfied for every correlation. But when people usually appeal to the CMC they intend something stronger than this. They're usually talking about a spatially localized cause, not an entire spatial hypersurface. If you allow entire hypersurfaces as nodes in your graph, you run into trouble. In a deterministic world, any correlation between two properties isn't just screened off by the contents of past hypersurfaces, it's also screened off by the contents of future hypersurfaces. But a future hypersurface can't be a common cause of the correlated properties, so we have a correlation screened off by a node that doesn't d-separate the correlated variables. This doesn't violate the CMC per se, but it does violate the Faithfulness Condition, which says that the only conditional independencies in nature are the ones described by the CMC. If the Faithfulness Condition fails, then the CMC becomes pretty useless as a tool for discerning causation from correlation. The lessons of Eliezer's posts would no longer apply. So to rule out radical failure of the Faithfulness Condition in a deterministic setting, we have to
Indexically, though, you wouldn't expect to be talking to a mind that just happened to issue something it called predictions, which just happened to be correlated with some unobserved cards, would you? I think the CMC doesn't say that a mind can never be right without being causally entangled with the system it's trying to be right about; just that if it is right, it's down to pure chance.
No, the CMC says that if you conditionalize on all of the direct causes of some variable A in some set of variables, then A will be probabilistically independent of all other variables in that set except its effects. This rules out chance correlation. If there were some other variable in the set that just happened to be correlated with A without any causal explanation, then conditionalizing on A's direct causes would not in general eliminate this correlation.
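The screening-off behavior described here is easy to check numerically. Below is a minimal sketch (the three-variable model and all its probabilities are made up for illustration): A and B share a common cause C, so they are correlated marginally, but conditioning on A's sole direct cause C makes the correlation vanish, just as the CMC requires.

```python
import random

random.seed(0)

# Toy model (numbers made up): C is the sole direct cause of both A and B.
# The CMC says conditioning on A's direct causes (here just C) should render
# A independent of every other variable except A's own effects.
def sample():
    c = random.random() < 0.5
    a = random.random() < (0.9 if c else 0.1)  # A depends only on C
    b = random.random() < (0.9 if c else 0.1)  # B depends only on C
    return c, a, b

draws = [sample() for _ in range(200_000)]

def p(event, given=lambda s: True):
    pool = [s for s in draws if given(s)]
    return sum(1 for s in pool if event(s)) / len(pool)

# Marginally, A and B are correlated via their common cause...
p_a = p(lambda s: s[1])                              # ~0.5
p_a_given_b = p(lambda s: s[1], lambda s: s[2])      # ~0.82
# ...but conditional on the direct cause C, the correlation disappears.
p_a_c = p(lambda s: s[1], lambda s: s[0])                # ~0.9
p_a_c_b = p(lambda s: s[1], lambda s: s[0] and s[2])     # ~0.9
```

A variable that "just happened" to track A with no causal path to C would not be screened off this way, which is the point being made above.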
If coincidences were a violation of the CMC, it wouldn't be a truth at all, would it?
Well, one could still say it was true in certain environments, or true like the Ideal Gas Law is true.

I am really enjoying these causality posts. Thank you for them and for the skillful writing that makes them so readable.

Um, let's see if I get this (thinking to myself but posting here if anyone happens to find this useful - or even intelligible)...

claiming you know about X without X affecting you, you affecting X, or X and your belief having a common cause, violates the Markov condition on causal graphs

The causal Markov condition is that a phenomenon is independent of its noneffects, given its direct causes. It is equivalent to the ordinary Markov condition for Bayesian nets (any node in a network is conditionally independent of its nondescendants, given its parents) w...

More generally, for me to expect your beliefs to correlate with reality, I have to either think that reality is the cause of your beliefs, expect your beliefs to alter reality, or believe that some third factor is influencing both of them.

I can construct examples where for this to be true requires us to treat mathematical truths as causes. Of course, this causes problems for the Bayesian definition of "cause".

Eliezer Yudkowsky
Yes. An argument similar to this should still be in the other-edited version of my unfinished TDT paper, involving a calculator on Venus and a calculator on Mars, the point being that if you're not logically omniscient then you need to factor out logical uncertainty for the Markov property to hold over your causal graphs, because physically speaking, all common causes should've been screened off by observing the calculators' initial physical states on Earth. Of course, it doesn't follow that we have to factor out logical uncertainty as a causal node that works like every other causal node, but we've got to factor it out somehow.
My point is more general than this. Namely, that a calculator on Earth and a calculator made by aliens in the Andromeda galaxy would correspond despite humans and the Andromedeans never having had any contact.
Is there some reason not to treat logical stuff as normal causal nodes? Does that cause us actual trouble, or is it just a bit confusing sometimes?
Eliezer Yudkowsky
In causal models, we can have A -> B, E -> A, E -> ~B. Logical uncertainty does not seem offhand to have the same structure as causal uncertainty.
You seem to be confusing the causal arrow with the logical arrow. As endoself points out here, proofs logically imply their theorems, but a theorem causes its proof.
Can you provide an example? I would claim that for any model in which you have a mathematical truth as a node in a causal graph, you can replace that node by whatever series of physical events caused you to believe that mathematical truth.
I add 387+875 and get 1262; from this I can conclude that anyone else doing the same computation will get the same answer, despite my never having interacted with them.
You can't conclude that unless you are aware of the contingent fact that they are capable of getting the answer right.
"The same computation" doesn't cover that?
Why would you want a mathematical truth on a causal graph? Are the transition probabilities ever going to be less than 1.0?
The transition probabilities from the mathematical truth on something non-mathematical will certainly be less than 1.0.
And the transition probabilities to a truth will be 1.0. So why write it in? It would be like sprinkling a circuit diagram with zero-ohm resistors.
Because otherwise the statement I quoted in the great-great-grandparent becomes false.
Inasmuch as you have stipulated that "performing the same calculation" means "performing the same calculation correctly", rather than something like "launching the same algorithm but possibly crashing", your statement is tautologous. In fact, it is a special case of the general statement that anyone successfully performing a calculation will get the same result as everyone else. But why would you want to use a causal diagram to represent a tautology? The two have different properties. Causal diagrams have <1.0 transition probabilities, which tautologies don't. Tautologies have conceptually intelligible relationships between their parts, which causal diagrams don't.
Observe that your two objections cancel each other out. If someone performs the same calculation, there is a significant (but <1.0) chance that it will be done correctly.
What has that to do with mathematical truth? You might as well say that if someone follows the same recipe there is a significant chance that the same dish will be produced. Inasmuch as you are talking about something that can haphazardly fail, you are not talking about mathematical truth.
I can predict what someone else will conclude, without any causal relationship, in the conventional sense, between us.
Your prediction is a prediction of what someone else will conclude, given a set of initial conditions (the mathematical problem) and a set of rules to apply to these conditions. The conclusion that you arrive at is a causal descendant of the problem and the rules of mathematics; the conclusion that the other person arrives at is a causal descendant of the same initial problem and the same rules. That's the causal link.
That's my point. Specifically, that one should have nodes in one's causal diagram for mathematical truths, what you called "rules of mathematics".
Surely the node should be "person X was taught basic mathematics", and not mathematics itself?
The point of having the node is to have a common cause of person X's beliefs about mathematics and person Y's beliefs about mathematics that explains why these two beliefs are correlated even if both discovered said mathematics independently.
What has that to do with any causal powers of mathematical truth?
If you want your causal graph to have the property I quoted here, you need to add nodes for mathematical truths.
Two people can arrive at the same solution to a crossword, but that does not mean there is a Cruciverbial Truth that has causal powers.
Yes it does. In this case said truth even has a physical manifestation, i.e., as the crossword-writer's solution as it exists in some combination of his head and his notes which is causal to the form of the crossword the solver sees.
It only has a physical manifestation. Cruciverbial Truth only summarises what could have been arrived at by a massively fine-grained examination of the crossword-solver's neurology. It doesn't have causal powers of its own. It's redundant in relation to physics.
Mathematical truths do behave like causes. Remember, Bayesian probabilities represent subjective uncertainty. My uncertainty about the Riemann hypothesis is correlated with my uncertainty about other mathematical facts in the same way that my uncertainty about some physical facts is correlated with my uncertainty about others, so I can represent them both as Bayesian networks (really, one big Bayesian network, as my uncertainty about math is also correlated with my uncertainty about the world).

To address your question about randomizing the control groups and experimental groups: you don't use randomness or noise to divide those groups. You divide the population for study into the number of groups you need, and make that division such that those groups are as close to identical as possible, using all of the data you have on all of them.

Thermal noise and pseudo-random numbers can be used to break ties, but only because if there were any known distinction between the two outcomes, the classification would be deterministic.
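A minimal sketch of the deterministic assignment being described, with made-up subjects and hypothetical covariate fields: sort on everything known, then alternate down the sorted list so that adjacent (most similar) subjects land in different groups, with a deterministic tie-breaker.

```python
# Hypothetical subjects and covariates, purely for illustration.
subjects = [
    {"id": 1, "age": 34, "weight": 70},
    {"id": 2, "age": 35, "weight": 72},
    {"id": 3, "age": 51, "weight": 90},
    {"id": 4, "age": 52, "weight": 88},
    {"id": 5, "age": 29, "weight": 60},
    {"id": 6, "age": 30, "weight": 61},
]

# Sort on everything we know; adjacent subjects in this order are the most
# similar, so alternating down the list splits each similar pair across
# groups. The "id" key serves as the deterministic tie-breaker.
ordered = sorted(subjects, key=lambda s: (s["age"], s["weight"], s["id"]))
control = ordered[0::2]
treatment = ordered[1::2]

def mean(group, key):
    return sum(s[key] for s in group) / len(group)
```

With these made-up numbers the two groups end up nearly balanced on both covariates, which is the stated goal of the procedure.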

"universe is a connected fabric of causes and effects."

I do not think that the universe as a whole is one fabric of causes and effects. There are isolating layers of randomness and chaos upon which there are new layers of emergence. This is why we can model at all without having one unified model.

"Every causally separated group of events would essentially be its own reality."

Places outside our solar system are their own realities in that sense. We have no effect there. Only maybe someone is there to amplify our radio signals.

Having spent a regrettably large amount of time on forums where the 'magisteria' type questions were had, I think that you're representing the 'outside of science' position slightly unfairly. Obviously, it often tries to have its cake and eat it. But you're substituting 'standard rationality', or perhaps 'questions of cause and effect' for 'science'. Some magisteria-types would say that there are direct causal effects from God or ghosts, but that these do not manifest with the regularity of things that you're likely to be able to find through scientific ex...

He discusses that distinction here.

Any time there's a noun, a verb, and a subject[sic], there's causality.

Counterexamples: "I know this." "Rational people with the same information cannot reasonably disagree about their conclusions." "General and Special Relativity both require that observers in different reference frames measure the length of an artifact differently."

I think what you might have meant is "Any time that a concrete subject takes an action with a direct object, there's causality"; there's probably a more general form.

I know that the top ca...