The usefulness of correlations

I sometimes wonder just how useful probability and statistics are. There is the theoretical argument that Bayesian probability is the fundamental method of correct reasoning, and that logical reasoning is just the limit as p=0 or 1 (although that never seems to be applied at the meta-level: what is the probability that Bayes' Theorem is true?), but today I want to consider the practice.

Casinos, lotteries, and quantum mechanics: no problem. The information required for deterministic measurement is simply not available, by adversarial design in the first two cases, and by we know not what in the third. Insurance: by definition, this only works when it's impossible to predict the catastrophes insured against. No-one will offer insurance against a risk that will happen, and no-one will buy it for a risk that won't. Randomised controlled trials are the gold standard of medical testing; but over on OB Robin Hanson points out from time to time that the marginal dollar of medical spending has little effectiveness. And we don't actually know how a lot of treatments work. Quality control: test a random sample from your production run and judge the whole batch from the results. Fine -- it may be too expensive to test every widget, or impossible if the test is destructive. But wherever someone is doing statistical quality control of how accurately you're filling jam jars with the weight of jam it says on the label, someone else will be thinking about how to weigh every single one, and how to make the filling process more accurate. (And someone else will be trying to get the labelling regulations amended to let you sell the occasional 15-ounce pound of jam.)

But when you can make real measurements, that's the way to go. Here is a technical illustration.

Prof. Sagredo has assigned a problem to his two students Simplicio and Salviati: "X is difficult to measure accurately. Predict it in some other way."

Simplicio collects some experimental data consisting of a great many pairs (X,Y) and with high confidence finds a correlation of 0.6 between X and Y. So given the value y of Y, his best prediction for the value of X is 0.6y. [Edit: that formula is mistaken. The regression line for Y against X is Y = bcX/a, assuming the means have been normalised to zero, where a and b are the standard deviations of X and Y respectively. For the Y=X+D1 model below, bc/a is equal to 1.]
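The corrected regression slopes are easy to check numerically. A minimal sketch (not from the original post) simulating the Y = X + D1 model used below, with sd(D1) chosen so that the correlation comes out at 0.6:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Model from later in the post: Y = X + D1, all means zero.
x = rng.normal(0.0, 1.0, n)              # sd(X) = a = 1
y = x + rng.normal(0.0, 4.0 / 3.0, n)    # sd(D1) = 1.333 gives corr(X, Y) = 0.6

c = np.corrcoef(x, y)[0, 1]
slope_y_on_x = np.cov(x, y)[0, 1] / np.var(x)  # regression of Y on X: bc/a
slope_x_on_y = np.cov(x, y)[0, 1] / np.var(y)  # regression of X on Y: ac/b

print(round(c, 2), round(slope_y_on_x, 2), round(slope_x_on_y, 2))
```

As the edit note says, for this model the slope of Y against X is bc/a = 1; the best prediction of X from y is the other regression, with slope ac/b = 0.36, not 0.6y.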

Salviati instead tries to measure X, and finds a variable Z which is experimentally found to have a good chance of lying close to X. Let us suppose that the standard deviation of Z-X is 10% that of X.

How do these two approaches compare?

A correlation of 0.6 is generally considered pretty high in psychology and social science, especially if it's established with p=0.001 to be above, say, 0.5. So Simplicio is quite pleased with himself.

A measurement whose range of error is 10% of the range of the thing measured is about as bad as it could be and still be called a measurement. (One might argue that any sort of entanglement whatever is a measurement, but one would be wrong.) It's a rubber tape measure. By that standard, Salviati is doing rather badly.

In effect, Simplicio is trying to predict someone's weight from their height, while Salviati is putting them on a (rather poor) weighing machine (and both, presumably, are putting their subjects on a very expensive and accurate weighing machine to obtain their true weights).

So we are comparing a good correlation with a bad measurement. How do they stack up? Let us suppose that the underlying reality is that Y = X + D1 and Z = X + D2, where X, D1, and D2 are normally distributed and uncorrelated (and causally unrelated, which is a stronger condition). I'm choosing the normal distribution because it's easy to calculate exact numbers, but I don't believe the conclusions would be substantially different for other distributions.

For convenience, assume the variables are normalised to all have mean zero, and let X, D1, and D2 have standard deviations 1, d1, and d2 respectively.

Z-X is D2, so d2 = 0.1. The correlation between Z and X is c(X,Z) = cov(X,Z)/(sd(X)sd(Z)) = 1/sqrt(1+d2^2) = 0.995.

The correlation between X and Y is c(X,Y) = 1/sqrt(1+d1^2) = 0.6, so d1 = 1.333.
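These two numbers follow directly from the formula for the correlation of X with X plus independent noise. A quick check (my code, not from the post):

```python
import math

# Correlation of X with X + D when sd(X) = 1, sd(D) = d, and D is independent of X.
def corr_with_noise(d):
    return 1.0 / math.sqrt(1.0 + d * d)

# Salviati: d2 = 0.1 gives c(X, Z)
print(round(corr_with_noise(0.1), 3))

# Simplicio: invert the formula to find d1 from c(X, Y) = 0.6
c = 0.6
d1 = math.sqrt(1.0 / (c * c) - 1.0)
print(round(d1, 3))
```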

We immediately see something suspicious here. Even a terrible measurement yields a sky-high correlation. Or put the other way round, if you're bothering to measure correlations, your data are rubbish. Even this "good" correlation gives a signal-to-noise ratio of less than 1. But let us proceed to calculate the mutual informations. How much do Y and Z tell you about X, separately or together?

For the bivariate normal distribution, the mutual information between variables A and B with correlation c is lg(I), where lg is the binary logarithm and I = sd(A)/sd(A|B). (The denominator here -- the standard deviation of A conditional on the value of B -- happens to be independent of the particular value of B for this distribution.) This works out to 1/sqrt(1-c^2). So the mutual information is -lg(sqrt(1-c^2)).

            corr.    mut. inf.
Simplicio   0.6      0.3219
Salviati    0.995    3.3291
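The table values can be reproduced from the mutual-information formula just given. A sketch (mine, not the post's):

```python
import math

def mutual_info_bits(c):
    # Mutual information of a bivariate normal with correlation c: -lg sqrt(1 - c^2)
    return -0.5 * math.log2(1.0 - c * c)

print(round(mutual_info_bits(0.6), 4))                     # Simplicio
print(round(mutual_info_bits(1.0 / math.sqrt(1.01)), 4))   # Salviati, c = 0.995...
```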

What can you do with one third of a bit? If Simplicio tries to predict just the sign of X from the sign of Y, he will be right only 70% of the time (i.e. arccos(-c(X,Y))/π). Salviati will be right 96.8% of the time. Salviati's estimate will even be in the right decile 89% of the time, while on that task Simplicio can hardly do better than chance. So even a good correlation is useless as a measurement.

Simplicio and Salviati show their results to Prof. Sagredo. Simplicio can't figure out how Salviati did so much better without taking measurements on thousands of samples. Salviati seemed to just think about the problem and come up with a contraption out of nowhere that did the job, without doing a single statistical test. "But at least," says Simplicio, "you can't throw away my 0.3219, it all adds up!" Sagredo points out that it literally does not add up. The information gained about X from Y and Z together is not 0.3219+3.3291 = 3.6510 bits. The correct result is found from the standard deviation of X conditional on both Y and Z, which is sqrt(1/(1 + 1/d1^2 + 1/d2^2)). The information gained is then lg(sqrt(1 + 1/d1^2 + 1/d2^2)) = 0.5*lg(101.5625) = 3.3331. The extra information over knowing just Z is only 0.0040 = 1/250 of a bit, because nearly all of Simplicio's information is already included in Salviati's.
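The combined-information calculation, for those who want to check Sagredo's arithmetic (a sketch of mine, not from the post):

```python
import math

d1 = math.sqrt(1.0 / 0.36 - 1.0)   # 1.333..., from corr(X, Y) = 0.6
d2 = 0.1

# Precision (1/variance) of X given each noisy view adds; the prior variance is 1.
info_z = 0.5 * math.log2(1.0 + 1.0 / d2**2)                   # Z alone
info_both = 0.5 * math.log2(1.0 + 1.0 / d1**2 + 1.0 / d2**2)  # Y and Z together

print(round(info_both, 4), round(info_both - info_z, 4))
```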

Sagredo tells Simplicio to go away and come up with some real data.

56 comments

I sometimes wonder just how useful probability and statistics are.

Good body, strange intro - what you're doing in the article is using probability theory to compare a certain statistical tool, correlation, with a different sort of evidence that is much more highly correlated, namely what you're calling "terrible measurement". You're using the probability-theoretic tool of conditional probability and mutual information of probability distributions to point this out.

To present this as a refutation of probability is just odd.

(Couldn't get back to this earlier -- busybusybusy before taking a holiday.)

Good body, strange intro - what you're doing in the article is using probability theory to compare a certain statistical tool, correlation, with a different sort of evidence that is much more highly correlated, namely what you're calling "terrible measurement". You're using the probability-theoretic tool of conditional probability and mutual information of probability distributions to point this out.

To present this as a refutation of probability is just odd.

To read it as one is odd. The "strange" intro listed several areas in which probability and statistics are useful (although with slight caveats to the cases of medical research and quality control). The rest is an illustration of its limitations in practice.

I expand more on this in my response to Douglas_Knight and gjm.

Salviati instead tries to measure X, and finds a variable Z which is experimentally found to have a good chance of lying close to X. Let us suppose that the standard deviation of Z-X is 10% that of X...

Simplicio can't figure out how Salviati did so much better without taking measurements on thousands of samples.

The article starts and ends with the claim that "probability" is inferior to "real measurement" but I have no idea what the distinction is supposed to be. Salviati's attempt at "real measurement" got him a much better instrument, Z, than when Simplicio "collects some experimental data," but that doesn't mean anything. There's no point of view that's going to make Y look like a better instrument than Z. I suppose someone might be impressed by the p<.001 claim that the cor(X,Y) > .5, but if Salviati knows that cor(X,Z) > .95 with p<.1, he probably knows that cor(X,Z) > .5 with p<.001 anyhow!

It certainly is true that a correlation of .6 doesn't give you a good measurement. (Is that the point?)

I concur. This has nothing to do with the relevance or value of probability and statistics; it's just debunking the idea that a correlation coefficient that's substantial but not very close to ±1 gives you much predictive power.

What makes Simplicio's performance worse than Salviati's isn't the fact that he's using probability and statistics. It's the fact that the information he has available is very poor. Describing what he's got in terms of correlation coefficients has, at most, the effect of obscuring just how terrible they are, but that's not a problem with probability and statistics, it's a problem with not understanding probability and statistics.

Douglas_Knight:

It certainly is true that a correlation of .6 doesn't give you a good measurement. (Is that the point?)

That is part of it.

gjm:

I concur. This has nothing to do with the relevance or value of probability and statistics; it's just debunking the idea that a correlation coefficient that's substantial but not very close to ±1 gives you much predictive power.

That is more of it.

gjm:

What makes Simplicio's performance worse than Salviati's isn't the fact that he's using probability and statistics. It's the fact that the information he has available is very poor.

And this is the final part. As a matter of practical fact -- look at almost any scientific paper that presents correlation coefficients -- if you are calculating correlations, 0.6 is about typical of the correlations you will be finding, and I think I'm being generous there. The reason you don't see correlations of 0.995 reported, let alone 0.99995 (i.e. a measurement to two significant figures) is that if your data were that good, you wouldn't waste your time doing statistics on them. A correlation of 0.6 means that you have poor data and almost no predictive capacity. It takes a correlation of 0.866 to get even 1 bit of mutual information. How often do you see correlations of that size reported?

Statistics is the science of precisely wringing what little information there is from foggy data. And yet, people keep on drawing lines through scatterplots and summarising results as "X's are Y's", even when the implied prediction does only fractionally better than chance.

Eliezer wrote: "Let the winds of evidence blow you about as though you are a leaf, with no direction of your own", which is very inspiring, but in practical terms cannot be taken literally. If you are being blown up and down the probability scale, your probabilities are nowhere near 0 or 1. You can only be easily swayed when you are ignorant. You can only remain easily swayed by remaining ignorant. The moment you acquire knowledge, instead of precisely measured ignorance, you are wearing lead-weighted boots.

That's what I took the point to be. The initial descriptions of what Simplicio and Salviati accomplished make them sound comparable. It wouldn't occur to most that one was overwhelmingly superior to the other. But working it out shows otherwise.

It's true that a lot is buried in the line "Salviati instead tries to measure X, and finds a variable Z which is experimentally found to have a good chance of lying close to X." What was required to establish this "experimental finding"? It might have taken labors far in excess of Simplicio's. But now we know that, unless Salviati had to do much, much more work, his approach is to be preferred.

I think the superiority will be obvious to anyone who's ever seen a few scatterplots of correlated variables, and who can imagine a graph of X against X + noise where sd(noise) = 0.1*sd(X), and who thinks for a moment. Of course many people, much of the time, won't actually think for a moment, but that's a very general problem that can strike anywhere.

Suppose the story had gone like this: Simplicio measures X, and does it so well that his measurement has a correlation of 0.6 with X. Salviati examines lots of pairs (X,Y) and finds that X and Y typically differ by about 0.1 times the s.d. of X. Then the result would have been the same as before. Would that be a reason to say "measurement is no good; use probability and statistics instead"? Of course not.

Suppose the story had gone like this: Simplicio measures X, and does it so well that his measurement has a correlation of 0.6 with X. Salviati examines lots of pairs (X,Y) and finds that X and Y typically differ by about 0.1 times the s.d. of X. Then the result would have been the same as before. Would that be a reason to say "measurement is no good; use probability and statistics instead"? Of course not.

Indeed. What matters is not what the procedures are called, but how they compare. Salviati's results completely trump Simplicio's.

What was required to establish this "experimental finding"?

Correlation, maybe?

Simplicio collects some experimental data consisting of a great many pairs (X,Y) and with high confidence finds a correlation of 0.6 between X and Y. So given the value y of Y, his best prediction for the value of X is 0.6y.

Eh?

That's just not what correlation means. (If we have, say, X=0.6Y or X=100Y or X=0.0001Y, exactly in each case, then the correlation is 1. The correlation tells you nothing about the coefficient in the relationship.)

Presumably X and Y have been converted to canonical form with mean 0, sd 1.

Yes, that was an error. I was thinking of the case where X and Y are both normalised to have s.d. 1, in which case the regression line is indeed Y = cX, but that isn't the case here. In general, the line is Y = bcX/a where the standard deviations of X and Y are a and b.

Can you give some concrete examples of people who you think are making this mistake?

Can you give some concrete examples of people who you think are making this mistake?

He already did:

A correlation of 0.6 is generally considered pretty high in psychology and social science

This is a standard PCT criticism of psychology and social science, i.e., that these low correlation levels are an indicator that they're measuring the wrong things, and indeed using the wrong model of what kinds of things to measure.

(Specifically, the complaint is that those sciences assume organisms are open-loop stimulus-response reactors rather than closed-loop controllers, so they try to measure input-output correlations in their experiments, instead of using their experiments to identify the variables that organisms are able to control or tend to control.)

But social science doesn't respect a correlation of .6 because they think it's a good way to measure something that could be measured directly. They find correlations either as an important step in establishing causation, a way to get large-scale trends, or a good way to measure something that can't be measured directly.

The correlation between smoking and lung cancer is only .7, but that's a very interesting fact. True, just picking out smokers is a terrible way to predict who has lung cancer when compared to even a so-so screening test, which is what I interpreted the point of Richard's article as being. But knowing that there's a high correlation there is useful for other reasons. Since we now know it's causative, we can use it to convince people not to smoke. Even if we didn't know there was causation, it would at least help us to pick out who needs more frequent lung cancer screening tests. So I am not prepared to immediately accept that someone is doing something wrong if they call a correlation of .6 pretty high.

Can you or Richard give an example of something the people investigating lung cancer could have done with direct measurement that would have been more productive than analyzing the cigarettes-smoking correlation? If not, can you provide a situation where people did overuse correlations when they'd have been better off using a measurement?

But social science doesn't respect a correlation of .6 because they think it's a good way to measure something that could be measured directly. They find correlations either as an important step in establishing causation, a way to get large-scale trends, or a good way to measure something that can't be measured directly.

A correlation of 0.6 is a bad measurement, period. It does not become a good one for want of a better.

Can you or Richard give an example of something the people investigating lung cancer could have done with direct measurement that would have been more productive than analyzing the cigarettes-smoking correlation?

I don't know what you mean by "analysing" a correlation, but this is some of what they did do.

I could have mentioned epidemiology in my intro. The reason it depends on statistics is that it is often much more difficult to discern the actual mechanism of a disease process than to do statistical studies. Googling turns up this study which is claimed (by the scientist doing the work) to be the very first demonstration of a causal link between smoking and lung cancer -- in April of this year (and not the 1st of the month).

But the correlations remain what they are, and it still takes a lot of work to get somewhere with them.

A bad measurement can still be the best there is.

A correlation of 0.6 is a bad measurement, period. It does not become a good one for want of a better.

But it is useful. I think Yvain asked the wrong question. You can do better than correlations, but do you deny that you can draw from them the conclusions that Yvain does? (ie, the population effect of smoking)

The MN scientist is lying. No, I didn't click on the link. Yes, I mean lying, not mistaken.

You can do better than correlations, but do you deny that you can draw from them the conclusions that Yvain does? (ie, the population effect of smoking)

The conclusion he draws is:

Even if we didn't know there was causation, it would at least help us to pick out who needs more frequent lung cancer screening tests.

Sure, standard statistics. No problem, for want of anything better.

On the other hand, if you want to know how the link between smoking and lung cancer works, the epidemiology can do no more than suggest places to look.

The MN scientist is lying. No, I didn't click on the link. Yes, I mean lying, not mistaken.

On closer reading, the actual scientific claim is less than I thought. It's a statistical study correlating the presence of a nitrosamine compound in the urine with lung cancer, and finding a higher correlation than with self-reported smoking. Original paper (full text requires subscription) here and blogged here. So just more statistical epidemiology and not at all epoch-making.

ETA: Extra links, just because these things are worth knowing.

The correlation between smoking and lung cancer is only .7, but that's a very interesting fact.

I wasn't aware that this was considered either psychology or social science; those are the fields towards which the criticism I pointed out was addressed, not medicine. (Medicine has a rather different set of statistics-based, politics-based, and payola-based errors to deal with.)

Correlation's a useful tool when that's all you have; the PCT criticism is that we now have more to go on than that where humans' and other organisms' behavior are concerned, so it's time to become dissatisfied with the old way and get started on improving things.

(Edit to add: WTF? This is the most baffling downvote I've ever seen OR received here, and I've seen some pretty weird ones in the past.)

Really, I'm not hostile to PCT, just skeptical— but given your claims about the predictive power of PCT, and given that it's been studied for 35 years by a large group including several former academics, I think it's fair to ask this: Can you direct me to an experiment such that

  1. PCT makes a clear predictive claim about an observable result
  2. Standard theories of cognition find that claim highly unlikely (either ruling it out or having no reason to pick that behavior out of many other options)
  3. The experiment strongly confirms the PCT prediction
  4. The result has been reproduced by skeptics of PCT, or reproduced in several independent studies by credentialed researchers.

Note the importance of step 2. The results you've so far pointed out to me (can't find them within LW, sorry) concern a person manipulating a dial to keep a dot in the center of the screen while acted on by unknown, varying forces, and a rat varying the pressure on a lever it needs to hold down in response to varying counterforces. Since these are cases in which 'acting like a controller' is a simple strategy that produces near-optimal results, it doesn't surprise other theories of cognition that the agents arrived at this strategy. (I find it quite probable, in fact, that some form of control theory governs much of our motor impulses, since that's a fairly simple and elegant solution to recurring problems of balance, varying strain, etc.) The point where PCT really diverges from mainstream theories of cognition is in the description of cognitive content, not motor response; and that's where PCT's burden of proof lies.

If PCT is as well-developed across levels as you claim (and well-developed enough to make diagnoses and prescriptions for, say, emotional issues), then it should be easy to make and test such a prediction in a cognitive domain. If you can present me with an experiment that clearly meets those four conditions, I'll be very interested in whatever PCT book you recommend. If 30 years haven't produced such results, then that counts as evidence too.

2. Standard theories of cognition find that claim highly unlikely (either ruling it out or having no reason to pick that behavior out of many other options)

'Standard theories of cognition' is a broad class that includes so many conflicting and open-ended models that I'm not sure I could come up with an experiment/experimental result pair that fulfills this requirement, even without the requirement that the experiment actually have that result.

That's a good point. I'll have to think carefully about what kind of results would constitute a "surprising" result to theories of mind that include basic modeling capacities and preferences in the usual fashion. Any good suggestions for emending requirement 2 would be appreciated.

I'll have to think carefully about what kind of results would constitute a "surprising" result to theories of mind that include basic modeling capacities and preferences in the usual fashion.

And when you do, what you'll discover is that none of them really predict anything we don't already know about human behavior, or provide a reductionistic model of it.

What's different about PCT is that it gives us a framework for making and testing reductionist hypotheses about what is causing an individual's behavior. We can postulate variables they're controlling, do things to disturb the values of those variables, and observe whether the values are indeed being controlled by the person's behavior.

For example, if we want to know whether someone's "Bruce"-like behavior is due to a fear of success or a desire for failure, we could artificially induce success or failure experiences and observe whether they adjust their behavior to compensate.

Now try that with the standard cognitive theories, which will only give us ways to describe what the person actually does, or make probabilistic estimates about what people usually do in that situation, rather than any way to reduce or compress our description of the person's behavior, so that it becomes a more general predictive principle, instead of just a lengthy description of events.

OK, excellent; since you assert that PCT has so much more predictive power, I'm sure you can show me many impressive, quantitative PCT-driven experimental results that aren't in a domain (like motor response or game strategy) where I already expect to see control-system-like behavior.

For example, if you could get a mean squared error of 10% in predicting a response that balances ethical impulses against selfish ones (say, the amount that a person is willing to donate to a charity, given some sort of priming stimuli), then I'd consider that good evidence. That's the sort of result that would get me to pick up a PCT textbook.

Seriously, please point me to these results.

OK, excellent; since you assert that PCT has so much more predictive power, I'm sure you can show me many impressive, quantitative PCT-driven experimental results that aren't in a domain (like motor response or game strategy) where I already expect to see control-system-like behavior.

For example, if you could get a mean squared error of 10% in predicting a response that balances ethical impulses against selfish ones (say, the amount that a person is willing to donate to a charity, given some sort of priming stimuli), then I'd consider that good evidence.

You've just crossed over two different definitions of "predictive" -- not to mention two different definitions of "science". What I described was something that would give you a "hard", strictly falsifiable fact: is the person controlling variable X or not?

That's actual science. But what you've asked for instead is precisely the sort of probabilistic mush that is being critiqued here in the first place. You are saying, "yes, it's all very well that science can be used to determine the actual facts, but I want some probabilities! Give me some uncertainty, dammit!"

And as a result, you seem to be under the mistaken impression that PCT has some sort of evidence deficiency I need to fix, when it's actually psychology that has a modeling deficiency that needs fixing. How about you show me a genuinely reductionistic (as opposed to merely descriptive) model of human psychology that's been proposed since Skinner?

I only mentioned PCT in this thread in the context of Yvain's request for an example of people making the mistake Richard wrote this post about. And you responded to my criticism of psychology (i.e., it's not a "hard" science) by raising criticisms of PCT that are in fact off-topic to the discussion at hand.

Are you claiming that, if PCT is flawed, then everything in psychology is just jim-dandy fine? Because that's a pretty ludicrous position. Check your logic, and address the topic actually at hand: the complete failure of cognitive-level psychology to come up with a halfway decent reduction of human behavior, instead of just cataloging examples of it.

Otherwise, you are in the exact same position as an intelligent-design advocate pretending that gaps in evolutionary biology mean you don't have to consider the gaps in your own theory, or lack thereof.

Because PCT could be ludicrously wrong, and it would still be a huge advance in the current state of psychology to be able to nail down with any precision why or how it was wrong.

Which is why critique of PCT is irrelevant to this topic: you could disprove PCT utterly, and the given criticism of psychology would still stand, just like disproving evolution wouldn't make "God did it" any more plausible or useful of a theory.

So let's say, for the sake of argument, that I utterly recant of PCT and say it's all gibberish. How would that improve the shoddy state of psychology in the slightest? What would you propose to replace PCT as an actual model of human innards?

Let's hear it. Name for us the very best that modern psychology has given us since Skinner, of any attempt to actually define an executable model of human behavior. Has anyone even tried, who wasn't an outsider to the field?

I'll give this one last try.

You've given me the two results I mentioned above, in the area of motor response. They sound like good experiments to me: you can take a model with relatively few free parameters, and find that most subjects' behavior will fit that model extremely well for some particular values of the parameters. That is the kind of experiment I'd take as good evidence that control theory operates in motor response. (Incidentally, if you could give me a link to those experiments, I'd much appreciate it.)

You've been claiming for months that this is just the tip of the iceberg, that PCT is able to isolate variables that subjects are controlling in cognitive contexts like belief. I would be very interested in this claim if I saw some evidence for it; fortunately, your claim that PCT is able to diagnose and treat cognitive conditions implies that it's strong enough to do the same kind of experiments as in the case of motor response. So I began by asking for references to such results, and gave an example of the kind of result that would definitely move me to look into PCT.

Experimental verification seems to me like the obvious thing for PCT advocates to do if they're confident in their theory and frustrated by its lack of academic respect. I would therefore find it highly unlikely, given that your claims are true, that in 35 years there hasn't been a single positive experimental result in a cognitive context, of the same form as the "controlling the position of the dot" or "varying the force on the bar" experiments. That you meet my question with outrage, rather than with citations, is thus Bayesian evidence against the validity of PCT.

Are you claiming that, if PCT is flawed, then everything in psychology is just jim-dandy fine?

Nope. I'm just claiming that if PCT doesn't have the kind of evidence it claims, then I probably shouldn't bother investigating it. The problems with mainstream psychology are manifold, but the discipline seems to be making (slow) progress by scientific criteria: Tversky and Kahneman, for instance, were making novel and unexpected experimental predictions that turned out to be correct. If your discipline does that much better than mainstream psychology, there should be some strong experimental results that show it.

I really can't imagine that's too much to ask, and that's why I've made this challenge. Point me to experimental results that validate PCT in a cognitive context, and I'll pick up the textbook of your choice. Keep grandstanding against the very kind of evidence you presented as evidence of PCT in motor response, and I'll have to conclude that you're peddling woo-woo. The ball is in your court.

You've been claiming for months that this is just the tip of the iceberg, that PCT is able to isolate variables that subjects are controlling in cognitive contexts like belief. I would be very interested in this claim if I saw some evidence for it.

I take it you skipped reading Marken's references then, since I believe one of the cited papers was on how physicians' errors in prescribing medicine match a PCT model of the situation, but fail to match an intuitive model of how such errors would respond to environmental changes.

You've apparently also been ignoring my repeated mention of time-averaged perceptual variables like "the amount of work I've done today" or "how recently I got laid" - you can have a "feel" for such values, and how they change over time, as well as respond to changes in them. Do you claim to not perceive -- and control -- such variables? Or are you going to say that since "work" and "getting laid" involve physical activity, they are somehow therefore "motor" rather than "cognitive"?

Finally, you seem to have put me in the strange position of a passing physics student being harangued by a young-earth creationist who insists that I prove the age of the universe to his satisfaction before he will study any physics, whereas I assert that if you were to go and study some physics, it would be obvious to you why YEC-ism is wrong.

But after being harangued at some length, I relent and attempt to begin with some basic equations, which you then argue are not in the Bible and thus not valid evidence. At this point I begin to question whom you're trying to convince with your diatribe, and why, if you genuinely want to learn something, you're spending more time writing than reading. Don't you have a library anywhere near you?

Keep grandstanding against the very kind of evidence you presented as evidence of PCT in motor response, and I'll have to conclude that you're peddling woo-woo.

I'm not sure I follow you here, since I've only referred to neuroanatomy evidence -- i.e. evidence from a "hard" science. You may be confusing me with one of the other PCTers here who've been talking about the joystick perception experiments, which I consider only relevant for debunking Skinner... which isn't really as useful as it used to be.

I take it you skipped reading Marken's references then, since I believe one of the cited papers was on how physicians' errors in prescribing medicine match a PCT model of the situation, but fail to match an intuitive model of how such errors would respond to environmental changes.

OK, that does sound like a result of the type I'm looking for. I think I can find "Error in skilled performance" at my campus library. In the meantime, could you tell me if the following are true in your opinion:

  1. R.S. Marken is a respected researcher in the PCT community, not a fringe figure.
  2. You (P.J. Eby) have read this paper and approve of the methodology.
  3. The results of this paper constitute strong evidence for PCT for an open-minded skeptic who hasn't read the rest of the PCT literature.

Thanks.

I would guess Marken is respected; I have not read the paper, only his brief mention of the results in a talk he gave summarizing his 25 years of PCT-related research. I have no idea whether you would consider it "strong evidence". However, here is a portion of that synopsis:

One surprising result of this modeling effort was the discovery that environmental disturbances, such as look alike/sound alike drug names are expected to have very little effect on prescribing error rate when the error rate is already low. This result is surprising because it contradicts a basic tenet of the field of human factors engineering – a field in which I have also worked. Human factors engineering is based on the premise that the main cause of human error is environmental disturbance in the form of poor system design (such as a poorly designed medication naming system, which gives similar names to very different medications). A control model shows that such environmental disturbances cannot be a major contributor to error when error rates are low because, the fact that error rates are low means that the control process is already effectively compensating for these disturbances.

OK. Well, I've read the paper now, and I find that I strongly disagree with a key component of Marken's methodology, and that I think this zeroes in on the cause of our argument here about what kind of experimental evidence counts for PCT. Frankly, though, I don't want to spend time arguing against it only for you to say "OK, maybe Marken is a crank, but that doesn't say anything against other PCT researchers". So if it's not too much trouble, could I ask you to read the (short) paper and tell me:

  1. Are the methods in Section 4 and 5 standard for PCT research?
  2. Do the results in Section 4 constitute evidence that control theory is a good model for prescription errors?

If the answer to either of these questions is "No", then we're just back where we started, with me asking for experimental evidence for PCT in a cognitive context. If the answer to both is "Yes", then I think I can explain my disagreement.

Thanks for your even-handed attempt at seeing whether PCT meets vital reality checks.

If Marken turns out to be both a crank and a respected member of the PCT community, that will say something about the community.

ETA: Technical report "Error in Skilled Performance: A Control Model of Prescription Writing" (2002) can be found online here.

So if it's not too much trouble, could I ask you to read the (short) paper and tell me

If it's not too much trouble, would you mind answering even ONE of the many, many points and questions I've brought up in this thread? I mean, as long as we're not trusting each other, I frankly don't trust you not to change your criteria on the fly, either.

For example, you've still not defined your criteria for what you'd consider a "novel" result, nor which "standard model" you would use as a baseline for comparison. Nor have you addressed the issue of any of the many cognitive variables that are available for your direct observation, nor what your criteria are for distinguishing "cognitive" from "motor".

These are all areas where you are quite free to change your stance at will, and I do not wish to waste any more of my time, if your true goal here is simply to find an excuse (at any cost) to not learn something. I want to make sure that you've stated your true objection first.

Fair enough.

It's hard to define explicitly what I'd consider a novel or surprising result, because, as you point out, mainstream psychology doesn't appear to have a unified reductionistic model of cognition, just an array of identified results and sub-models. I've thus made that requirement more charitable, changing it from "something novel or surprising" to the lower standard of "good modeling by control theory of a cognitive phenomenon", excluding motor response and some games (like a fielder catching a fly ball) in which acting externally like a simple control system is an easy and successful strategy.

By "motor response" I mean just the way that the actual nerves and muscles can vary their particular actions, while not changing the conscious description of what I'm doing. For example, I assign significant probability that a simple control circuit can be found that neatly fits the actions of my leg muscles (or the nerve signals that connect to them) when I'm walking and keeping my balance. I would, however, find it much less probable that a similarly simple control circuit fits my pattern of working vs. procrastinating. (Since control circuits are apparently Turing-complete, of course there's going to be some control circuit that matches it, but in the case of balance I think there's probably one with few enough parameters that it compresses the data effectively, compared to other models; while in the case of akrasia I doubt this.)

So I would count work vs. procrastination, or prescription errors, or charitable donations, or changing beliefs, as just a few examples of cognitive phenomena. Something like variation in libido over time, though, wouldn't surprise me as much if I find a control circuit model for it (though it would surprise me more than the balance example). I think it's fair to ask PCT for experimental evidence in the cognitive domain, since the way you diagnose and prescribe around here seems to presuppose some rather simple control circuits in cognitive phenomena.

As for considering direct introspection rather than experimental evidence, I'm rather mistrustful of what I consciously intuit about my own mind, since conscious awareness seems to be often distorted for signaling purposes, and since the false perception of religious experience (which I really wanted to be genuine) was one thing that kept me religious longer than I should have been. At this point, I strongly prefer experimental evidence.

With that said, could you read Marken's paper and tell me whether you stand behind it in the terms I asked above?

With that said, could you read Marken's paper and tell me whether you stand behind it in the terms I asked above?

Now that I've read it, I have to say I agree with you: it is not good evidence. At best, it's an application of PCT to generate an interesting hypothesis or two.

I would, however, find it much less probable that a similarly simple control circuit fits my pattern of working vs. procrastinating. (Since control circuits are apparently Turing-complete, of course there's going to be some control circuit that matches it, but in the case of balance I think there's probably one with few enough parameters that it compresses the data effectively, compared to other models; while in the case of akrasia I doubt this.)

I'm not sure why you'd expect akrasia to be a simple circuit. If it were a simple conflict, between exactly two things, you'd likely be able to resolve it consciously without much effort. A few weeks ago, I did a workshop where we charted a portion of one person's control structure in the area of not working on the iPhone app they wanted to write. It took a couple hours and filled most of a page with the relevant cognitive-level variables and their interconnections.

This is quite consistent with e.g. Ainslie's model of akrasia as involving multiple competing "interests"; I see PCT as an improvement over Ainslie in providing a straightforward implementation mapping, plus simplified management of Ainslie's notion of "appetites", which is not very well worked out (IMO) and a little too handwavy.

Replacing Ainslie's idea of "interests" having "appetites" with controllers measuring time-averaged variables seems like a straightforward win: instead of two entities, you have just one entity that's structurally similar to things we know our brains/nervous systems already have. (Also, Ainslie has no worked-out model for how prioritization and agreement between interests occur; PCT on the other hand has hierarchy and reference levels to account for them.)

I think it's fair to ask PCT for experimental evidence in the cognitive domain, since the way you diagnose and prescribe around here seems to presuppose some rather simple control circuits in cognitive phenomena.

Individually, the circuits are simple; collectively, the networks are not. I used to think things were simpler than they are, because I focused only on the things (functional beliefs) that were effectively connections between control circuits. I rarely addressed the settings of the circuits themselves, or used them as a springboard to identifying other beliefs or variables.

I'm rather mistrustful of what I consciously intuit about my own mind, since conscious awareness seems to be often distorted for signaling purposes, and since the false perception of religious experience (which I really wanted to be genuine) was one thing that kept me religious longer than I should have been.

There's a difference between having a false label applied to a true experience, and having a false experience. The existence of perceptions such as "how much work I've gotten done lately" or "how much fun I'm having" is certainly some evidence for PCT's notion of time-averaged perceptual variables that can influence decision-making. It's also parsimonious to assume that the brain is unlikely to have evolved specific circuits for these perceptions, rather than simply having a basis for acquiring new perceptions.

In effect, the PCT model of cognitive variables explains how we represent all the things we "just know" or "just feel", including expert intuition in specialized subjects. The PCT prediction would be that if someone is skilled enough in a subject to have a specific intuition about something, we should be able to find a specific neural signal whose intensity corresponds to the degree of that intuition, and which is a time-averaged function of other (possibly gated) input signals.

I don't see how any of this seems extraordinary or controversial in the slightest, on the perception side.

Control, perhaps, might be more controversial... especially given the implication that we don't control our own actions directly, but can only do so through interaction with the control network. But for me, that implication is uncontroversial, because I've been writing about that (independently formed) idea since 2005.

Powers hypothesizes that "awareness" simply is a debugger that can go in and inspect any part of the network, injecting settings or testing hypotheticals. Anything we do by direct conscious intention would therefore consist of "manually" setting control values in the network, which of course would have no long-term effect if a higher-level controller puts the settings right back when you're done. What's more, if your conscious meddling is interfering with something in an "important" (high) position in the network, it's likely to reorganize in such a way that you no longer want to meddle with the network in that particular way!

And that actually sounds like the most straightforward explanation of akratic behaviors, ever, and is also 100% consistent with everything I've already previously observed about mind hacking.

That is, we really don't control our own behaviors: our networks do. Free will is really just a special case, even if it doesn't seem that way at first glance. PCT just offers a better explanation than my rough models had for why/how that works.

Now that I've read it, I have to say I agree with you: it is not good evidence. At best, it's an application of PCT to generate an interesting hypothesis or two.

Good. The experiment is, however, very good evidence for the hypothesis that R.S. Marken is a crank, and explains the quote from his farewell speech that didn't make sense to me before:

Psychologists see no real problem with the current dogma. They are used to getting messy results that can be dealt with only by statistics. In fact, I have now detected a positive suspicion of quality results amongst psychologists. In my experiments I get relationships between variables that are predictable to within 1 percent accuracy. The response to this level of perfection has been that the results must be trivial! It was even suggested to me that I use procedures that would reduce the quality of the results, the implication being that noisier data would mean more.

The basic problem is that, generically, if your model uses more free parameters than data points, then it is mathematically trivial that you can get an exact fit to your data set, regardless of what the data are: thus you've provided exactly zero Bayesian evidence that your model fits this particular phenomenon.
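To make the triviality concrete, here's a toy sketch of my own (nothing to do with Marken's actual model): a cubic polynomial has four free coefficients, so it "fits" any four data points exactly, no matter what they are.

```python
import numpy as np

# A cubic has 4 free parameters (its coefficients), so it can reproduce
# ANY 4 data points exactly -- the perfect fit is guaranteed in advance,
# and therefore carries no evidence about the data-generating process.
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0])

for _ in range(5):
    y = rng.uniform(-10, 10, size=4)      # arbitrary "observations"
    coeffs = np.polyfit(x, y, deg=3)      # fit the 4-parameter model
    y_hat = np.polyval(coeffs, x)
    assert np.allclose(y_hat, y)          # exact fit, every single time
print("a 4-parameter model 'predicted' every random 4-point data set")
```

The fit is perfect whatever numbers come out of the random generator, which is exactly why perfection here is worthless as evidence.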

(This is precisely the case in the paper you pointed me to. Marken asserts that his model successfully predicts the overall and relative error rates with high precision; but if these rates had been replaced with arbitrary numbers before being fed to him, he would have come up with different experimental values of the parameters, and claimed that his model exactly predicted the new error rates! This is known around here as an example of a fake explanation.)

The fact that Marken was repeatedly told this, interpreted it to mean that others were jealous of his precision, and continued to produce experimental "results" of the same sort along with bold claims of their predictive power, makes him a crank.

Anyhow...

The point I keep stressing is that, if cognitive-domain PCT is precise enough to do treatment with, then it can't be bereft of experimental consequences; and no matter how appealing certain aspects of it might be intuitively, a lack of experimental support after 35 years looks pretty damning. If every cognitive circuit is so complicated that you can't make an observable prediction (about an individual in varying circumstances, or different people in the same circumstances, etc.) without assuming more parameters than data points... then PCT doesn't actually teach you anything about cognition, any more than the chemists who ascribed fire and respiration to phlogiston actually learned anything from their theory.

You've pointed me to one experiment, which turned out to be the work of a crank; I've accordingly lowered the probability that PCT is valid in the cognitive domain, not because the existence of a crank proves anything against their hypothesis, but because that was the most salient experimental result that you could point to!

I'm still quite able to revise my probability estimate upwards if presented with a legitimate experimental result, but at the moment PCT is down in the "don't waste your time and risk your rationality" bin of fringe theories.

Good. The experiment is, however, very good evidence for the hypothesis that R.S. Marken is a crank, and explains the quote from his farewell speech that didn't make sense to me before:

I can be a pretty cranky fellow but I think there might be better evidence of that than the model fitting effort you refer to. The "experiment" that you find to be poor evidence for PCT comes from a paper published in the journal Ergonomics that describes a control theory model that can be used as a framework for understanding the causes of error in skilled performance, such as writing prescriptions. The fit of the model to the error data in Table 1 is meant to show that such a control model can produce results that mimic some existing data on error rates (and without using more free parameters than data points; there are 4 free parameters and 4 data points; the fit of the model is, indeed, very good but not perfect).

But the point of the model fitting exercise was simply to show that the control model provides a plausible explanation of why errors in skilled performance might occur at particular (very low) rates. The model fitting exercise was not done to impress people with how well the control model fits the data relative to other models since, to my knowledge, there are no comparable models of error against which to compare the fit. As I said in the introduction to the paper, existing models of error (which are really just verbal descriptions of why error occurs) "tell us the factors that might lead to error, but they do not tell us why these factors produce an error only rarely."

So if it's the degree of fit to the data that you are looking for as evidence of the merits of PCT then this paper is not necessarily a good reference for that. Actually, a good example of the kind of fit to data you can get with PCT can be gleaned from doing one of the on-line control demos at my Mind Readings site, particularly the Tracking Task. When you become skilled at doing this task you will find that the correlation between the PCT model (called "Model" in the graphic display at the end of each trial) and your behavior will be close to one. And this is achieved using a model with no free parameters at all; the parameters that have worked for many different individuals are now simply constants in the model.

Oh, and if you are looking for examples of things PCT can do that other models can't do, try the Mind Reading demo, where the computer uses a methodology based on PCT, called the Test for the Controlled Variable, to tell which of three avatars -- all three of which are being moved by your mouse movements -- is the one being moved intentionally.

The fact that Marken was repeatedly told this, interpreted it to mean that others were jealous of his precision, and continued to produce experimental "results" of the same sort along with bold claims of their predictive power, makes him a crank.

I don't recall ever being told (by reviewers or other critics) that the goodness of fit of my (and my mentor Bill Powers') PCT models to data was a result of having more free parameters than data points. And had I ever been told that, I would certainly not have thought it was because others were jealous of the precision of our results. And the main reason I have continued to produce experimental results -- available in my books Mind Readings, More Mind Readings and Doing Research on Purpose -- is not to make bold claims about the predictive power of the PCT model but to emphasize the point that PCT is a model of control, the process of consistently producing pre-selected results in a disturbance-prone world. The precision of PCT comes only from the fact that it recognizes that behavior is not a caused result of input or a cognitively planned output but a process of control of input. So if I'm a crank, it's not because I imagine that my model of behavior fits the data better than other models; it's because I think my concept of what behavior is is better than other concepts of what behavior is.

I believe Richard Kennaway, who is on this blog, can attest to the fact that, while I may not be the sharpest crayon in the box, I’m not really a crank; at least, no more of a crank than the person who is responsible for all this PCT stuff, the late (great) William T. Powers.

I hope all the formatting comes out ok on this; I can't seem to find a way to preview it.

Best regards

Rick Marken

Actually, I left LessWrong about a year ago, as I judged it to have declined to a ghost town since the people most worth reading had mostly left. I've been reading it now and then since, and might be moved to being more active here if it seems worth it. I don't think I have enough original content to post to be a part of its revival myself.

As Rick says, he can be pretty cranky, but is not a crank.

You know you're replying to an 8-year-old thread, right?

I had no idea. I was just pointed to it recently from another list.

The basic problem is that, generically, if your model uses more free parameters than data points, then it is mathematically trivial that you can get an exact fit to your data set, regardless of what the data are: thus you've provided exactly zero Bayesian evidence that your model fits this particular phenomenon.

I'm not sure I follow you. I didn't get the impression that Marken's model had more tunable parameters than there were data points under study, or that it actually was tunable in such a way as to create any desired result.

If every cognitive circuit is so complicated that you can't make an observable prediction (about an individual in varying circumstances, or different people in the same circumstances, etc) without assuming more parameters than data points...

I don't follow how this is the case. If I establish that a person is controlling for, say, "having a social life", and I know that one of the sub-controlled perceptions is "being on Twitter", then I can predict that if I interfere with their twitter usage they'll try to compensate in some way. I can also observe whether a person's behavior matches their expressed priorities -- i.e., akrasia -- and attempt to directly identify the variables they're controlling.

If at this point, you say that this is "obvious" and not supportive of PCT, then I must admit I'm still baffled as to what sort of result we should expect to be supportive of PCT.

For example, let's consider various results that (ISTM) were anticipated to some extent by PCT. Dunning-Kruger says that people who aren't good at something don't know whether they're doing it well. PCT said - many years earlier, AFAICT - that the ability to perceive a quality must inevitably precede the ability to consistently control that quality.

Which directly implies that "people who are good at something must have good perception of that thing", and "people who are poor at perceiving something will have poor performance at it."

That's not quite D-K, of course, but it's pretty good for a couple decades ahead of them. It also pretty directly implies that people who are the best at something are more likely to be aware of their errors than anyone else - a pretty observable phenomenon among high performers in almost any field.

I'm still quite able to revise my probability estimate upwards if presented with a legitimate experimental result, but at the moment PCT is down in the "don't waste your time and risk your rationality" bin of fringe theories.

This baffles me, since AFAICT you previously agreed that it appears valid for "motor" functions, as opposed to "cognitive" ones.

I consider this boundary to be essentially meaningless myself, btw, since I find it almost impossible to think without some kind of "motor" movement taking place, even if it's just my eyes flitting around, but more often, my hands and voice as well, even if it's under my breath.

It's also not evolutionarily sane to assume some sort of hard distinction between "cognitive" and "motor" activity, since the former had to evolve from some form of the latter.

In any event, the nice thing about PCT is that it is the most falsifiable psychological model imaginable, since we will sooner or later get hard results from neurobiology to confirm its truth or falsehood at successively higher levels of abstraction. As has previously been pointed out here, neuroscience has already uncovered four or five of PCT's expected 9-12 hardware-distinctive controller levels. (I don't know how many of these were known about at the time of PCT's formulation, alas.)

I'm not sure I follow you. I didn't get the impression that Marken's model had more tunable parameters than there were data points under study, or that it actually was tunable in such a way as to create any desired result.

In the section "Quantitative Validation", under Table 1, it says (italics mine):

The model was fit to the data in Table 1 by adjusting only the speed parameter, s, for each prescription component control system... The results in Table 1 show that the distribution of error types produced by the model corresponds almost exactly to the empirical distribution of these rates. The values of s that produced these results were 0.000684, 0.000669, 0.000731 and 0.000738 for the Drug, Dosage, Route and Other component writing control systems, respectively.

As you vary each speed component within the model, the fraction of errors by that component varies all the way from 0 to 1, rather independently of each other. Thus for any empirical or made-up distribution of the four error types, Marken would have calculated values for his four parameters that caused the model to match the four data points; so despite his claims, the empirical data offer literally zero evidence in favor of his model. Ditto with his claim that his model predicts the overall error rate.
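A minimal sketch of why this is vacuous, using an assumed monotone link g in place of Marken's actual equations (the function and all the numbers here are my own illustration): with one independently tunable parameter per component, any made-up overall rate and mix of error types can be "predicted" exactly by inverting the link.

```python
import numpy as np

def g(s):            # assumed monotone link: speed parameter -> error rate
    return 1.0 - np.exp(-s)

def g_inv(p):        # its inverse: error rate -> speed parameter
    return -np.log(1.0 - p)

overall_rate = 0.005                             # made-up overall error rate
fractions = np.array([0.10, 0.25, 0.40, 0.25])   # made-up error-type mix

# Per-component rates the "model" must hit, then the parameters that hit them.
p = overall_rate * fractions
s = g_inv(p)

# The calibrated model reproduces the arbitrary data perfectly:
assert np.allclose(g(s), p)
assert np.isclose(g(s).sum(), overall_rate)
print("exact 'prediction' of an arbitrary error distribution:", g(s) / g(s).sum())
```

Swap in any other fractions or overall rate and the "fit" is just as perfect, which is the point being made above.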

I'll get to the rest of this later.

A control model shows that such environmental disturbances cannot be a major contributor to error when error rates are low because, the fact that error rates are low means that the control process is already effectively compensating for these disturbances.

Sorry, but that doesn't sound like an interesting result that vindicates PCT. You can even rephrase the general insight without controls terminology!

Like this: "given a system that is demonstrably robust against failure mode X, it's unlikely to fail in mode X".

Positing a "control system" is just unnecessary length and unnecessary delimitation of the general rule. PCT doesn't get you this insight any faster. And while human factors engineers would discourage similarly named, very different drugs, even they would admit it might not be worth fixing if the system has already operated without ever swapping out the drugs.

When I was starting out in trading I worked at a company where most of the traders were "spread traders" in the futures markets. They would trade either cash vs. futures or different futures expirations against each other. So, for instance, if you had future F1 that expired in September, and future F2 on the same underlying product that expired in December, they would define the spread between them as F1-F2 and, basically, try to buy that spread and sell it (or sell it and then buy it) over and over again. While F1 and F2 were whipping around, F1-F2 would tend to be pretty steady. The bid/ask spread of F1-F2 was determined by the volatility of F1 and F2, but the volatility of F1-F2 was much lower, so the bid/ask of F1-F2 was large compared to its volatility, which is a recipe for juicy trading.

So, anyway, these people wanted me to trade equities this way, but I had a super hard time with it. I would take two equities (E1 and E2) that were say 90% correlated, do the regression to get the ratio (r), and then start thinking about E1-rE2 as a spread just like I was used to. What I realized is that for 90% correlated instruments the spread volatility is 31% of the volatility of the naked instrument, which is still large compared to the bid/ask. The individual instruments that the futures traders were spreading were nearly 100% correlated, which is basically the requirement to have a spread market that you can reasonably talk about.
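For what it's worth, the 31% figure falls out if "90% correlated" is read as R^2 = 0.9: hedging E1 with the regression ratio leaves residual volatility of sqrt(1 - R^2), about 32% of the naked volatility. If it instead means a correlation coefficient of 0.9, the residual is sqrt(1 - 0.9^2), about 44%. A quick simulation sketch (all numbers assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.9                      # assumed correlation coefficient of E1, E2

# Simulate correlated (standardized) returns for the two equities.
e2 = rng.standard_normal(n)
e1 = rho * e2 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Regression hedge ratio and the resulting spread E1 - r*E2.
r = np.cov(e1, e2)[0, 1] / np.var(e2)
spread = e1 - r * e2

frac = spread.std() / e1.std()
print(f"spread vol / naked vol = {frac:.3f}")   # ~ sqrt(1 - rho^2) = 0.436
```

The residual fraction depends only on the correlation, which is why near-100% correlation is effectively a prerequisite for a tradeable spread.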

As pjeby points out, I gave the examples of "psychology and social science". Look at reports that summarise statistical results by claims of the form "X's are Y's", sometimes by the scientists themselves, not journalists. If you want something more concrete than those generalities, see the context of this comment.

I'm not sure what you're trying to prove here. I don't think it's fair to compare a correlation coefficient, which gives you a single parameter and can be used without knowledge of the shapes of the underlying distributions, to a confidence interval for X around Z, which gives you two parameters, in a situation where the data actually is normally distributed. Furthermore, you are comparing a correlation coefficient of 0.6 to a measurement whose error, Z-X, has a standard deviation only 10% that of X! That's outrageously accurate. For instance, the standard deviation of men's height is 3 inches; so when you want to know someone's height X, you are given a measurement Z whose typical error is about 5/16 of an inch. That's almost within the range of measurement error you would get measuring X directly.

Make a comparison where you're given correlations of .67 to 2 independent variables, versus a measurement that gives you 90% confidence of being within 2 standard deviations of the value of X, where Z is represented as being normally distributed around X, but is (unknown to Salviati) highly-skewed around X. I haven't actually worked out the math to see what a fair test would be, but the example written up here is egregiously unfair.

In other words, don't think you're done because you've found a high correlation.
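For a sense of the scale involved: a correlate with correlation rho leaves a residual standard deviation of sqrt(1 - rho^2) times sd(X) after the best linear prediction, so a 0.6 correlation still leaves 80% of the original uncertainty, while the hypothetical measurement Z leaves only 10%. In code (standard regression facts, using the numbers from the comment above):

```python
import numpy as np

sd_x = 3.0                 # e.g. std. dev. of men's height, in inches

# Predicting X from a correlate Y with correlation rho: the best linear
# predictor leaves residual std. dev. sqrt(1 - rho^2) * sd_x.
rho = 0.6
resid_from_corr = np.sqrt(1 - rho**2) * sd_x     # = 0.8 * sd_x = 2.4"

# Predicting X from the measurement Z, where sd(Z - X) = 10% of sd(X):
resid_from_meas = 0.10 * sd_x                    # = 0.3" (about 5/16")

print(resid_from_corr, resid_from_meas)
print("ratio:", resid_from_corr / resid_from_meas)   # the correlate is 8x noisier
```

So the two predictors in the post are nowhere near comparable in informativeness, which is the complaint being made here.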

Great write-up.

It would be infuriating to deal with a lazy Bayesian, who refuses to get more data, but only wants to swap priors with you :)

That... was pretty epic. Instant upvote and thanks. I suddenly feel an urgent need to go change my mind on many real-world issues.

In other words, y = mx + b is better than y = mx.

True, but not a refutation of probability.

No, that isn't it. The post still asserts true facts even if b = 0.

"Z lies close to X" means Z = mx + X, whereas saying X and Y are correlated only expresses what you know about Y = mX.

Huh? Everything in the post still follows if you impose the condition that X, Y, and Z are all zero mean. Then b is always zero for both the (X,Y) relationship and the (X,Z) relationship. Also, "Z lies close to X" means Z = X + e, where e is a zero-mean random variable with a standard deviation a fair bit smaller than the standard deviation of X. What the heck is the lower-case "x" in your equation?

My 'mx' is the same as your 'e'.

Okay, it is also true that correlation does not account for scale. Y's correlation with X does not discern between Y = X and Y = 9X.

I still don't understand how it can be a condemnation of probability and statistics. Both approaches use probability and statistics. The opening sentence implies that what follows compares probability and statistics to some other approach, but there isn't any other approach following it. Just correlation vs. confidence interval. A correlation gives you one parameter. A confidence interval gives you two. Of course the latter will give you more information.

The confidence interval also works well because Z is normally distributed around X. That's very fortunate. Correlations work even when you don't know the distribution. Rework this example, but assume that Z is characterized using mean and stdev but secretly has a heavily-skewed distribution around X, and see how they compare then.
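Here's a quick sketch of that suggested rework (the skewed distribution, a centered exponential, and all the numbers are my own choices): when Z-X keeps the same mean and standard deviation but is skewed, the nominal 90% interval derived under normality no longer behaves as advertised, and all of its misses pile up on one side.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

sd_x = 1.0
sd_e = 0.1 * sd_x                       # sd(Z - X) is 10% of sd(X), as in the post

x = rng.standard_normal(n) * sd_x

# Skewed error with the SAME mean (0) and std. dev.: a centered
# exponential, rather than the normal that Salviati assumes.
e = sd_e * (rng.exponential(1.0, n) - 1.0)
z = x + e

# Salviati's nominal 90% interval, built on the normality assumption:
half_width = 1.645 * sd_e
covered = np.abs(z - x) <= half_width
miss_hi = (x < z - half_width).mean()   # misses where the interval sits too high
miss_lo = (x > z + half_width).mean()   # misses where it sits too low

print(f"coverage = {covered.mean():.3f}")        # not the nominal 0.90
print(f"misses: {miss_lo:.3f} low side, {miss_hi:.3f} high side")
```

With this particular skew the interval actually over-covers (about 93%), and every miss is on the same side, so the stated 5%-per-tail error bars are simply wrong.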

I don't disagree with anything in the above comment. The opening is odd, as others have noted, and I agree. I'm not sure why you're introducing confidence intervals though.

"a variable Z which is experimentally found to have a good chance of lying close to X. Let us suppose that the standard deviation of Z-X is 10% that of X".

So, you have a confidence interval for X around Z.