http://www.johndcook.com/blog/2013/02/04/four-hours-of-concentration/

And since this is the Internet, and facts are involved, our gwern turns up there also.


It is striking how he chooses to link to comments mentioning people who claim to do the same, but not to comments mentioning people who claim to do differently.

Missing summary: even the most productive people can only put in 3-4 hours of intense work per day on a consistent basis.

This is not true.


Guys, the reasonable default is not "yeah this sounds similar to what EY said once," but "I don't believe you."

Are you just stating that the upper limit is not 4? Interesting - what is it, and please estimate how rare the ability is. Is there anything else unusual about these people?

They seem to self-select into prominent academic positions (of course I deal with academic folks, not folks with "straight jobs" -- likely there are such people everywhere). I am not sure how rare the ability is, because I think most people do not work up to their genetic limits -- usually akrasia gets them first. Academics often have an easier akrasia problem because (a) their work is interesting/rewarding and (b) collaborators/deadlines help get across motivation lulls.


People in startups are another example. Motivated startup people, especially single digit employees, often work long hours and intensely and productively.

Good for you! :)

I actually was not talking about myself (self-evaluation is noisy), but people I know.

(self-evaluation is noisy)

Don't tell gwern ...

Although something like "hours worked in a productive manner" should be well quantifiable, for most professions.

Self-experimentation is causal inference with a sample size of 1 and selection bias, i.e. silly and doomed.

Self-experimentation with within-subject design can be internally valid (I make sure mine are well-powered, even, which is more than some psychologists can say), but this does nothing about external validity or selection bias.

Which ironically makes self-experimentation somewhat analogous to quantum suicide experiments: because of selection bias, observers of my self-experiments will rationally learn little even as I learn much more. Someone watching quantum suicides will expect to see lots of survivals, and someone watching self-experiments will expect to see lots of positive results, even if quantum suicide doesn't work and self-experiments measure nothing but null effects.

(Except maybe if the observer had some reason to believe they would have learned about my experiments regardless of the experiment results... possibly because they became interested in my writings for non-experiment reasons, maybe? I wonder.)

When you say "can be internally valid" what do you mean? What about interactions from repeated treatments? I mean, correlation can equal causation, too. But that's a pretty weak standard to meet.


Also, how do you know the selection bias does not create non-causal explanations for observed dependence? For example, in case control studies you select based on the child of the outcome:

T -> Y -> S, with unobserved U1 being a parent of T, and U2 being a parent of Y (U1, U2 possibly dependent creating unobserved confounding).

If we select on S FIRST, and THEN try to randomize T (conditioning and do(.) do not commute), then we create a dependence between T and U2 due to "explaining away." Randomizing on T cuts the arc from U1 to T (good -- we get rid of some unobserved confounding), but does nothing about this new dependence between T and U2 introduced by the selection procedure.
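The "explaining away" step can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the comment: linear Gaussian structural equations, a threshold selection rule S = (Y > 1), and the simpler ordering in which T is randomized and selection happens afterwards. The names `corr` and `simulate_selection` are made up for illustration.

```python
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_selection(n=100_000, seed=0):
    """Return corr(T, U2) in the full sample and among selected units."""
    rng = random.Random(seed)
    t_all, u2_all, t_sel, u2_sel = [], [], [], []
    for _ in range(n):
        t = rng.randint(0, 1)            # randomized treatment, independent of U2
        u2 = rng.gauss(0, 1)             # unobserved parent of Y
        y = t + u2 + rng.gauss(0, 1)     # assumed linear outcome model
        selected = y > 1.0               # assumed selection rule on S, a child of Y
        t_all.append(t)
        u2_all.append(u2)
        if selected:
            t_sel.append(t)
            u2_sel.append(u2)
    return corr(t_all, u2_all), corr(t_sel, u2_sel)


if __name__ == "__main__":
    full, selected = simulate_selection()
    print(f"corr(T, U2), everyone:      {full:+.3f}")
    print(f"corr(T, U2), selected only: {selected:+.3f}")
```

In the full sample T and U2 are independent by construction; among the selected units they become negatively correlated, because a high Y can be "explained" by either T or U2.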

When you say "can be internally valid" what do you mean?

http://en.wikipedia.org/wiki/Internal_validity

What about interactions from repeated treatments?

?

Also, how do you know the selection bias does not create non-causal explanations for observed dependence?

I don't understand your hypothetical. Could you give a concrete example?

I don't understand your hypothetical. Could you give a concrete example?

http://www.maths.bris.ac.uk/~maxvd/didelez_etal_StatSci_final.pdf


With repeated measures design the problem is whether the "washout period" is sufficient.

http://www.maths.bris.ac.uk/~maxvd/didelez_etal_StatSci_final.pdf

20 pages of theorems on do-calculus is not helpful, and the examples they use like retrospective pregnancies or case-control studies do not seem to apply to self-experiments.

Maybe I should be clearer: can you give an example of a real self-experiment, preferably one which was done blind & randomized, which is plausibly affected by your selection bias? Because I still don't understand what you are getting at.

With repeated measures design the problem is whether the "washout period" is sufficient.

For simple two-level experiments, reasonable block lengths plus counterbalancing from randomization deals with that.

Actually gwern, I think you gave me an idea for a paper :).

[shortened a long reply]:

I guess the fundamental question is, say you do your self-experiment, and find an effect. Say it's nootropic pills or something. The question is, what can we conclude from this. I can conclude either nothing or very little indeed about nootropics and myself (since as you correctly point out, all data is on you, not on me).

What can you conclude about nootropics and yourself from this experiment? The worry is washout length, and lack of exchangeability between different "copies" of you spread across time (you mention this when talking about good/bad days when you were doing some sort of nicotine test).

What is the strongest effect you ever found in this way?


If you read about statistical power and experimental design for fun, reading about do-calculus will probably not be an awful idea or a waste of time (not that paper, though).

The worry is washout length, and lack of exchangeability between different "copies" of you spread across time

As I said, I don't think washout length is a concern for blocked 2-level experiments. Suppose I block as single days, and unbeknownst to me, I screwed up the literature search and the substance actually lasts 2 days; some of the possible sequences will be messed up and show no difference while other possible pairs will be fine. (A sequence like 10 will be screwed up and look like 11, but a sequence like 01 will show the true difference.) Since the blocks are randomized, there will be a mix of confounded and accurate blocks: the apparent effect will be weaker than the true effect. I have lost power, but not introduced bias.
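The 10-versus-01 argument can be checked with a short simulation. This is a sketch, not the design actually used: it assumes each day is independently randomized, that the substance acts at full strength for exactly two days, that the effect is additive with unit Gaussian noise, and that days are compared naively by their assigned label. `simulate_carryover` is a hypothetical name.

```python
import random


def simulate_carryover(n_days=200_000, true_effect=1.0, seed=0):
    """Naive treated-minus-control estimate when a dose also covers the next day."""
    rng = random.Random(seed)
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    prev = 0
    for _ in range(n_days):
        t = rng.randint(0, 1)        # today's randomized assignment
        exposure = max(t, prev)      # assumed carryover: yesterday's dose still active
        y = true_effect * exposure + rng.gauss(0, 1)
        sums[t] += y
        counts[t] += 1
        prev = t
    return sums[1] / counts[1] - sums[0] / counts[0]


if __name__ == "__main__":
    print(f"true effect 1.0, naive estimate {simulate_carryover():.3f}")
```

Under these assumptions about half of the control days are contaminated by the previous day's dose, so the estimate lands near half the true effect: shrunk toward zero, never past it. That conclusion depends on the carryover being a simple same-direction continuation of the effect.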

(you mention this when talking about good/bad days when you were doing some sort of nicotine test).

I suppose that's one way to think about u-curve responses. Lack of exchangeability sounds pessimistic to me, though: if there really is a u-curve effect, then you can improve the model and make the datapoints comparable by measuring the person to learn where they are on the u-curve before administering the intervention. (At least, I think that's how it went. I thought I read some psychology papers using some sort of methodology like that at some point...)

If you read about statistical power and experimental design for fun, reading about do-calculus will probably not be an awful idea or a waste of time

Certainly, but I have a hard time with math and so learning do-calculus would be a big time investment, one I am loath to make right this month just to understand someone's opaque objection to self-experiments which they probably could easily make clear.

Since the blocks are randomized, there will be a mix of confounded and accurate blocks: the apparent effect will be weaker than the true effect. I have lost power, but not introduced bias.

I don't think this is a good way to think about confounding. For one thing, you are implicitly assuming the effect is monotonic. Perhaps this is true with nootropics (how do you know though?) Monotonicity is not true in general, though. Maybe treatments and unwashed out partial treatments interact in weird/random ways. In general, if you are adding up unconfounded and confounded days, your sum is garbage, not a weaker version of the true sum.

I suppose that's one way to think about u-curve responses.

A u-curve response is just one type of non-monotonic response. There could be others. I don't think it's entirely scientific to assume either the function is monotonic or it has a monotonic first derivative. What if there is no simple way to describe the response?

Actually I am not even talking about the response to the treatment. Suppose you were a werewolf, and the outcome you were measuring was a physical test. Now, every few days out of 28 you would measure off the charts completely independently of whatever physical enhancement treatment you were taking, just because you were half-wolf during those days. So you might conclude there is an effect under the null. Now werewolves do not exist, but are you sure this sort of thing doesn't happen with you? How do you know?

one I am loath to make right this month just to understand someone's opaque objection to self-experiments which they probably could easily make clear.

I think that's a curious attitude for someone who is into self-experimentation (independently of whether the opaque objection can be made clear or not). In some sense, do-calculus is the math behind identifying causal effects from data. I am not sure how you can talk about these things with any confidence without reading up on the math. It's like being a practicing consequentialist without knowing some decision theory. You can't just rely on intuition.

I think at the very least you should write down all the assumptions you are making in order to have your conclusions be internally valid.

What is the strongest effect you ever found in this way?

I haven't compiled my results into a table or anything but IIRC, I think the largest effect size so far was taking vitamin D at bedtime with d~=-0.7. (Roughly in line with psychology meta-analyses: effect sizes drop off sharply past |0.6|.)
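For readers unfamiliar with the notation: assuming d here means Cohen's d with a pooled standard deviation (the comment does not say which variant was used), it can be computed as in this sketch. The function name and the toy numbers are illustrative only, not data from any experiment.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


if __name__ == "__main__":
    # Toy numbers: treated scores sit one pooled SD below control.
    print(cohens_d([1, 2, 3], [2, 3, 4]))
```

A negative d means the treated condition scored lower than control on whatever outcome was measured.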

I don't think this is a good way to think about confounding. For one thing, you are implicitly assuming the effect is monotonic. Perhaps this is true with nootropics (how do you know though?)

The background research and published experiments don't seem to include unusual adjustments for non-monotonicity (not really sure what that means in this context).

Monotonicity is not true in general, though.

In general? Do you have a meta-analysis over hundreds of different kinds of experiments showing this?

Actually I am not even talking about the response to the treatment. Suppose you were a werewolf, and the outcome you were measuring was a physical test. Now, every few days out of 28 you would measure off the charts completely independently of whatever physical enhancement treatment you were taking, just because you were half-wolf during those days. So you might conclude there is an effect under the null. Now werewolves do not exist, but are you sure this sort of thing doesn't happen with you? How do you know?

Wouldn't this be covered by randomization? If I randomize each day to this treatment, half of the wolf-days will be under treatment days and half under control days. They'll inflate the standard deviation and I'll be much less likely to reject the null.
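A sketch of this claim for the purely additive case: the outcome gets a large boost on 3 days out of every 28 regardless of treatment, the true treatment effect is zero, and each day is randomized independently. The boost size, the cycle, and the name `simulate_wolf_days` are all assumptions made for illustration. The sketch does not model the case where the treatment itself acts differently on those days.

```python
import random


def simulate_wolf_days(n_days=200_000, wolf_boost=5.0, seed=0):
    """Return (treated - control mean difference, overall SD) under a null effect."""
    rng = random.Random(seed)
    treated, control = [], []
    for day in range(n_days):
        t = rng.randint(0, 1)                  # per-day randomization
        is_wolf_day = (day % 28) < 3           # assumed 3 "wolf" days per 28-day cycle
        y = rng.gauss(0, 1) + (wolf_boost if is_wolf_day else 0.0)
        (treated if t else control).append(y)  # treatment has no effect on y
    diff = sum(treated) / len(treated) - sum(control) / len(control)
    everyone = treated + control
    m = sum(everyone) / len(everyone)
    sd = (sum((v - m) ** 2 for v in everyone) / len(everyone)) ** 0.5
    return diff, sd


if __name__ == "__main__":
    diff, sd = simulate_wolf_days()
    print(f"mean difference {diff:+.3f}, outcome SD {sd:.2f}")
```

With an additive boost, the wolf days split evenly between arms: the mean difference stays near zero and only the spread of the outcome grows.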

I think that's a curious attitude for someone who is into self-experimentation (independently of whether the opaque objection can be made clear or not).

From the sound of it, you're largely making the theoretician's objection: "but there are a billion ways your simple design could go wrong! How can you do any experiments if you don't understand in detail every underlying tool or theorem?" Well, yes, it's true that neither I nor other experimenters can rule out becoming a werewolf on every 5th Tuesday or setting up an experiment with completely wrong blocks or washouts, nor can we be sure that induction will continue to work tomorrow and we will not be eaten by grues or bleens, but nevertheless...

(not really sure what that means in this context).

I am just saying that confounding could make your effect weaker (if there is cancellation of paths), or stronger (if there is some sort of interaction with the treatment), or weaker sometimes and stronger other times. You just don't know. Confounding doesn't just increase the variance of your effect estimate, it creates bias in the estimate. That is, if you add up some confounded bits to your estimate, you are adding up garbage.

Wouldn't this be covered by randomization?

No. The werewolf example is a clear case of the copies not being exchangeable. Different versions of you could react to (randomized!) treatment differently, and you won't know how without more assumptions. For instance, if you were a woman, you would have a different hormonal composition due to the monthly cycle, etc. etc. etc.

From the sound of it, you're largely making the theoretician's objection: "but there are a billion ways your simple design could go wrong!"

Look, what I am saying is not very complicated. I am not asking you to become a mathematician. You are looking for causal effects. That's great! It is not my goal to discourage you! Just report your assumptions. All of them. Say you assume monotonicity, exchangeability of copies, etc. If you don't know what assumptions you need to make, maybe read up on them. Reporting assumptions is good science, right? It's standard practice in the stats literature.

No, see. The burden of proof is not on me. If you make an assumption, the burden of proof that it holds (or at the very least the burden of reporting) is on you. Causal mechanisms in general are not monotonic...Just report your assumptions. All of them. Say you assume monotonicity, exchangeability of copies, etc. If you don't know what assumptions you need to make, maybe read up on them.

This is an example of what I mean when I say you are taking a wildly impractical theoretical approach. Have you ever seen an experiment in which every assumption is reported with a proof? No, because such a paper would not be an experiment but an exercise in pure mathematics or statistics, and no one would ever get anything done if they tried to actually apply your suggestions, since they would spend all their time reading up on various statistical frameworks and going 'well, I guess I should specify this and that assumption but wait don't I also assume independence of who's the current Justice of the Supreme Court?' etc.

But don't just assume some random thing you came up with after reading some slice of the literature that happened to catch your fancy will give you the effect you want.

I hate to break it to you, but that's pretty much how it works. People read a slice of the literature, apply simple common models, which yield reasonable answers, and only start delving into the foundations and examining closely the methods if someone makes a good case that a hidden assumption or a method's limitation is important. This should not dismay you any more than a philosopher of science should be dismayed that scientists spend their days in the lab and he is only consulted to deal with borderline cases like Intelligent Design.

Reporting assumptions is standard practice. For example in causal inference literature the mantra is often "we assume SUTVA (stable unit treatment value assumption), and conditional ignorability." You can't prove them all (in fact many are untestable). Reporting is still a good idea (for sensitivity analysis, replication, arguing about their reasonableness, etc.)

That's reporting some assumptions, and presumably ones which have earned their being specifically singled out.

Exchangeability of copies and monotonicity are pretty important. People always report monotonicity (because you get identification when you could not before). But anyways, I shouldn't be the one to have to tell you this.

Also, it's not some, it's all assumptions needed to get your answer from the data. Even if exchangeability holds for you, it might not hold for someone else who might want to try your design. If you don't write down what you assume, how should they know if your design will carry over?


Anyways, this is just the Scruffy AI mistake all over again. Actually it's worse than that. The scientific attitude is to try to falsify, i.e. look for reasons your model might fail. You are assuming as a default that your model is reasonable, and not even leaving a paper trail.

Dozens of fields are concerned with "identifying causal effects from data", pretty much all the natural sciences and all their myriad subspecializations can be viewed through such a lens. That's the crux, can be viewed as such. Yet, I doubt you'll find all that many medical studies, physical experiments, etc. invoking, understanding or even being aware of do-calculus. That does not void their results, there are ways of interpreting the results that do not rely on grasping - or even being aware of - the math behind the curtain.

A biologist can make valid observations about a meadow without being concerned about wave functions; gwern can do internally valid studies without being concerned about the math of do-calculus. Thankfully, or else nothing would get done. Like, ever.

It's nice to be enthusiastic about what you do, but be careful of an apotheosis of your specific field of study.

Dozens of fields are concerned with "identifying causal effects from data", pretty much all the natural sciences and all their myriad subspecializations can be viewed through such a lens.

Indeed.

That's the crux, can be viewed as such. Yet, I doubt you'll find all that many medical studies, physical experiments, etc. invoking do-calculus. That does not void their results, there are ways of interpreting the results that do not rely on grasping - or even being aware of - the math behind the curtain.

"That's just like, your opinion, man."

See, you don't get to say that. When people talk about causal effects from randomization (a la what Fisher talked about), effects of interventions is what they mean. That is the math behind what they want, just like complex valued matrices is the math behind quantum mechanics, or Peano axioms the math behind doing arithmetic. Not everyone uses the language of do(.) (some use potential outcome language, which is equivalent). But either their language is equivalent to do(.), or they are essentially doing garbage (and I assure you, there is a lot of garbage out there). In fields like epidemiology, what they often have is the data people (who know about HIV, say, or cancer), and methods people (who know how not to get garbage from the data).

The fact of the matter is, there are all sorts of gotchas about doing causal inference that being careless and relying on intuitions makes you vulnerable to. I can give endless examples:

(a) People doing longitudinal causal inference basically failed at time-varying confounders until 1986, when the right method was developed. So they would report garbage causal effects from longitudinal studies, because they thought they just need to adjust for these confounders. No. Wrong. Have to use the equivalent of g-computation.

(b) People try to use coefficients of regressions as mediated causal effects, even when this is not warranted (that is, the coefficient doesn't correspond to anything causal). No. Wrong. This fails if you have discrete mediators. This fails with interaction terms. This fails under certain natural modeling choices. This fails if you have unobserved confounding. In general a mediated effect is a complicated function of the observed data, not a regression coefficient.

(c) People try to test for causal null, even when their model does not permit the null to happen. (null paradox)

(d) Don Rubin (famous Harvard statistician, one of the people who wrote down the EM algorithm, and one of the people behind potential outcomes) once said that you should adjust for all covariates. He was just trying to be a good Bayesian (have to use all the data, right?) No. Wrong. You only adjust for what you need to block all non-causal paths, while not opening any non-causal paths.

(e) An example from something written at lesswrong: a Bayesian network is a causal model. No. Wrong. A Bayesian network is a statistical model (a set of densities) defined by conditional independence. In order to have a causal model you need to talk about how interventions relate to observations (essentially you need to say parents are direct causes formally).

Actually the list is so long, I am trying to put it in a paper format.
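Item (d) can be made concrete with one standard illustration, often called M-bias. The sketch below assumes a linear Gaussian model in which a covariate C is a common effect of two unobserved variables, U1 (a cause of T) and U2 (a cause of Y), and T has no effect on Y at all. The structure, the coefficients, and the names `cov` and `simulate_m_bias` are assumptions chosen for illustration.

```python
import random


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def simulate_m_bias(n=100_000, seed=0):
    """Return the OLS coefficient of T on Y, without and with adjusting for C."""
    rng = random.Random(seed)
    t, c, y = [], [], []
    for _ in range(n):
        u1, u2 = rng.gauss(0, 1), rng.gauss(0, 1)   # unobserved
        t.append(u1 + rng.gauss(0, 1))              # U1 -> T
        c.append(u1 + u2 + rng.gauss(0, 1))         # U1 -> C <- U2 (collider)
        y.append(u2 + rng.gauss(0, 1))              # U2 -> Y, no T -> Y arrow
    unadjusted = cov(t, y) / cov(t, t)
    # Coefficient on T in the two-regressor OLS fit Y ~ T + C (normal equations).
    det = cov(t, t) * cov(c, c) - cov(t, c) ** 2
    adjusted = (cov(c, c) * cov(t, y) - cov(t, c) * cov(c, y)) / det
    return unadjusted, adjusted


if __name__ == "__main__":
    unadjusted, adjusted = simulate_m_bias()
    print(f"unadjusted: {unadjusted:+.3f}   adjusted for C: {adjusted:+.3f}")
```

Here the unadjusted coefficient is near the true value of zero, while adjusting for C opens the path T <- U1 -> C <- U2 -> Y and produces a spurious nonzero coefficient.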

This stuff is not simple, and even very smart people can be confused! So if you want to do causal inference, you know, read up on it. I am surprised this is a controversial point. To quote Miguel Hernan, the g-formula (expressing do(.) in terms of observed data) is not a causal method, it is the causal method.

If you don't want to read Pearl, you can read Robins, or Dawid, or the potential outcomes people who learned from Rubin. The formalism is the same.

Nobody owns gwern.

Chuck Norris tried purchasing gwern once, but gwern sent him an exhaustive cost-benefit analysis that caused him to change his mind.

Just more anecdata, but this jibes with me. I keep time logs. I have "maximal mental effort" (MME) and "everything else." For me it's about scope and depth: MME is about how much I can integrate and bring to bear on what I'm doing, like my capacity to coherently integrate citations into my writing. So perhaps it's how large and long you can subconsciously sustain "useful potential inputs" to conscious working memory. Monkey coding I can do, for sure, many hours a day. But coding at my maximum ability is, again, 2-4 hours per day.


Muehlhauser does not work more than 4 hours a day. He makes each hour last 7.


Enlighten those who are not in the know please?


Oh yeah, so here's how I actually log these 2-4 hours: If a) I'm only doing X (e.g. not also eating or listening to music), and b) I truly expect not to be distracted (I'm in an isolated location and email and phone are off), then I log that time. Otherwise, it goes into the "everything else" bucket. From my log, it looks like I average only 15 hours/wk under these criteria.

Many think that they can concentrate for many more hours, every day. Perhaps they don't know what it really means to concentrate.

That's a completely unfalsifiable claim.

I can think of two ways to interpret that, both falsifiable:

  1. people are mistaken about their ability to concentrate for many hours.

    This is easily falsified simply by looking for some objective measure of concentration. EEG frequencies, bug rates of checked-in code, performance on any of a thousand psychological tasks/tests, etc.

    We could refine this with explanations for why they are mistaken about their endurance: because their cognitive abilities deteriorate over time including self-monitoring cognition? Self-deception, possibly related to status-seeking? Either of these could lead to a lack of knowing what it really means to concentrate.

  2. people do not put forth peak effort but average effort, and they can do average effort for many more hours and have accurate beliefs, but they are mistaken when they try to distinguish between peak and average. Perhaps it has been too long since they exerted peak mental effort and they've forgotten how unchallenging their daily activities are (I think of intelligent people who crack a math text and remark how much their head hurts and how tired they are after a few minutes).

    The difference can be gauged by lengthy testing with and without high stakes or other forms of pressure, and then one simply watches performance over time: if someone performs better under high stakes and their performance does not drop off over many hours, then their claim was right.

    (Concrete example: subjects take 3 back-to-back SATs covering ~8 hours, for no stakes; then the next day a gun is put to their head... Does SAT 3 have the same or higher score as SAT 1 under both conditions?)

Nah, there are lots of tests for how focused someone is. Approach them from behind and drop a heavy book on the floor. If they get furious at you because it'll take them hours to get back to juggling eggs, or if they don't notice at all, they were concentrating.