This piece is cross-posted on my blog here.
After writing up my research on limits to working, the sheer spread of possibilities amazed me. I genuinely wasn’t sure if I would be able to tell the difference between a day with four hours of deep work and one with eight hours. Surely we could narrow down the hypothesis space from that!
So, I designed a simple experiment. I would do one hour of deep work each day for two days, then two days of four hours each, and finally two days of eight hours each.
Why I chose this experiment
I optimized for quickly testing large effect sizes to narrow my uncertainty. I didn’t expect this experiment to be rigorous or sensitive to small nuances -- n=2 per condition, and I would love to hear any suggestions for how to blind me to whether I was working one or eight hours that day.
But by doing such an extreme experiment, I would definitely see an effect: my output would have to be uncorrelated with hours worked for me not to. If I couldn’t easily tell a difference in output between the conditions, it would indicate that diminishing marginal returns massively influenced my output. Otherwise, I could get a rough guess at whether and how much my hourly output declined.
My guess was that I would get between 50% and 200% more done per hour on the one-hour days than on the eight-hour days. I was less sure about the four-hour days, but I guessed my hourly output would fall in between that of the one-hour days and the eight-hour days. I would be quite surprised if I got more done per hour on the eight-hour days than on the one-hour days. (Confession: I forgot to write these down before starting the experiment, so I’m writing them now after collecting data but before looking at the results.)
To get somewhat comparable results, I spent all twenty-six hours writing and tracked how many words I wrote each hour. I scheduled coworking sessions on Focusmate.com to hold myself to a schedule. Since the Focusmate sessions are fifty minutes long, I operationalized “an hour” as a 50-minute pomodoro. (My Toggl-tracked time ended up near 24 hours.)
When analyzing the data, I divided by minutes worked to get words per minute. I tracked my output in a few categories (words drafted, words outlined, words edited, and words proofed), which I weighted to reflect roughly similar amounts of effort. E.g., words proofed counted as 1/10 as much effort as words drafted. I also gave my word count “bonuses” for other particularly useful work that wasn't reflected in word count, such as thinking hard about how to frame a tricky topic. I made these adjustments and bonuses before calculating totals to compare the days.
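For anyone who wants to copy the bookkeeping, here is a minimal sketch of how an effort-weighted word count could be computed. Only the 1/10 weight for proofed words comes from my notes above; the weights for outlined and edited words, the function names, and the sample numbers are placeholders made up for illustration.

```python
# Effort weights relative to one drafted word.
# Only the proofed weight (1/10) is stated in the post;
# the outlined and edited weights are illustrative placeholders.
EFFORT_WEIGHTS = {
    "drafted": 1.0,
    "outlined": 0.5,   # placeholder, not from the post
    "edited": 0.25,    # placeholder, not from the post
    "proofed": 0.1,    # from the post: 1/10 of a drafted word
}


def adjusted_words(counts, bonus=0.0):
    """Effort-weighted word count for one time block, plus any bonus words."""
    return sum(EFFORT_WEIGHTS[kind] * n for kind, n in counts.items()) + bonus


def words_per_minute(counts, minutes, bonus=0.0):
    """Adjusted words divided by the minutes actually worked."""
    return adjusted_words(counts, bonus) / minutes


# Made-up block: 400 words drafted and 1000 words proofed in 50 minutes.
block = {"drafted": 400, "proofed": 1000}
print(adjusted_words(block))
print(words_per_minute(block, minutes=50))
```

Applying the weights and bonuses per block, before summing, matches the order I described: adjust first, then total up the days.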
Here’s what I found
*I suspect my first day was more productive than normal (it was my most productive hour out of all 26). Given that, I’ve included a conservative estimate below based on the other 1-hr day (750 words).
UPDATE: I did another experiment where I wrote for 1-3 hours per day for five days in a row. During that time, I averaged 727 words per hour. So, my measured output was quite close to my conservative estimate below.
According to this, increasing the hours worked by 4x only increased the words written by 2.2x, and increasing hours by 8x resulted in 3.5x as many words. That works out to 54% and 43% as many words per hour, respectively, as when I worked only one hour per day.
So while increasing hours still led to increased output, I got much less done per hour. I’d be better off writing consistently for a few hours each day to efficiently maximize output.
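The per-hour percentages above are just the output multiplier divided by the hours multiplier. A minimal sketch of that arithmetic, using the rounded multipliers quoted above (the function name is mine): with these rounded inputs it gives roughly 55% and 44%, a point off the 54% and 43% I reported, presumably because those came from the unrounded word counts.

```python
def relative_hourly_output(hours_multiplier, output_multiplier):
    """Words per hour relative to the one-hour baseline day, as a fraction."""
    return output_multiplier / hours_multiplier


# Rounded multipliers from the post, relative to a one-hour day.
conditions = {"4-hour day": (4, 2.2), "8-hour day": (8, 3.5)}

for label, (hours_x, words_x) in conditions.items():
    share = relative_hourly_output(hours_x, words_x)
    print(f"{label}: {share:.0%} of the one-hour day's words per hour")
```

A value of 100% would mean no diminishing returns at all, and a value of 1/hours_multiplier (25% for the four-hour day) would mean the extra hours added nothing.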
A few other observations
Besides my main question of how my working hours affected my output, I noticed a few other things that seemed noteworthy. However, these are post-hoc observations. Take them with an extra grain of salt - they might not replicate.
1. Go with the flow. My output varied significantly by hour, but it didn’t decrease linearly during the day. I had spikes throughout the eight-hour days.
However, the spikes were correlated with my subjective experience. (I drafted >=750 words during 5 out of the 17 time blocks when feeling okay or good, but drafted that amount during 0 out of the 12 blocks when I felt blah.)
It worked best to do shallow work (such as proofing a transcription) or switch to a different piece when I felt stuck. My rule of thumb would be to stop if the writing isn’t flowing after ten minutes, though I could see this being easy to Goodhart in the future.
2. Switch up work. I scheduled my one-hour days for when I had the most calls. As a result, my combined writing and coaching hours were similar on each of the first four days. Writing for one hour a day still felt easier than writing for four hours, even when the total hours spent working were similar. This weakly implies these limits may apply only to writing, or maybe to deep work in general.
3. Motivation matters, maybe. I was tired of writing, and my RSI was flaring up, by midway through the second eight-hour day. I wouldn’t be surprised if some of the slump in output was related to motivation. So, if someone really wanted to write for twelve hours a day, they might be just fine.
So, my data says I can do more than four hours a day, even of deep work, but I’m probably better off consistently writing for a few hours each day.
And I also think it’s possible that all of that could be wrong. Did I notice a big effect because I expected a big effect? Maybe. Was my sample size too small? Yeah. Were my methods susceptible to bias? Yes.
Do I know more than I did before this experiment? Definitely.
My prediction interval narrowed. Now, I would be quite surprised not to see more output from an eight-hour workday than from a four-hour workday, even when I’m doing deep work. I would also be quite surprised if I got more done per hour on an eight-hour workday than on a four-hour one.
Both of these seemed unlikely but plausible before running the experiment.
Even if I couldn’t get to a final answer, I meaningfully reduced my uncertainty.
At least for myself. I don’t expect these results to generalize widely. They might serve as a rough starting point, but much better to try it out yourself. You might have different types of work, different habits, and different levels of motivation - which could all lead to different results.
If you want to do a similar experiment, here are some questions to ask yourself:
If you do something similar, I’d love to hear the results!