Introduction
Context: Oliver Habryka commissioned me to study and summarize the literature on distributed teams, with the goal of improving altruistic organizations. We wanted this to be as rigorous as possible; unfortunately the rigor ceiling was low, for reasons discussed below. To fill in the gaps and especially to create a unified model instead of a series of isolated facts, I relied heavily on my own experience on a variety of team types (the favorite of which was an entirely remote company).
This document consists of five parts:
- Summary
- A series of specific questions Oliver asked, with supporting points and citations. My full, disorganized notes will be published as a comment.
My overall model of worker productivity is as follows:

Highlights and embellishments:
- Distribution decreases bandwidth and trust (although you can make up for a surprising amount of this with well timed visits).
- Semi-distributed teams are worse than fully remote or fully co-located teams on basically every metric. The politics are worse because geography becomes a fault line for factions, and information is lost because people incorrectly count on proximity to distribute information.
- You can get co-location benefits for about as many people as you can fit in a hallway: after that you’re paying the costs of co-location while benefits decrease.
- No paper even attempted to examine the increase in worker quality/fit you can get from fully remote teams.
Sources of difficulty:
- Business science research is generally crap.
- Much of the research was quite old, and I expect technology to improve results from distribution every year.
- Numerical rigor trades off against nuance. This was especially detrimental when it comes to forming a model of how co-location affects politics, where much that happens is subtle and unseen. The largest studies are generally survey data, which can only use crude correlations. The most interesting studies involved researchers reading all of a team’s correspondence over months and conducting in-depth interviews, which can only be done for a handful of teams per paper.
How does distribution affect information flow?
“Co-location” can mean two things: actually working together side by side on the same task, or working in parallel on different tasks near each other. The former has an information bandwidth that technology cannot yet duplicate. The latter can lead to serendipitous information sharing, but also imposes costs in the form of noise pollution and siphoning brain power for social relations.
Distributed teams require information sharing processes to replace the serendipitous information sharing. These processes are less likely to be developed in teams with multiple locations (as opposed to entirely remote). Worst of all is being a lone remote worker on a co-located team; you will miss too much information and it’s feasible only occasionally, despite the fact that measured productivity tends to rise when people work from home.
I think relying on co-location over processes for information sharing is similar to relying on human memory over writing things down: much cheaper until it hits a sharp cliff. Empirically that cliff is about 30 meters, or one hallway. After that, process shines.
List of isolated facts, with attribution:
- “The mutual knowledge problem” (Cramton 2015):
- Assuming knowledge is shared when it is not, including:
- typical minding.
- Not realizing how big a request is (e.g. asking “why don’t you just walk down the hall to check?” without realizing the lab with the data is 3 hours away, while the recipient, not knowing the asker is unaware of this, assumes the asker does not value their time).
- Counting on informal information distribution mechanisms that don’t distribute evenly
- Silence can mean many things and is often misinterpreted, e.g. as acquiescence, a deliberate snub, or a message never received.
- Lack of easy common language can be an incredible stressor and hamper information flow (Cramton 2015).
- People commonly cite overhearing hallway conversation as a benefit of co-location. My experience is that Slack is superior for producing this because it can be done asynchronously, but there’s reason to believe I’m an outlier.
- Serendipitous discovery and collaboration fall off by the time you reach 30 meters (chapter 5), or once you’re off the same hallway (chapter 6).
- Being near executives, project decision makers, sources of information (e.g. customers), or simply more of your peers gets you more information (Hinds, Retelny, and Cramton 2015)
How does distribution interact with conflict?
Distribution increases conflict and reduces trust in a variety of ways.
- Distribution doesn’t lead to factions in and of itself, but can in the presence of other factors correlated with location
- e.g. if the engineering team is in SF and the finance team in NY, that’s two correlated traits for fault lines to form around. Conversely, having common traits across locations (e.g. work role, being parents of young children) fights factionalization (Cramton and Hinds 2005).
- Language is an especially likely fault line.
- Levels of trust and positive affect are generally lower among distributed teams (Mortensen and Neeley 2012) and even co-located people who work from home frequently enough (Gajendran and Harrison 2007).
- Conflict is generally higher in distributed teams (O’Leary and Mortensen 2009, Martins, Gilson, and Maynard 2004).
- It’s easier for conflict to result in withdrawal among workers who aren’t co-located, amplifying the costs and making problem solving harder.
- People are more likely to commit the fundamental attribution error against remote teammates (Wilson et al 2008).
- Different social norms or lack of information about colleagues lead to misinterpretation of behavior (Cramton 2016) e.g.,
- you don’t realize your remote co-worker never smiles at anyone and so assume he hates you personally.
- different ideas of the meaning of words like “yes” or “deadline”.
- From analogy to biology I predict conflict is most likely to arise when two teams are relatively evenly matched in terms of power/resources and when spoils are winner-take-all.
- Most site:site conflict is ultimately driven by desire for access to growth opportunities (Hinds, Retelny, and Cramton 2015). It’s not clear to me this would go away if everyone is co-located: it’s easier to view a distant colleague as a threat than a close one, but if the number of opportunities is the same, moving people closer doesn’t make them not threats.
- Note that conflict is not always bad: it can mean people are honing their ideas against others’. However, the literature on virtual teams is implicitly talking about relationship conflict, which tends to be a pure negative.
When are remote teams preferable?
- You need more people than can fit in a 30m radius circle (chapter 5), or a single hallway (chapter 6).
- Multiple critical people can’t be co-located, e.g.,
- Wave’s compliance officer wouldn’t leave semi-rural Pennsylvania, and there was no way to get a good team assembled there.
- Lobbying must be based in Washington, manufacturing must be based somewhere cheaper.
- Customers are located in multiple locations, such that you can co-locate with your team members or customers, but not both.
- If you must have some team members not co-located, better to be entirely remote than leave them isolated. If most of the team is co-located, they will not do the things necessary to keep remote individuals in the loop.
- There is a clear shared goal
- The team will be working together for a long time and knows it (Alge, Wiethoff, and Klein 2003).
- Tasks are separable and independent.
- You can filter for people who are good at remote work (independent, good at learning from written work).
- The work is easy to evaluate based on outcome or produces highly visible artifacts.
- The work or worker benefits from being done intermittently, or doesn’t lend itself to 8-hours-and-done, e.g.,
- Wave’s anti-fraud officer worked when the suspected fraud was happening.
- Engineer on call shifts.
- You need to be process- or documentation-heavy for other reasons, e.g. legal, or find it relatively cheap to be so (chapter 2).
- You want to reduce variation in how much people contribute (i.e. get shy people to talk more) (Martins, Gilson, and Maynard 2004).
- Your work benefits from long OODA loops.
- You anticipate low turnover (chapter 2).
How to mitigate the costs of distribution
- Site visits and retreats, especially early in the process and at critical decision points. I don’t trust the papers quantitatively, but some report site visits doing as good a job at trust- and rapport-building as co-location, so it’s probably at least that order of magnitude (see Hinds and Cramton 2014 for a long list of studies showing good results from site visits).
- Site visits should include social activities and meals, not just work. Having someone visit and not integrating them socially is worse than no visit at all.
- Site visits are more helpful than retreats because they give the visitor more context about their coworkers (chapter 2). This probably applies more strongly in industrial settings.
- Use voice or video when need for bandwidth is higher (chapter 2).
- Although high-bandwidth virtual communication may make it easier to lie or mislead than either in person or low-bandwidth virtual communication (Håkonsson et al 2016).
- Make people very accessible, e.g.,
- Wave asked that all employees leave Skype on autoanswer while working, to recreate walking to someone’s desk and tapping them on the shoulder.
- Put contact information in an accessible wiki or on Slack, instead of making people ask for it.
- Lightweight channels for building rapport, e.g., CEA’s compliments Slack channel, Wave’s kudos section in weekly meeting minutes (personal observation).
- Build over-communication into the process.
- In particular, don’t let silence carry information. Silence can be interpreted a million different ways (Cramton 2001).
- Things that are good all the time but become more critical on remote teams
- Clear goals/objectives
- Clear metrics for your goals/objectives
- Clear roles (Zaccaro, Ardison, and Orvis 2004)
- Regular 1:1s
- Clear communication around current status
- Long time horizons (chapter 10).
- Shared identity (Hinds and Mortensen 2005) with identifiers (chapter 10), e.g. t-shirts with logos.
- Have a common chat tool (e.g., Slack or Discord) and give workers access to as many channels as you can, to recreate hallway serendipity (personal observation).
- Hire people like me
- long OODA loop
- good at learning from written information
- good at working asynchronously
- Don’t require social stimulation from work
- Be fully remote, as opposed to just a few people working remotely or multiple co-location sites.
- If you have multiple sites, lumping together similar people or functions will lead to more factions (Cramton and Hinds 2005). But co-locating people who need to work together takes advantage of the higher bandwidth co-location provides.
- Train workers in active listening (chapter 4) and conflict resolution. Microsoft uses the Crucial Conversations class, and I found the book of the same name incredibly helpful.
Cramton 2016 was an excellent summary paper I refer to a lot in this write up. It’s not easily available on-line, but the author was kind enough to share a PDF with me that I can pass on.
My full notes will be published as a comment on this post.
Notes have been moved to this post to save scrolling.
Suggestion: Attach or link these, rather than putting them inline in a comment. I like that they're available, but I had to scroll down many screens to find the actual comments.
In addition to my general comments when I curated this piece... it turns out that understanding how distributed teams work was pretty important in 2020.
In light of this:
Thanks for writing this! I found it very interesting, and I like the style. I particularly hadn't properly appreciated how semi-distributed was worse than either extreme. It's disappointing to hear, but seemingly obvious in retrospect and good to know.
This is a fantastic review of the literature, and a very valuable post - thank you!
My critical / constructive note is that I think that many of the conclusions here are stated with too much certainty or are overstated. My primary reasons to think it should be more hedged are that the literature is so ambiguous, the fundamental underlying effects are unclear, the model(s) proposed in the post do not really account for reasonable uncertainties about what factors matter, and there is almost certainly heterogeneity based on factors that aren't discussed.
Thanks for the kind words.
I'm unclear if you think all conclusions should be hedged like that, or my specific strong conclusions (site visits are good, don't split a team) are insufficiently supported.
Somewhere in the middle. Most conclusions should be hedged more than they are, but some specific conclusions here are based on strong assumptions that I don't think are fully justified, and the strength of evidence and the generality of the conclusions isn't clear.
I think that recommending site visits and not splitting a team are good recommendations in general, but sometimes (rarely) could be unhelpful. Other ideas are contingently useful, but often other factors push the other way. "Make people very accessible" is a reasonable idea that in many contexts would work poorly, especially given Paul Graham's points on makers versus managers. Similarly, the emphasis on having many channels for communication seems to be better than the typical lack of communication, but can be a bad idea for people who need time for deep work, and could lead to furthering issues with information overload.
All of that said, again, this is really helpful research, and points to enough literature that others can dive in and assess these things for themselves.
That makes sense. Neither of those was my intention: I declare at the beginning that the research is crap; repeating it at every point seems excessive. And I assumed people would take the conclusions as "this will address this specific problem" rather than "this is a Pure Good Action that will have no other consequences."
I understand that this isn't how it came across to you, and that's useful data. I am curious how others feel I did on this score.
Curated.
(It seemed important that Habryka not be the one to curate this piece, since he had commissioned it. But I independently quite liked it)
Several things I liked about this post:
Object-level Musings on Peer Review
Note: the following is my personal best guesses about directions LW should go. Habryka disagrees significantly with at least some of the claims here — both on the object and meta levels.
This post also jumped out significantly as... aspiring to higher epistemic standards than the median curated post. This led me to thinking about it through the lens of peer review (which I have previously mused about).
I ultimately want LessWrong to encourage extremely high quality intellectual labor. I think the best way to go about this is through escalating positive rewards, rather than strong initial filters.
Right now our highest reward is getting into the curated section, which... just isn't actually that high a bar. We only curate posts if we think they are making a good point. But if we set the curated bar at "extremely well written and extremely epistemically rigorous and extremely useful", we would basically never be able to curate anything.
My current guess is that there should be a "higher than curated" level, and that the general expectation should be that posts should only be put in that section after getting reviewed, scrutinized, and most likely rewritten at least once. Still, there is something significant about writing a post that is at least worth considering for that level.
This post is one of a few in the past few months that I'd be interested in seeing improved to meet that level. (Another recent example is Kaj's sequence on Multi-Agent-Models).
I do think it'd involve some significant work to meet that bar. Things that I'm currently thinking of (not highly confident that any of this is the right thing, but showcasing what sort of improvements I'm imagining)
Is it worth putting in all that work for this particular post? Dunno, probably not. But it seems worth periodically reflecting on how far the bar would be set, when comparing what LessWrong could ultimately be vs. what is necessary to in-practice be.
What about getting money involved? Even relatively small amounts can still confer prestige better than an additional tag or homepage section. It seems like rigorous well-researched posts like this are valuable enough that crowdfunding or someone like OpenPhil or CFAR could sponsor a best-post prize to be awarded monthly. If that goes well you could add incentives for peer-review.
Money might do the opposite. "I did all this work and all I got was... several dollars and cents".
A small amount of money would do the opposite of conferring prestige; it would make the activity less prestigious than it is now.
My impression is that money can only lower prestige if the amount is low relative to an anchor.
For example a $3000 prize would be high prestige if it's interpreted as an award, but low prestige if it's interpreted as a salary.
cf. https://en.wikipedia.org/wiki/Knuth_reward_check
What makes this situation unusual is that being acknowledged by famous computer scientist Donald Knuth to have contributed something useful to one of his works is inherently prestigious; the check is evidence of that reward, not itself the reward. (Note that many of the checks do not even get cashed! A trophy showing that you fixed a bug in Knuth’s code is vastly more valuable than enough money to buy a plain slice of pizza.)
In contrast, Less Wrong is not prestigious. No one will be impressed to hear that you wrote a Less Wrong post. How likely do you think it is that someone who is paid some money for a well-researched LW post will, instead of claiming said money, frame the check and display it proudly?
I think you're viewing intrinsic versus extrinsic reward as dichotomous rather than continuous. Knuth awards are on one end of the spectrum, salaries at large organizations are at the other. Prestige isn't binary, and there is a clear interaction between prestige and standards - raising standards can itself increase prestige, which will itself make the monetary rewards more prestigious.
I don't see where Said's comment implies a dichotomous view of prestige. He simply believes the gap between LessWrong and Donald Knuth is very large.
Sure, but we can close the global prestige gap to some extent, and in the meantime, we can leverage in-group social prestige, as the current format implicitly does.
Can you say more about this? That seems like a very valuable but completely different post, which I imagine would take an order of magnitude more effort than investigation into a single area.
Yeah, there's definitely a version of this that is just a completely different post. I think Habryka had his own opinions here that might be worth sharing.
Some off the cuff thoughts:
I find it very hard, possibly impossible, to do the things you ask in this bullet point and synthesis in the same post. If I was going to do that it would be on a per-paper basis: for each paper list the claims and how well supported they are.
This seems interesting and fun to write to me. It might also be worth going over my favorite studies.
Hard because of limitations on written word / UX, or intellectual difficulties with processing that class of information in the same pass that you process the synthesis type of information?
(Re: UX – I think it'd work best if we had a functioning side-note system. In the meanwhile, something that I think would work is to give each claim a rough classification of "high credence, medium or low", including a link to a footnote that explains some of the details.)
Data points from papers can either contribute directly to predictions (e.g. we measured it and gains from colocation drop off at 30m), or to forming a model that makes predictions (e.g. the diagram). Credence levels for the first kind feel fine, but like a category error for model-born predictions. It's not quite true that the model succeeds or fails as a unit, because some models are useful in some arenas and not in others, but the thing to evaluate is definitely the model, not the individual predictions.
I can see talking about what data would make me change my model and how that would change predictions, which may be isomorphic to what you're suggesting.
The UI would also be a pain.
This is awesome, thanks.
In case it’s of interest to anyone, I recently wrote down some short, explicit models of the costs of remote teams (I did not try to write the benefits). Here’s what I wrote:
I think this is a mixed blessing rather than a cost. It makes staff members less likely to be working in alignment with one another, but more likely to be working in their personal flow in the Csikszentmihalyi sense of the word. I believe these two things trade off against each other in general, and things moving the efficient frontier are very valuable.
A pretty high-quality post on a problem many people have had in 2020. That being said, I wonder if the 2020 COVID pandemic will produce enough research to make this redundant in a year? I doubt it, but we'll see.
I haven't reviewed the specific claims of the literature here, but I did live through a pandemic where a lot of these concerns came up directly, and I think I can comment directly on the experience.
Retreats and Site Visits
Making people very accessible
Video/Audio Tech
Written Communication
Datapoint: Stripe's Fifth Engineering Hub is Remote. HN discussion.
Fascinating.
It seems a certain amount of dynamics is relevant, as indicated by the site visits and retreats. I guess you assume the co-located team is static, i.e. no frequent home working or reshuffling with other teams?
I wonder if it's possible to model the impact of such vibrations and transitions between team formations. For example, the Scaled Agile framework proposes static co-located teams with a higher layer of people continuously transferring information between the teams. The teams retreat into a large event a few times a year. Due to personal circumstances I'd love to know their BS factor.
Teams were typically static for the duration of the studies, although IIRC some were newly formed task-focused teams and would reshuffle after the task was over.
Some studies looked at the effect of WFH in co-located teams. I didn't focus on this because it wasn't Oliver's main question, but from some reading and personal experience:
Based on this, I think that:
The most relevant paper I read was Chapter 5 of Distributed Work by Hinds and Kiesler. You can find it in my notes by searching for "Chapter 5: The (Currently) Unique Advantages of Collocated Work"
If you need more input, I recommend:
They're podcasts, not literature. But you can download all the shownotes, which read like a whitepaper, if you buy a one-month licence for $20.
Excellent work! I particularly like including your notes in the comments.
I have one question about OODA (I see long loops mentioned in the post, but without attribution; I don't see them mentioned in the notes explicitly). Could you talk more about the long-loop conclusion, and how remote work benefits from it?
My naive guess is that the bandwidth issues associated with remote work cause feedback to take longer, which means longer OODA loops are a desirable trait in the worker, but my confidence is not particularly high.
RE: OODA loops as a property of work: let's take the creation of this post as an example. There were broadly four parts to writing it:
1. Talking to Oliver to figure out what he wanted
2. Reading papers to learn facts
3. Relating all the facts to each other
4. Writing a document explaining the relation
Part 1 really benefited from co-location, especially at first. It was heavily back and forth, and so benefited from the higher bandwidth. The OODA loop was at most the time it took either of us to make a statement.
Part 2 didn't require feedback from anyone, but also had a fairly short OODA loop because I had to keep at most one paper in my head at a time, and dropping down to one paragraph wasn't that bad.
Part 3 had a very long OODA loop because I had to load all the relevant facts in my head and then relate them. An interruption before producing a new synthesis meant losing all the work I'd done till that point.
I also needed all available RAM to hold as much as possible at once. Even certain background noise would have been detrimental here.
Part 4 had a shorter minimum OODA loop than part 3, but every interruption meant reloading the data into my brain, so longer was still better.
Does that feel like it answered your questions?
That is much better, but it raises a more specific question: here you described the loop as a property of the task; but then you also wrote
Which seems to mean you are the one with the long loop. I can easily imagine different people having different maximum loop-lengths, beyond which they are likely to fail. Am I correct in interpreting this to mean something like trying to ensure that the remote worker can handle the longest-loop task you have to give them?
I think tasks, environments and people have a range of allowable OODA loops, and that it's very damaging if there isn't an overlap of all three.