Literature Review: Distributed Teams

Notes have been moved to this post to save scrolling.

Suggestion: Attach or link these, rather than putting them inline in a comment. I like that they're available, but I had to scroll down many screens to find the actual comments.

[-]Raemon5y200Nomination for 2019 Review

In addition to my general comments when I curated this piece... it turns out that understanding how distributed teams work was pretty important in 2020.

[-]Larks7y100

In light of this:

Build over-communication into the process.

In particular, don’t let silence carry information. Silence can be interpreted a million different ways (Cramton 2001).

Thanks for writing this! I found it very interesting, and I like the style. I particularly hadn't properly appreciated how semi-distributed was worse than either extreme. It's disappointing to hear, but seemingly obvious in retrospect and good to know.

[-]Davidmanheim7y90

This is a fantastic review of the literature, and a very valuable post - thank you!

My critical / constructive note is that I think that many of the conclusions here are stated with too much certainty or are overstated. My primary reasons to think it should be more hedged are that the literature is so ambiguous, the fundamental underlying effects are unclear, the model(s) proposed in the post do not really account for reasonable uncertainties about what factors matter, and there is almost certainly heterogeneity based on factors that aren't discussed.

[-]Elizabeth7y50

Thanks for the kind words.

I'm unclear if you think all conclusions should be hedged like that, or my specific strong conclusions (site visits are good, don't split a team) are insufficiently supported.

[-]Davidmanheim7y30

Somewhere in the middle. Most conclusions should be hedged more than they are, but some specific conclusions here are based on strong assumptions that I don't think are fully justified, and the strength of evidence and the generality of the conclusions isn't clear.

I think that recommending site visits and not splitting a team are good recommendations in general, but sometimes (rarely) could be unhelpful. Other ideas are contingently useful, but often other factors push the other way. "Make people very accessible" is a reasonable idea that in many contexts would work poorly, especially given Paul Graham's points on makers versus managers. Similarly, the emphasis on having many channels for communication seems to be better than the typical lack of communication, but can be a bad idea for people who need time for deep work, and could lead to furthering issues with information overload.

All of that said, again, this is really helpful research, and points to enough literature that others can dive in and assess these things for themselves.

[-]Elizabeth7y60

That makes sense. Neither of those was my intention; I declare at the beginning that the research is crap, and repeating it at every point seems excessive. And I assumed people would take the conclusions as "this will address this specific problem" rather than "this is a Pure Good Action that will have no other consequences."

I understand that this isn't how it came across to you, and that's useful data. I am curious how others feel I did on this score.

[-]Raemon7y90

Curated.

(It seemed important that Habryka not be the one to curate this piece, since he had commissioned it. But I independently quite liked it)

Several things I liked about this post:

It told me some concrete things about remote teams. In particular:

the notion that you should either go "fully remote" or "not remote"
the notion that the benefits of co-locating drop off after a literal radius which extends 30m.

It gave me some sense of how good the evidence on remote teams is (i.e. not very), while providing a bunch of links to follow up on if I wanted to get an even better sense.
LessWrong currently doesn't feel like it rewards serious scholarship as much as it should, so I'd like to generally reward it when it happens. I also think this post did a good job of combining short, easily readable takeaways with the more extensive background literature.

[-]Raemon7y150

Object-level Musings on Peer Review

Note: the following are my personal best guesses about directions LW should go. Habryka disagrees significantly with at least some of the claims here, on both the object and meta levels.

This post also jumped out significantly as... aspiring to higher epistemic standards than the median curated post. This led me to think about it through the lens of peer review (which I have previously mused about).

I ultimately want LessWrong to encourage extremely high quality intellectual labor. I think the best way to go about this is through escalating positive rewards, rather than strong initial filters.

Right now our highest reward is getting into the curated section, which... just isn't actually that high a bar. We only curate posts if we think they are making a good point. But if we set the curated bar at "extremely well written and extremely epistemically rigorous and extremely useful", we would basically never be able to curate anything.

My current guess is that there should be a "higher than curated" level, and that the general expectation should be that posts should only be put in that section after getting reviewed, scrutinized, and most likely rewritten at least once. Still, there is something significant about writing a post that is at least worth considering for that level.

This post is one of a few in the past few months that I'd be interested in seeing improved to meet that level. (Another recent example is Kaj's sequence on Multi-Agent-Models).

I do think it'd involve some significant work to meet that bar. Things that I'm currently thinking of (not highly confident that any of this is the right thing, but showcasing what sort of improvements I'm imagining)

Someone doing some epistemic spot checks on the claims made here
Improving the presentation (right now it's written in a kind of bare-bones notes format)
Dramatically improving the notes, to be more readable
Improving the diagram of Elizabeth's model of productivity so it's easier to parse.
Orienting a bit more around the "the state of management research is shitty" issue. I think (low confidence) that a good practice for LessWrong would be this: if we review a field and find that the evidence base is very shaky, we should reflect on what it would take to make the evidence less shaky. (This is beyond scope for what Habryka originally commissioned, but feels fairly important in the context I'm thinking through here)

Is it worth putting in all that work for this particular post? Dunno, probably not. But it seems worth periodically reflecting on how high the bar would be set, when comparing what LessWrong could ultimately be vs. what it needs to be in practice.

[-]hermanubis7y150

What about getting money involved? Even relatively small amounts can still confer prestige better than an additional tag or homepage section. It seems like rigorous well-researched posts like this are valuable enough that crowdfunding or someone like OpenPhil or CFAR could sponsor a best-post prize to be awarded monthly. If that goes well you could add incentives for peer-review.

[-]Elo7y90

Money might do the opposite. "I did all this work and all I got was... several dollars and cents".

[-]Said Achmiz7y50

A small amount of money would do the opposite of conferring prestige; it would make the activity less prestigious than it is now.

[-][anonymous]7y*100

My impression is that money can only lower prestige if the amount is low relative to an anchor.

For example a $3000 prize would be high prestige if it's interpreted as an award, but low prestige if it's interpreted as a salary.

[-]ioannes7y10

cf. https://en.wikipedia.org/wiki/Knuth_reward_check

[-]Said Achmiz7y70

What makes this situation unusual is that being acknowledged by famous computer scientist Donald Knuth to have contributed something useful to one of his works is inherently prestigious; the check is evidence of that reward, not itself the reward. (Note that many of the checks do not even get cashed! A trophy showing that you fixed a bug in Knuth’s code is vastly more valuable than enough money to buy a plain slice of pizza.)

In contrast, Less Wrong is not prestigious. No one will be impressed to hear that you wrote a Less Wrong post. How likely do you think it is that someone who is paid some money for a well-researched LW post will, instead of claiming said money, frame the check and display it proudly?

[-]Davidmanheim7y00

I think you're viewing intrinsic versus extrinsic reward as dichotomous rather than continuous. Knuth awards are on one end of the spectrum, salaries at large organizations are at the other. Prestige isn't binary, and there is a clear interaction between prestige and standards - raising standards can itself increase prestige, which will itself make the monetary rewards more prestigious.

[-]Elizabeth7y40

I don't see where Said's comment implies a dichotomous view of prestige. He simply believes the gap between LessWrong and Donald Knuth is very large.

[-]Davidmanheim7y00

Sure, but we can close the global prestige gap to some extent, and in the meantime, we can leverage in-group social prestige, as the current format implicitly does.

[-]Elizabeth7y30

Orienting a bit more around the "the state of management research is shitty" issue

Can you say more about this? That seems like a very valuable but completely different post, which I imagine would take an order of magnitude more effort than investigation into a single area.

[-]Raemon7y30

Yeah, there's definitely a version of this that is just a completely different post. I think Habryka had his own opinions here that might be worth sharing.

Some off the cuff thoughts:

Within scope for something "close to the original post", I think it'd be useful to have:

clearer epistemic status tags for the different claims.

Which claims are based on out of date research? How old is the research?
Which are based on shoddy research?
What's your credence for each claim?

More generally, how much stock should a startup founder place in this post? In your opinion, does the state of this research rise to the level of "you should most likely follow this post's advice?" or is it more like "eh, read this post to get a sense of what considerations might be at play but mostly rely on your own thinking?"

Broader scope, maybe its own entire post (although I think there's room for a "couple paragraphs version" and an "entire longterm research project" version)

Generally, what research do you wish had existed, that would have better informed you here?
Are there particular experiments or case studies that seemed (relatively) easy to replicate, that just needed to be run again in the modern era with 21st century communication tech?

[-]Elizabeth7y10

clearer epistemic status tags for the different claims....

I find it very hard, possibly impossible, to do the things you ask in this bullet point and synthesis in the same post. If I was going to do that it would be on a per-paper basis: for each paper list the claims and how well supported they are.

Generally, what research do you wish had existed, that would have better informed you here?

This seems interesting and fun to write to me. It might also be worth going over my favorite studies.

[-]Raemon7y30

I find it very hard, possibly impossible, to do the things you ask in this bullet point and synthesis in the same post

Hard because of limitations on written word / UX, or intellectual difficulties with processing that class of information in the same pass that you process the synthesis type of information?

(Re: UX – I think it'd work best if we had a functioning side-note system. In the meanwhile, something that I think would work is to give each claim a rough classification of "high, medium or low credence", including a link to a footnote that explains some of the details)

[-]Elizabeth7y30

Data points from papers can either contribute directly to predictions (e.g. we measured it and gains from colocation drop off at 30m), or to forming a model that makes predictions (e.g. the diagram). Credence levels for the first kind feel fine, but feel like a category error for model-born predictions. It's not quite true that the model succeeds or fails as a unit, because some models are useful in some arenas and not in others, but the thing to evaluate is definitely the model, not the individual predictions.

I can see talking about what data would make me change my model and how that would change predictions, which may be isomorphic to what you're suggesting.

The UI would also be a pain.

[-]Ben Pace7y*80

This is awesome, thanks.

In case it’s of interest to anyone, I recently wrote down some short, explicit models of the costs of remote teams (I did not try to write the benefits). Here’s what I wrote:

Substantially increases activation costs of collaboration, leading to highly split focus of staff
Substantially increases costs of creating common knowledge (especially in political situations)
Substantially increases barriers to building trust (in-person interaction is key for interpersonal trust)
Substantially decreases communication bandwidth - both rate and quality of feedback - making subtle, fine-grained and specific positive feedback harder, and strong negative feedback on bad decisions much easier, leading to risk-aversion.
Substantially increases cost of transmitting potentially embarrassing information, and incentivises covering up of low productivity, as it’s very hard for a manager to see the day-to-day and week-to-week output.

[-]Elizabeth7y20

Substantially increases activation costs of collaboration, leading to highly split focus of staff

I think this is a mixed blessing rather than a cost. It makes staff members less likely to be working in alignment with one another, but more likely to be working in their personal flow in the Csikszentmihalyi sense of the word. I believe these two things trade off against each other in general, and things moving the efficient frontier are very valuable.

[-]DanielFilan5y60Nomination for 2019 Review

A pretty high-quality post on a problem many people have had in 2020. That being said, I wonder if the 2020 COVID pandemic will produce enough research to make this redundant in a year? I doubt it, but we'll see.

[-]Raemon5y40Review for 2019 Review

I haven't reviewed the specific claims of the literature here, but I did live through a pandemic where a lot of these concerns came up directly, and I think I can comment directly on the experience.

Some LessWrong team members disagree with me on how bad remote-work is. I overall thought it was "Sort of fine, it made some things a bit harder, other things easier. It made it harder to fix some deeper team problems, but we also didn't really succeed at fixing those team problems in previous non-pandemic years."
- Epistemic Status, btw: I live the farthest away from all other LW team members, and it's the biggest hassle for me to relocate back to Berkeley, so I have some motivation to think remote-ness isn't as big a deal.
Initially I found it easier to get deep work done and I felt more productive. Over time I think that slid into "well, I work about as productively as I did before the pandemic."
I think the biggest problems are "if anyone on a team develops any kind of aversion or ugh field, it's way harder to fix the problem. You can't casually chat about it over lunch, carefully feeling out their current mood. You have to send them an ominous slack message asking 'hey, um, can we talk?'".
- Elizabeth mentioned this in the OP: "It’s easier for conflict to result in withdrawal among workers who aren’t co-located, amplifying the costs and making problem solving harder."
Other team members have mentioned that it's harder to keep track of what other people are doing, and notice if a teammate is going off in a wrong direction. (This seems to slot into the "information flow" section. Indeed, when we do work in an office we're within the "within about one hallway" distance.)

Retreats and Site Visits

We made sure to do a retreat, putting a bunch of effort into covid quarantining beforehand. We also tried occasional "meet outdoors for meetings."
This post highlights that doing "enmeshed site visits" is good in addition to retreats. Which, to be fair, I think people on the team did pitch, and mostly it was fairly costly to do it during a pandemic.

Making people very accessible

We tried out software called Tandem that made it easier to immediately voice-call with a person. We stopped using it primarily because it was hogging CPU. But some of us found it pretty disruptive to be always available.
Later we tried out working in the shared Gather Town space. I think this might have worked better if we weren't also trying to make that Gather Town space a populous hub (Walled Garden). This was distracting (although during that period we did successfully stay more in touch with other orgs and friends, which was the explicit goal)
- It sucked that everything other than Zoom had mediocre audio quality

Video/Audio Tech

We tried a huge variety of microphones, headphones, software, wired internet. We never really found a set of tools that didn't randomly spazz out sometime. (Wired headphones and internet ran into a different set of problems than bluetooth)

Written Communication

We used Notion, a sort of Google Docs clone with all kinds of tools integrated into each other. It worked pretty well (easily searchable, has a sidebar where you can see all the documents in a nested hierarchy). It had some bugs.

[-]Ben Pace7y40

Datapoint: Stripe's Fifth Engineering Hub is Remote. HN discussion.

[-]Matthijs Cox7y40

Fascinating.

It seems a certain amount of dynamics is relevant, as indicated by the site visits and retreats. I guess you assume the co-located team is static, i.e. no frequent home working or reshuffling with other teams?

I wonder if it's possible to model the impact of such vibrations and transitions between team formations. For example, the Scaled Agile framework proposes static co-located teams with a higher layer of people continuously transferring information between the teams. The teams retreat into a large event a few times a year. Due to personal circumstances I'd love to know their BS factor.

[-]Elizabeth7y30

Teams were typically static for the duration of the studies, although IIRC some were newly formed task-focused teams and would reshuffle after the task was over.

Some studies looked at the effect of WFH in co-located teams. I didn't focus on this because it wasn't Oliver's main question, but from some reading and personal experience:

If a team is set up for colocation, you will miss things working from home, which will hurt alignment and social aspects like trust. This scales faster than linearly.
Almost everyone reports increased productivity working from home.
But some of that comes from being less interruptible, which hurts other people's productivity.
Both duration of team and the expectation of working together in the future do good things to morale, trust, and cooperation.

Based on this, I think that:

Some WFH is good on the margins.
The more access employees have to quiet private spaces at work, the less the marginal gains from WFH (although still some, for things like midday doctors' appointments or just avoiding the commute). I think most companies exaggerate how much these are available.
"Core Hours" is a good concept for both days and times in office, because it concentrates the time people need to defensively be in the office to avoid missing things.
How Scaled Agile affects morale and trust will be heavily dependent on how people relate to the meta-team. If they view themselves as constantly buffeted between groups of strangers, it will be really bad. If they view the meta-team as their real team, full of people they trust and share a common goal with but don't happen to be working as closely with at this time, it's probably a good compromise.

[-]Elizabeth7y10

The most relevant paper I read was Chapter 5 of Distributed Work by Hinds and Kiesler. You can find it in my notes by searching for "Chapter 5: The (Currently) Unique Advantages of Collocated Work"

[-]rmoehn6y30

If you need more input, I recommend:

They're podcasts, not literature. But you can download all the shownotes, which read like a whitepaper, if you buy a one-month licence for $20.

[-]ryan_b7y20

Excellent work! I particularly like including your notes in the comments.

I have one question about OODA (I see long loops mentioned in the post, but without attribution; I don't see them mentioned in the notes explicitly). Could you talk more about the long-loop conclusion, and how remote work benefits from it?

My naive guess is that the bandwidth issues associated with remote work cause feedback to take longer, which means longer OODA loops are a desirable trait in the worker, but my confidence is not particularly high.

[-]Elizabeth7y60

RE: OODA loops as a property of work: let's take the creation of this post as an example. There were broadly four parts to writing it:

1. Talking to Oliver to figure out what he wanted

2. Reading papers to learn facts

3. Relating all the facts to each other

4. Writing a document explaining the relation

Part 1 really benefited from co-location, especially at first. It was heavily back and forth, and so benefited from the higher bandwidth. The OODA loop was at most the time it took either of us to make a statement.

Part 2 didn't require feedback from anyone, but also had a fairly short OODA loop because I had to keep at most one paper in my head at a time, and dropping down to one paragraph wasn't that bad.

Part 3 had a very long OODA loop because I had to load all the relevant facts in my head and then relate them. An interruption before producing a new synthesis meant losing all the work I'd done till that point.

I also needed all available RAM to hold as much as possible at once. Even certain background noise would have been detrimental here.

Part 4 had a shorter minimum OODA loop than part 3, but every interruption meant reloading the data into my brain, so longer was still better.

Does that feel like it answered your questions?

[-]ryan_b7y40

That is much better, but it raises a more specific question: here you described the loop as a property of the task; but then you also wrote

Hire people like me

long OODA loop

Which seems to mean you are the one with the long loop. I can easily imagine different people having different maximum loop-lengths, beyond which they are likely to fail. Am I correct in interpreting this to mean something like trying to ensure that the remote worker can handle the longest-loop task you have to give them?

[-]Elizabeth7y50

I think tasks, environments and people have a range of allowable OODA loops, and that it's very damaging if there isn't an overlap of all three.
