Protos have some distinctive characteristics and pathologies that push me to recommend this pattern, where I wouldn't necessarily do so for other "shared dependencies" (though I think I'd probably still favor a monorepo in most situations).
They're atypical as far as build targets go: you often end up generating packages in multiple languages, and the relevant versioning strategy is per-build-target, not per-source-file. It is important to avoid the inappropriate shared dependencies across protos that I mentioned in the article, since that's one way you run into trouble with versioning, but imo the solution to that is to programmatically enforce proper separation of concerns within the monorepo.
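As a sketch of what "programmatically enforce" could look like - assuming a hypothetical layout where each top-level directory under protos/ is an independent domain, plus a shared/ directory anyone may import - a CI check along these lines would fail the build on cross-domain imports:

```python
#!/usr/bin/env python3
"""CI check: flag cross-domain imports between proto packages.

Sketch only -- assumes a hypothetical layout where each top-level
directory under protos/ (e.g. protos/billing/, protos/users/) is an
independent domain, plus shared/ and google/ that anyone may import.
"""
import pathlib
import re
import sys

PROTO_ROOT = pathlib.Path("protos")
SHARED = {"shared", "google"}  # domains any proto may import from

IMPORT_RE = re.compile(r'^\s*import\s+(?:public\s+)?"([^"]+)";')


def domain_of(path_str: str) -> str:
    # The first path segment names the domain, e.g. "billing/invoice.proto".
    return path_str.split("/", 1)[0]


violations = []
for proto in PROTO_ROOT.rglob("*.proto"):
    own_domain = domain_of(str(proto.relative_to(PROTO_ROOT)))
    for line in proto.read_text().splitlines():
        match = IMPORT_RE.match(line)
        if not match:
            continue
        imported = domain_of(match.group(1))
        if imported not in SHARED and imported != own_domain:
            violations.append(f"{proto}: imports from '{imported}'")

if violations:
    print("Cross-domain proto imports found:")
    print("\n".join(violations))
    sys.exit(1)
```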
Appreciate the recommendation. Around April 1st I decided that the "work remotely for an alignment org" thing probably wouldn't work out the way I wanted it to, and switched to investigating "on-site" options - I'll write up a full post on that when I've either succeeded or failed on that score.
On a mostly unrelated note: every time I see an EA job posting that pays at best something like 40-50% of what qualified candidates would get in industry, I feel it collide with the "we are not funding constrained" messaging. I understand that there are reasons why EA orgs may not want to advertise themselves as paying top-of-market, but nobody's outright said that's what's going on, and there could be other, less-visible bottlenecks that I haven't observed yet.
My guess is that it's something like "the impact of mitigating x-risks is probably orders of magnitude greater than public health interventions" (which might be what you meant by "unless you're very optimistic about X-risk charities being effective").
Was this specifically with protos? "Very weak separation between service models and clients" doesn't sound like something that'd happen with protos, since the client stubs are generated directly from the service definitions.
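Concretely, the client stub is just a codegen artifact of the service definition. A minimal sketch, assuming grpcio-tools is installed and a hypothetical protos/echo.proto defining an Echo service:

```python
# Sketch: the client is a build artifact of the service definition.
# Assumes grpcio-tools is installed and a hypothetical protos/echo.proto
# that defines an Echo service.
import os

from grpc_tools import protoc

os.makedirs("gen", exist_ok=True)

protoc.main([
    "protoc",
    "-Iprotos",
    "--python_out=gen",
    "--grpc_python_out=gen",
    "protos/echo.proto",
])
# gen/echo_pb2_grpc.py now contains both EchoStub (client) and
# EchoServicer (server skeleton), generated from the same source,
# so there's no separately-maintained client model to drift out of sync.
```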
Can you go into more detail on the specific failure modes you ran into that seemed to be downstream of everything living in a monorepo? I agree you need to be more careful about maintaining proper separation of concerns, but I'm not seeing how monorepos would be more likely to cause versioning issues across environments. I can imagine that if protos didn't have a build step, you might run into problems similar to e.g. dynamic linking (or resolution at runtime, as with protobufjs), and that with separate repos it might be easier to establish a versioning strategy other than "just pull latest main, it's fine!" - but that's typically not how protos work. I guess there can be a similar failure mode if you tell your build system to resolve proto dependencies at build time by compiling whatever protos are currently in the repo, instead of using a fixed published version for each target? I might not be understanding what you're pointing at, though.
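To clarify what I mean by "fixed published version": something like the following sketch, where the build fetches a tagged proto archive and refuses to proceed unless it matches a digest recorded in a lockfile. The URL, version tag, and digest here are all hypothetical placeholders - the real mechanism would be whatever pinning your build system supports.

```python
# Sketch of "fixed published version" dependency resolution, as opposed
# to compiling whatever protos are currently in the repo. The URL,
# version, and digest are hypothetical placeholders for your real pins.
import hashlib
import urllib.request

PINNED_VERSION = "v1.4.2"  # hypothetical tagged release of the proto repo
PINNED_SHA256 = "0" * 64   # placeholder digest recorded in a lockfile
ARCHIVE_URL = f"https://example.com/protos/archive/{PINNED_VERSION}.tar.gz"


def fetch_pinned_protos() -> bytes:
    """Download the pinned proto archive; refuse to build on mismatch."""
    with urllib.request.urlopen(ARCHIVE_URL) as resp:
        data = resp.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest != PINNED_SHA256:
        raise RuntimeError(
            f"proto archive digest {digest} != pinned {PINNED_SHA256}; "
            "refusing to build against unverified protos"
        )
    return data
```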
These seem mostly non-responsive to the described situation.
The ladders for data scientists and data engineers tend to be comparable at these kinds of companies. If you have a quantitative PhD in a domain with transferable knowledge (e.g. anything math/CS/econ probably counts), you might even be able to start one step up the ladder. But even if you just learned R to do data analysis in some other field, that would probably make it much easier to pick up, say, Python, and hop sideways.
I should note that the junior-level compensation is probably the most difficult to attain if you're coming in from another field, since the major pipeline is "college grad with several internships gets hired straight into a big tech company". By comparison it's much easier to get any sort of entry-level engineering role (which is still going to pay pretty well), get a couple years of experience, then go on to one of the big tech companies. (It's not impossible to get a junior role at a big tech company straight out of the gate; I know several bootcamp grads who have done it. It'll definitely be made easier by having a PhD.)
Here are some reasons:
Interesting, was this recently posted? Do you mind if I DM you with some questions?
Riffing on the idea that "productionizing a cool research result into a tool/product/feature that a substantial number of users find better than their next best alternative is actually a lot of work": it's a lot less work in larger organizations with existing users numbering in the millions (or billions). But, as noted, larger orgs have their own overhead.
I think this predicts that most of the useful products built around deep learning that come out of larger orgs will have certain characteristics - more "a feature that integrates with/enhances an existing product with lots of users" than "a totally new product spun up incubator-style within the organization". That plays to the strengths of those orgs: they have both the datasets and the users, it fits better with the existing org structure and processes, it's more incentive-aligned with the people who "make things happen", etc.
A couple examples of what I'm thinking of:
For structural reasons I'd expect "totally novel, standalone products" to come out of startups rather than larger organizations, but because they're startups they lack many of the "hard things are easy" buttons that some larger orgs have.