What is important in hiring/field-building in x-risk and AI alignment communities and orgs? I had a few conversations on this recently, and I'm trying to publicly write up key ideas more regularly.

I had in mind the mantra 'better written quickly than not written at all', so you can expect some failures in enjoyability and clarity. No character represents any individual, but is an amalgam of thoughts I’ve had and that others have raised.


albert cares deeply about x-risk from AI, and wants to grow the field of alignment quickly; he also worries that people in the x-risk community err too much on the side of hiring people similar to themselves.

ben cares deeply about x-risk from AI, and thinks that we should grow the AI safety community slowly and carefully; he feels it's important to ensure new members of the community understand what's already been learned, and avoid the Eternal September effect.


albert: So, I understand you care about picking individuals and teams that agree with your framing of the problem.

ben: That sounds about right - a team or community must share deep models of their problems to make progress together.

albert: Concretely on the research side, what research seems valuable to you?

ben: If you’re asking what I think is most likely to push the needle forward on alignment, then I’d point to MIRI’s and Paul’s respective research paths, and also some of the safety work being done at DeepMind and FHI.

albert: Right. I think there are also valuable teams being funded by FLI and Open Phil who think about safety while doing more mainstream capabilities research. More generally, I think you don't need to hire people that think very similarly to you in your organisations. Do you disagree?

ben: That's an interesting question. On the non-research side, my first thought is to ask what Y Combinator says about organisations. One thing we learn from YC is that the first 10-20 hires of your organisation will make or break it, especially the co-founders. Picking even a slightly suboptimal co-founder - someone who doesn't perfectly fit your team culture, understand the product, and work well with you - is the easiest way to kill your company. This suggests to me a high prior on selectivity (though I haven't looked in detail into the other research groups you mention).

albert: So you're saying that if the x-risk community is like a small company it's important to have similar views, and if it's like a large company it's less important? Because it seems to me that we're more like a large company. There are certainly over 20 of us.

ben: While 'size of company' is close, it's not quite it. You can have small companies like restaurants or corner stores where this doesn't matter. The key notion is one of inferential distance.

To borrow a line from Peter Thiel: startups are very close to being cults, except that where cults are very wrong about something important, startups are very right about something important.

As founders build detailed models of some new domain, they also build an inferential distance of 10+ steps between themselves and the rest of the world. They start to feel like everyone outside the startup is insane, until the point where the startup makes billions of dollars and then the update propagates throughout the world ("Oh, you can just get people to rent out their own houses as a BnB").

A founder has to make literally thousands of decisions based off of their detailed models of the product/insight, and so you can't have cofounders who don't share at least 90% of the deep models.

albert: But it seems many x-risk orgs could hire people who don't share our basic beliefs about alignment and x-risk. Surely you don’t need an office manager, grant writer, or web designer to share your feelings about the existential fate of humanity?

ben: Actually, I'm not sure I agree with that. It again comes down to how much the org is doing new things versus doing things that are central cases of a pre-existing industry.

At the beginning of Open Phil's existence they wouldn't have been able to (say) employ a typical 'hiring manager', because designing the hiring process required deep models of what Open Phil's strategy was and what variables mattered. For example, 'how easily someone can tell you the strength and cause of their beliefs' was an important variable to Open Phil.

Similarly, I believe the teams at CFAR and MIRI have optimised their workshops and research environments respectively, in ways that depend on the specifics of those particular workshops/retreats and research environments. A web designer needs to know the organisation's goals well enough to model the typical user and how they need to interact with the site. An operations manager needs to know which financial trade-offs to make: how important food is for the workshop versus travel versus the ergonomics of the workspace. Having every team member understand the core vision is necessary for a successful organisation.

albert: I still think you're overweighting these variables, but that's an interesting argument. How exactly do you apply this hypothesis to research?

ben: It doesn't apply trivially, but I'll gesture at what I think: our community has particular models, a worldview, and a general culture that helped it notice the problem of AI in the first place, and that have produced some pretty outstanding research (e.g. logical induction, functional decision theory). I think that culture is a crucial thing to sustain, rather than something to be cut away from the insights it's produced so far. It's important for those working on furthering its insights and success to deeply understand that worldview.

albert: I agree that having made progress on issues like logical induction is impressive and has a solid chance of being very useful for AGI design. And I have a better understanding of your position - sharing deep models of a problem is important. I just think that some other top thinkers will be able to make a lot of the key inferences themselves - look at Stuart Russell for example - and we can help that along by providing funding and infrastructure.

Maybe we agree on the strategy of providing great thinkers the space to think about and discuss these problems? For example, events where top AI researchers in academia are given the space to share models with researchers closer to our community.

ben: I think I endorse that strategy, or at least the low-fidelity one you describe. I expect we'd have further disagreements when digging down into the details, structure and framing of such events.

But I will say, when I've talked with alignment researchers at MIRI, something they want more than people working on agent foundations or Paul's agenda is people who grok a bunch of the models, still have disagreements, and work on ideas from a new perspective. I hope your strategy helps discover people who deeply understand and have a novel approach to the alignment problem.


For proofreads on various versions of this post, my thanks to Roxanne Heston, Beth Barnes, Lawrence Chan, Claire Zabel and Raymond Arnold. For more extensive editing (aka telling me to cut a third of it), my thanks to Laura Vaughan. Naturally, this does not imply endorsement from any of them (most actually had substantial disagreements).


What a great post! Very readable, concrete, and important. Is it fair to summarize it in the following way?

A market/population/portfolio of organizations solving a big problem must have two properties:

1) There must not be too much variance within the organizations.

This makes sure possible solutions are explored deeply enough. This is especially important if we expect the best solutions to seem bad.

2) There must not be too little variance among the organizations.

This makes sure possible solutions are explored widely enough. This is especially important if we expect the impact of solutions to be heavy-tailed.

Speculating a bit, evolution seems to do it this way. In order to move around there are wings, fins, legs and crawling bodies. But it's not like dog babies randomly get born with locomotive capacities selected from those, or mate with species having other capacities.

The final example you give, of top AI researchers trading models with people in the community, seems a great example of this. People build their own deep models, but occasionally bounce them off each other just to inject the right amount of additional variance.

I really like how aesthetically well your 1 & 2 fit together, but I'm not sure no. 2 is accurate. I feel like there should hopefully be significant variance among the orgs, but there's a certain sense in which there must be things that are constant between them. This is analogous to how all startups should have lots of variation in their products, but must all be competent at hiring, sales, user interviews, etc.

But we chatted about this in person and you said you were planning to write a better comment, so I'll hold off writing more until then.

I now understand the key question as being "what baseline of inferential distance should we expect all orgs to have reached?". Should they all have internalised a deep security mindset? Should they have read the Sequences? Or does it suffice that they've read Superintelligence? Or that they have a record of donating to charity? And so forth.

Ben seems to think this baseline is much higher than Albert does. Which is why he is happy to support Paul's agenda, because it agrees on most of the non-mainstream moving parts that also go into MIRI's agenda, whereas orgs working on algorithmic bias, say, lack most of those. Now, in order to settle the debate, we can't really push Ben to explain why all the moving parts of his baseline are correct -- that's essentially the voice of Pat Modesto. He might legitimately be able to offer no better explanation than that it's the combined model built through years of thinking, reading, trying and discussing. But this also makes it difficult to settle the disagreement.

Mostly agreement, a few minor points:

ben is happy to support Paul's agenda, because it agrees on most of the non-mainstream moving parts that also go into MIRI's agenda, whereas orgs working on algorithmic bias, say, lack most of those.

I don't actually know how many inferential steps Paul's and the agent foundations agendas agree on - whether it's closer to 10%, 50%, or 90% (and I would love to know more about this) - but they do seem to me qualitatively different to things like algorithmic bias.

...we can't really push Ben to explain why all the moving parts of his baseline are correct -- that's essentially the voice of Pat Modesto. He might legitimately be able to offer no better explanation than that it's the combined model built through years of thinking, reading, trying and discussing.

I would change the wording of the second sentence to

He might legitimately not be able to fully communicate his models, because they're built from years of thinking, reading, trying and discussing. Nonetheless, it's valuable to probe them for their general structure, run consistency checks, and see if they can make novel predictions, even if full communication is not reachable.

This seems similar to the situation of experts in many fields (e.g. chess, economics, management). Regarding HPMOR, Eliezer wasn't able to (or at least didn't) fully communicate his models of how to write rationalist fiction, and Pat Modesto would certainly say "I can't see how you get to be so confident in your models, therefore you're not allowed to be so confident in your models", but this isn't a good enough reason for Eliezer not to believe the (subtle) evidence he has observed. And this is borne out by the fact that he was able to predictably build something surprising and valuable.

Similarly in this position, it seems perfectly appropriate to ask me things like "What are some examples of the inferential steps you feel confident a research path must understand in order to be correct, and how do you get those?" and also to ask me to make predictions based on this, even if the answer to the first question doesn't persuade you that I'm correct.

On this:

I now understand the key question as being "what baseline of inferential distance should we expect all orgs to have reached?"

Yes - let's make an analogy to a startup accelerator. Suppose that you have to get 20 inferential steps right in a row to be a successful startup, where (say) 10 of those are necessary how-to-start-a-startup skills (things like hiring, user interviews, understanding product-market fit) and 10 of those are details about your particular product. YC wants everyone to have the same first 10 (I think that's mainly what they select on, though they'll try to teach you the rest), but it's important to have lots of variance in the second set of 10. If most startups fail, it's good to have lots of good startups trying lots of different products.

In alignment research, the disagreement is over which fundamentals we know you definitely require for your alignment research to have a chance of being useful (aka 'actually part of the field of alignment'), and which bits we'll call 'ongoing genuine disagreement in the field'. Here's a public note saying I'll come back later this week to give a guess as to what some of those variables are.

Here's a public note saying I'll come back later this week to give a guess as to what some of those variables are.

I thought about it for a while, and ended up writing a nearby post: "Models I use when making plans to reduce AI x-risk".

This isn't "models I use when thinking about the object-level alignment problem" or "models I'd use if I were doing alignment research". Those are a set of more detailed models of how intelligence works in general, and I do intend to write a post about those sometime.

Albert: I agree that having made progress on issues like logical induction is impressive and has a solid chance of being very useful for AGI design. And I have a better understanding of your position - sharing deep models of a problem is important. I just think that some other top thinkers will be able to make a lot of the key inferences themselves - look at Stuart Russell for example - and we can help that along by providing funding and infrastructure.

I think the problem isn't just that other people might not be able to make the key inferences, but that there won't be common knowledge of the models/assumptions that people have. For example, Stuart Russell has thought a lot about research topics in AI safety, but I'm not actually aware of any write-ups detailing his models of the AI safety landscape and problem. (The best I could find were his "Provably Beneficial AI" Asilomar slides, the 2015 Research Agenda, and his AI FAQ, though all three are intended for a general audience.) It's possible, albeit unlikely, that he has grokked MIRI's models and still thinks that value uncertainty is the most important thing to work on (or call for people to work on) for AI safety. But even if this were the case, I'm not sure how we'd find out.

For example, events where top AI researchers in academia are given the space to share models with researchers closer to our community.

Yup. I think this may help resolve the problem.

Huh, I like your point about common knowledge a lot. Will work on that.

I think the framing of the x-risk community as a single company is off - e.g. CFAR and Open Phil are fairly different culturally, and also differ in focus (workshops vs. discerning who to fund) and in size, among other factors.

Not that this analogy is perfectly correct either, but I think the x-risk community is much more like a group of startups within an incubator. We have units of people with unique cultures who are working toward a goal we all share (success here = saving the world), along with some subset of culture that ties us together as a community (in startups: common knowledge of how pitching works, product-market fit, etc.).

But I will say, when I've talked with alignment researchers at MIRI, something they want more than people working on agent foundations or Paul's agenda is people who grok a bunch of the models, still have disagreements, and work on ideas from a new perspective. I hope your strategy helps discover people who deeply understand and have a novel approach to the alignment problem.

In the later stages of AGI design, hoping that these people come along is kind of like hoping that explosives experts come along from the outside and critique the atom bomb design done by nuclear physicists.

They won't have good models of your AGI design, because you have adopted the onion strategy of AGI discussion, and they also have to deal with disinformation about what AGI even is or can do.

To be successful in that kind of scenario you need to be sufficiently self-sceptical to know where your models are weak, and to pull in expertise you do not have. Nothing can be "and then a miracle occurs". Everything that is not a gears-level understanding of the situation should be like a niggling discomfort that you must deal with.

I've not seen any significant effort adopt this mindset.

I've not seen any significant effort adopt this mindset.

Could I ask where you've looked? MIRI seems to be trying pretty explicitly to develop this mindset, while Paul Christiano has had extensive back-and-forth on the assumptions behind his general approach on his Medium blog.

As far as I can tell, MIRI are asking other people to develop it for thinking about how to build AGI. They have started hiring developers rather than mathematicians, but AFAIK they haven't been hiring the solid-state physicists you might want in order to avoid problems like Rowhammer.

On how to use AGI, they don't seem to have been hiring psychologists/economists/political scientists/experts on the process of technological development who might help inform or fill in the gaps in their policy/strategy expertise - unless they have lots of unknown resources.