This field guide was written by the MIRI team with MIRIx groups in mind, though the advice may be relevant to others working on AI alignment research.
Hello! You may notice that you are reading a document.
This fact comes with certain implications. For instance, why are you reading this? Will you finish it? What decisions will you come to as a result? What will you do next?
Notice that, whatever you end up doing, it’s likely that there are dozens or even hundreds of other people, quite similar to you and in quite similar positions, who will follow reasoning which strongly resembles yours, and make choices which correspondingly match.
Given that, it’s our recommendation that you make your next few decisions by asking the question “What policy, if followed by all agents similar to me, would result in the most good, and what does that policy suggest in my particular case?” It’s less of a question of trying to decide for all agents sufficiently-similar-to-you (which might cause you to make the wrong choice out of guilt or pressure) and more something like “if I were in charge of all agents in my reference class, how would I treat instances of that class with my specific characteristics?”
If that kind of thinking leads you to read further, great. If it leads you to set up a MIRIx chapter, even better. In the meantime, we will proceed as if the only people reading this document are those who justifiably expect to find it reasonably useful.
Imagine that you have been tasked with moving a cube of solid iron that is one meter on a side. Given that such a cube weighs ~16000 pounds, and that an average human can lift ~100 pounds, a naïve estimation tells you that you can solve this problem with ~150 willing friends.
But of course, a meter cube can fit at most something like 10 people around it. It doesn’t matter if you have the theoretical power to move the cube if you can’t bring that power to bear in an effective manner. The problem is constrained by its surface area.
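As a rough sketch of the arithmetic above, the snippet below compares theoretical lifting power with usable lifting power. It uses only the round figures from the text (a ~16,000-pound cube, ~100 pounds of lift per person, and room for about 10 people around the cube). The constant and function names are illustrative assumptions, not part of the original guide.

```python
import math

# Round figures taken from the text; all are rough estimates.
CUBE_WEIGHT_LB = 16_000      # one-meter cube of solid iron
LIFT_PER_PERSON_LB = 100     # what an average human can lift
MAX_PEOPLE_AROUND_CUBE = 10  # how many people fit around the cube


def people_needed(weight_lb: float, lift_lb: float) -> int:
    """Naive estimate: total weight divided by per-person lift, rounded up."""
    return math.ceil(weight_lb / lift_lb)


def usable_lift(max_people: int, lift_lb: float) -> float:
    """Lift actually available once the surface-area limit is applied."""
    return max_people * lift_lb


naive_headcount = people_needed(CUBE_WEIGHT_LB, LIFT_PER_PERSON_LB)
available_lb = usable_lift(MAX_PEOPLE_AROUND_CUBE, LIFT_PER_PERSON_LB)
fraction_covered = available_lb / CUBE_WEIGHT_LB

print(f"Naive headcount: {naive_headcount}")
print(f"Usable lift: {available_lb:.0f} lb ({fraction_covered:.1%} of the cube)")
```

On paper the job takes 160 people, but the ten who can actually reach the cube supply only 1,000 pounds of lift, a small fraction of what is needed. Adding more willing friends changes nothing until the surface area changes.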
MIRIx chapters are one of the best ways to increase the surface area of people thinking about and working on the technical problem of AI alignment. And just as it would be a bad idea to decree "the 10 people who happen to currently be closest to the metal cube are the only ones allowed to think about how to think about this problem", we don’t want MIRI to become the bottleneck or authority on what kinds of thinking can and should be done in the realm of embedded agency and other relevant fields of research.
The hope is that you and others like you will help actually solve the problem, not just follow directions or read what’s already been written. This document is designed to support people who are interested in doing real groundbreaking research themselves.
We sometimes hear questions of the form “Even a summer internship feels too short to make meaningful progress on real problems. How can anyone expect to meet and do real research in a single afternoon?”
There’s a Zeno-esque sense in which you can’t make research progress in a million years if you can’t also do it in five minutes. It’s easy to fall into a trap of (either implicitly or explicitly) conceptualizing “research” as “first studying and learning what’s already been figured out, and then attempting to push the boundaries and contribute new content.”
The problem with this frame (according to us) is that it leads people to optimize for absorbing information, rather than seeking it instrumentally, as a precursor to understanding. (Be mindful of what you’re optimizing in your research!)
There’s always going to be more pre-existing, learnable content out there. It’s hard to predict, in advance, how much you need to know before you’re qualified to do your own original thinking and seeing, and it’s easy to Dunning-Kruger or impostor-syndrome yourself into endless hesitation or an over-reliance on existing authority.
Instead, we recommend throwing out the whole question of authority. Just follow the threads that feel alive and interesting. Don’t think of research as “study, then contribute.” Focus on your own understanding, and let the questions themselves determine how often you need to go back and read papers or study proofs.
Approaching research with that attitude makes the question “How can meaningful research be done in an afternoon?” dissolve. Meaningful progress seems very difficult if you try to measure yourself by objective external metrics. It is much easier when your own taste drives you forward.
No procedure for doing research will fit for everyone. However, what follows are steps which you can try either on your own or in a group setting (such as MIRIx) in order to practice the kind of curiosity-driven research just described.
1. Write a list of questions.
2. Choose one of the questions to focus on, based on what feels most interesting.
3. Clarify your curiosity. What is desired? What do you think might be possible?
4. Keep clarifying.
This resembles how much of the progress at MIRI happens. It’s very different from the attractor of “just read lots of papers,” and it’s very different from the attractor of “try to figure out top-down what the field as a whole needs.”
An easy mistake is to think of yourself as trying to contribute to the world's collective knowledge, and thereby neglecting to prioritize your own knowledge and understanding. "Just read papers" may sound like it's prioritizing your own knowledge, but it often reflects a mindset that’s tacitly assuming that others know exactly what you need to know. "Optimize for your own understanding" is a mindset with a faster feedback loop.
There’s nothing inherently wrong with reading papers—even if it’s just because they’re in the field and you want a broad overview of the field. But throughout, you should be trying to form a picture of what you personally do and don't know how to do, and what you’d need to know how to do in order to solve the problem. That’s hard, and maybe you’re sure that the first five ideas you write down will be wrong. Still, write them down anyway, and try to get them to work, so you can see what happens and discover what goes wrong.
We don’t want a hundred bright minds all asking the exact same questions, and taking the exact same set of assumptions. We want a field full of explorers, not exploiters. Put another way, the best way to become a researcher is to practice the skill of independent thought right from the beginning, rather than exercising your “sit back and absorb information for its own sake” muscles.
So don't ask "What are the open questions in this field?" Ask: "What are my questions in this field?"
Let’s say you’ve tried some things that resemble the above, you enjoyed them, and you want to move forward on starting your own MIRIx chapter.
Our first recommendation is that you find ONE or TWO other people (not three+), and try doing research together once or a few times. There’s more detail below in the social dynamics section about how exactly that might look, but the idea is that you want to establish a tone and flow with a small number of people first. Negotiating a direction for the group tends to be much harder if you start with a larger number of people.
Another important choice which can be difficult to negotiate with a large number of people is schedule. Finding a time and place which is good for everyone can become intractable, and changing it meeting to meeting to try to make it work for everyone can be de-motivating. Choose a schedule which is good for the founding core of the group. What day of the week is good for you? How often do you want to meet? How long do you want meetings to be? We recommend meetings be monthly, weekly, or every other week. Meeting length can be anywhere from an hour to a whole day, depending on what makes sense for you.
Once you find a partner or two that you genuinely enjoy making progress with, your next step is to plan and advertise for a first large meetup (where “large” means something like “three to six new people” and definitely doesn’t mean “twenty or thirty attendees”).
Try to find a venue that is private and sound-isolated, has flat surfaces and comfortable seating, and has whiteboards on the walls. Universities often have spaces like this, as do public libraries, but someone’s living room is fine if you can minimize the number of intrusions and interruptions. If you can’t find a space with whiteboards, look for easels and easel pads, and in either case be sure to bring your own markers. Also bring along spare paper, pens, and clipboards, and assign someone to make sure that there are snacks and drinks.
(A note about snacks and drinks: people almost always underestimate the importance of the quality and quantity of food, anchoring on something like “I dunno, maybe just spend ten bucks on some chips or something?” Instead, ask yourself: what dollar value would I put on a 15% increase in the group’s ability to think, overall mood, and ultimate satisfaction with the event? That’s how much you should consider spending (/ asking MIRI to spend) on snacks, especially for the first meeting. Don’t buy only junk food. It may give you more energy temporarily, but it will make you worse at thinking later. So, especially for longer meetings, healthy snacks are critical. Longer meetings should also include a meal, perhaps at a nearby restaurant. This also serves as a good break.)
At that first large meeting, you’ll want to start by formally electing a president. This is an important piece of common-knowledge culture—many times, the president won’t do much, but it’s extremely useful to have a single person with the moral authority to set agendas, choose between various good options, and keep the group on track. You may also end up electing a secretary/record-keeper, or possibly a coordinator to handle venue and food, or other offices (or you could do this after a few meetings).
Next, you’ll want to model the process that has already been working for you. Perhaps this means sharing a list of pre-existing questions, and seeing which capture the interest of your participants. Perhaps it means discussing the broader thrust of your research thus far before brainstorming some topics. Regardless, you’ll want to get down to actual thinking, writing, proving, and discussing as soon as you can. Breaking into smaller groups is often helpful if more than four people are at a meeting. If you do this, schedule a time to come back and share ideas.
Try to include breaks in your structure to keep everyone fresh. It can be difficult to remember to take a break when things get going, so it’s worth setting the intention ahead of time. Short breaks every hour in which people get up and walk around are very helpful.
It can be helpful to keep a public list (on a whiteboard or shared Google doc) of questions you have, needed concepts, and promising ideas. This is an easy source of new topics if a conversation runs dry.
One possible structure incorporating the above advice and the research procedure from the previous section:
At the end of the meeting, schedule the next event. You may have settled on a rough schedule which works for the core of the group, but you’ll still be adjusting it meeting-to-meeting to account for holidays and other absences. Confirming the next meeting time with everyone present is also important for attendance, even if the meeting times are set in stone. Make sure to establish at the outset that you’re not going to try to optimize for everyone’s availability at once; it’s good to have meetups that people feel okay skipping from time to time, as long as there’s something like 70-90% consistency in the group. If one or two people can’t make it to the second meeting, be sure to get information from them so that you can prioritize their schedules a little more when planning the third.
What follows are some half-baked, ad-hoc models of what makes for a good research group, or a good collaborative enterprise in general. You should consider all of the following to be true in spirit but false in detail, and should try to derive your own value rather than treating these as actual suggestions to follow.
3A. Transmitters and receivers
We’ve found in our own research that conversations tend to go better when they are primarily between two people. This is not to say that you shouldn’t have three or more people involved in the conversation, but in any given five-minute span of time, there should mostly be just two people talking—one who is currently trying to convey something, and another who is trying to understand (and whose understanding the first is specifically optimizing for; discussing a topic at a level such that four or five different people can all follow everything is usually worse on net).
Call these two roles the “transmitter” and the “receiver.” Things you might transmit:
Things the receiver might do:
The transmitter should feel as free as possible to just make claims, including “totally fake” claims, as long as they are keeping in touch with their intuitions; try to establish a norm where you can ask receivers to collaborate with you in uncovering the kernel of truth in what you’re saying rather than shooting down half-formed ideas because they’re still half-wrong. No matter how nonjudgemental the receivers are, it may help the transmitter to say things like “everything I’m about to say is totally wrong, but” every so often.
The transmitter should also remain in touch with their intuition and curiosity, steering the conversation to what they think is most interesting rather than trying to perform or entertain. The transmitter is under no obligation to answer the receiver’s questions; feel free to say “that’s not what I want to think about right now.”
The key idea is that the receiver is helping midwife what the transmitter is saying. In that moment, it is the transmitter’s thinking that should take priority, and the receiver is acting as a sounding board, a living intuition pump, and a source of confusion and (minor) chaos.
Meanwhile, any third parties in the audience should be trying to serve as facilitators/translators. They should be watching both the transmitter and the receiver and seeking to model what’s going on for those people. Where are they missing each other, and talking past each other? Where are they running up against confirmation bias, or the double illusion of transparency? Where are they both agreeing that something makes sense without actually understanding it?
The audience members should speak up from time to time (probably less than 10% of the total words) to inject relevant thoughts or models or questions. Sometimes, such an interjection will be the cause of a role switch, with an audience member taking on a new role as either transmitter or receiver, and one of the other parties rotating out.
3B. High standards for membership
It’s awkward to not-invite someone or to turn them away after one or two meetings, but it’s even more awkward to wreck your entire MIRIx chapter because you were too shy or too uncertain to protect it.
Have a clear distinction between “welcome to come to a meeting” and “is now a full part of the group.” Make sure that there is a known decision-maker or set of decision-makers, and empower them to make calls by fiat, without having to justify or explain. (If you don’t trust their judgment without explanation, don’t have them be part of the decision-making.) Trust your own instincts; if you don’t feel like someone is a good match for the vibe you have going, then don’t invite them in. Consider requiring multiple recommendations, or having an interview process. These may seem unnecessary, but it can be difficult to turn people away, and a formal process makes it feel more fair.
Also consider having formal ethical guidelines, or a group pledge or set of commitments, which people sign at the moment that they fully join. Make sure that any standards you set are ones you are willing to actually enforce (e.g. “you must come to half of all meetings” or “content discussed here is confidential unless otherwise stated”).
3C. Escalating asks and rewards
Consider the model of a martial arts academy. When you first arrive, the instructors ask a few small things of you (e.g. kick this target, yell out loudly when you do so). Soon, they reward you for these things with a belt and some status.
At that point, the asks escalate. Perhaps now, as a yellow belt, you are put in charge of watching some white belts for a few minutes, and correcting their form. In return, they are told to bow to you and call you “sir” or “ma’am.”
As time goes on, the asks increase, and the rewards increase commensurately. This cycle fosters commitment and investment—it’s a process of slowly proving to the individual “if I put something into this system, I will get something out of it, and the more I put in, the more I’ll get out.” Eventually, you will receive a black belt, and possibly be asked to join as a paid instructor or found your own branch of the school.
There is a similar dynamic in most groups and organizations. Groups which ask little or nothing of their members do not receive loyalty in return. Individuals feel bought-in to a group to the extent that that group allows them to tell positive or epic stories about themselves.
The same will be true of your MIRIx chapter. Consider having some small, early asks that are the same for most newcomers (e.g., read such-and-such paper, or give a ten-minute talk on a topic of interest at your third meeting). Try to build a pipeline of greater asks and rewards over time (e.g., on your fifth-ish meeting, we’d like you to take charge of setting the agenda and dividing up the groups).
3D. Structure and elbow room
Related to the previous, it’s important that you balance top-down and bottom-up structure in your MIRIx. If there’s no clear sense of “how we do things,” then newcomers will flounder and have a bad time. You want there to be a pre-existing structure that people can evaluate, to determine whether or not they feel like they fit into it. You want the “what’s this like?” of your group to be clearly visible, right from the get-go, so that both people who are well-suited to it and people who aren’t can (for the most part) accurately self-assort.
At the same time, you don’t want that structure to feel limiting or confining in the long run. Just as martial artists eventually earn the right to determine some of their own training and the ability to contribute to the agenda-setting and curriculum of newer students, so too do you want the “pie” of your MIRIx to grow as time goes on. Otherwise, people will grow frustrated by their inability to bring the fullness of their own interests and priorities, and will leave to find a better context for their own growth and research.
3E. Social norms
That which is normal and accepted is that which goes unchallenged. If there is behavior that you want to discourage, you need to make sure not only that you challenge it when it occurs, but also that you openly, vocally, and publicly support others who are challenging it. It is the job of the group to ensure that someone who is following the rules/trying to do it right is never alone when they are in conflict with someone who isn’t.
Consider in advance, and be explicit about, things like the acceptability of interruptions or off-topic discussion. Cultivate a culture of disagreement, but be deliberate about building in politeness and support so that disagreement is net-positive and doesn’t turn into abuse or delegitimization. Protect whatever decision-making structures you decide to put in place, and be consistent about what constitutes each person’s domain and what marks the end of discussion.
You’ve nearly reached the end of the document! Hopefully, this contained non-zero useful information, as well as a healthy amount of food-for-thought. Before you go, we recommend that you take 30 seconds or so to ponder each of the following questions:
- The MIRI research team
Hey Abram (and the MIRI research team)!
This post resonates with me on so many levels. I vividly remember the Human-Aligned AI Summer School, where you acted as a "receiver" and Vlad as a "transmitter" when talking about "optimizers". Your "document" especially resonates with my experience running an AI Safety Meetup (Paris AI Safety).
In January 2019, I organized a Meetup about "Deep RL from human preferences". Essentially, the resources were ordered by difficulty, so you could discuss the 80k podcast, the OpenAI blog post, the original paper, or even a recent relevant paper. Even though the participants were "familiar" with RL (because they were used to seeing "RL" written in blogs or hearing people say "RL" in podcasts), none of them could explain to me the core structure of an RL setting (i.e. that an RL problem would need at least an environment, actions, etc.).
The boys were getting hungry (Abram is right, $10 of chips is not enough for 4 hungry men between 7 and 9pm) when, in the middle of a monologue ("in RL, you have so-and-so, and then it goes like so on and so forth..."), I suddenly realized that I was talking to more-than-qualified attendees (I was lucky to have a PhD candidate in economics, a teenager who used to do international olympiads in informatics (IOI), and a CS PhD) who lacked the RL procedural knowledge needed to ask non-trivial questions about "Deep RL from human preferences".
That's when I decided to change the logistics of the Meetup to something much closer to what is described in "You and your research". I started thinking about what they would be interested in knowing. So I started telling the brilliant IOI kid about this MIRI summer program, how I applied last year, etc. One thing led to another, and I ended up asking what Tsvi had asked me one year ago for the AISFP interview:
If one of you was the only Alignment researcher left on Earth, and it was forbidden to convince other people to work on AI Safety research, what would you do?
That got everyone excited. The IOI boy took the black marker and started to do math on the question, as a transmitter: "So, there is a probability p_0 that AI Researchers will solve the problem without me, and p_1 that my contribution will be neg-utility, so if we assume this and that, we get so-and-so."
The moment I asked questions I was truly curious about, the Meetup went from a polite gathering to the most interesting discussion of 2019.
Abram, if I were in charge of all agents in the reference class "organizer of Alignment-related events", I would tell instances of that class with my specific characteristics two things:
1. Come back to this document before and after every Meetup.
2. Please write below (can be in this thread or in the comments) which of your experiences running an Alignment think tank resonates the most with the above "document".
The introduction was great, and the rest of the post was great, but oddly enough the introduction felt sort of wrong for the rest of the post because the rest seemed like reasonably useful things to think about as you're starting any kind of research group.
This is fantastic stuff. Nice to see others independently coming up with the transmitters and receivers model. Also, the structure mentioned in 3A resonates strongly for me with the people groping towards some sense that Circling-type skills seem to be useful for rationality but who couldn't quite put their finger on why. My experience is that Circling with good facilitators enables exactly the kinds of things seen in 3A.
Two things that we've found useful at QRI that may apply:
1. A Slack or Slack-like thing (Keybase is nice for the additional security) for tracking the explosion of references and conversational threads that occur when you find a generative frame/question/method set is way, way more useful than things like shared gdocs. It allows more of 'getting the lay of the land' to reorient yourself when you've been away and developments have happened in the meantime. Storing links in this format also gives them a juicy sense of discovery, where other formats can make them feel more like homework needed to participate in the convo.
2. Maintaining momentum in the load balancing of connections in the graph of one-on-one meetings. That is to say, groups seem to function better when there is roughly equal communication between all the participants. Probably for a variety of reasons but one major one is that it seems to allow better bootstrapping of blindspots. Crossing all the possible one-on-ones gives the chance for misunderstandings to get worked out so that people can return to much more flow-like communication patterns. This is accomplished by regularly scheduling the various one-on-ones, prioritizing them, and making it so that people can request them without feeling like it is a big ask. Generally accomplished via video chat when in person would be laborious.
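A minimal sketch of what "crossing all the possible one-on-ones" could look like in practice, assuming a standard round-robin rotation (the circle method). That scheduling choice is an illustration, not something the comment specifies, and the function name `one_on_one_rounds` and the participant names are hypothetical.

```python
from itertools import combinations


def one_on_one_rounds(people):
    """Schedule rounds in which every pair meets exactly once (circle method).

    With an odd number of people, one person sits out each round.
    """
    members = list(people)
    if len(members) % 2 == 1:
        members.append(None)  # placeholder meaning "sits out this round"
    n = len(members)
    rounds = []
    for _ in range(n - 1):
        pairs = []
        for i in range(n // 2):
            a, b = members[i], members[n - 1 - i]
            if a is not None and b is not None:
                pairs.append((a, b))
        rounds.append(pairs)
        # Keep the first member fixed and rotate everyone else one step.
        members = [members[0]] + [members[-1]] + members[1:-1]
    return rounds


group = ["Ana", "Ben", "Cal", "Dee"]  # hypothetical participants
schedule = one_on_one_rounds(group)
for round_number, pairs in enumerate(schedule, start=1):
    print(f"Round {round_number}: {pairs}")

# Every possible pair should appear exactly once across the schedule.
scheduled = {frozenset(pair) for pairs in schedule for pair in pairs}
assert scheduled == {frozenset(p) for p in combinations(group, 2)}
```

For a group of four this gives three rounds of two simultaneous one-on-ones, which keeps the communication graph evenly loaded without anyone having to ask for a meeting.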
I've curated this post, not only for its value as an Alignment Research field guide but as a general guide for people setting up local research groups. (And, perhaps, other kinds of local intellectual work.)
How do you review a post that was not written for you? I’m already doing research in AI Alignment, and I don’t plan on creating a group of collaborators for the moment. Still, I found some parts of this useful.
Maybe that’s how you do it: by taking different profiles, and running through the most useful advice for each profile from the post. Let’s do that.
Full-time researcher (no team or MIRIx chapter)
For this profile (which is mine, by the way), the most useful piece of advice from this post comes from the model of transmitters and receivers. I’m convinced that I’ve been using it intuitively for years, but having an explicit model is definitely a plus when trying to debug a specific situation, or to explain how it works to someone less used to thinking like that.
Full-time researcher who wants to build a team/MIRIx chapter
Obviously, this profile benefits from the great advice on building a research group. I would expect someone with this profile to understand relatively well the social dynamics part, so the most useful advice is probably the detailed logistics of getting such a group off the ground.
I also believe that the escalating asks and rewards is a less obvious social dynamic to take into account.
Aspiring researcher (no team or MIRIx chapter)
The section You and your research was probably written with this profile in mind. It tries to push towards exploration instead of exploitation, babble instead of prune. And for so many people that I know who feel obligated to understand everything before toying with a question, this is the prescribed medicine.
I want to push back just a little on the “follow your curiosity” vibe, as I believe that there are ways to check how promising the current ideas are for AI Alignment. But I definitely understand that the audience is more “wannabe researchers stifled by their internal editor”, so pushing for curiosity and exploration makes sense.
Aspiring researcher who wants to build a team/MIRIx chapter
In addition to the You and your research section, this profile would benefit a lot from the logistics section (don’t forget the food!) and the social dynamics advice on keeping a group running (High standards for membership, Structure and elbow room, and Social norms).
There is something here for every profile interested in AI Alignment Research. That being said, each such profile has different needs, and the article is clearly most relevant for aspiring researchers who want to build a research group.
I want to have this post in a physical book so that I can easily reference it.
It might actually work better as a standalone pamphlet, though.
This post is a great tutorial on how to run a research group.
My main complaint about it is that it had the potential to be a far more general post, obviously relevant to anyone building a serious intellectual community, but the framing makes it feel relevant only to Alignment research.
I'm currently feeling confused about whether this is the right type signature for the review, but it is a truly excellent guide to discovering new ideas together with others, better than anything I've read in its reference class.
I will say that I didn't continue reading after the first section (for the reasons specified in it), but it was an awesome introduction!