Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Many technical alignment researchers are bad-to-mediocre at writing up their ideas and results in a form intelligible to other people. And even for those who are reasonably good at it, writing up a good intuitive explanation still takes a lot of work, and that work lengthens the turn-time on publishing new results. For instance, a couple months ago I wrote a post which formalized the idea of abstractions as redundant information, and argued that it’s equivalent to abstractions as information relevant at a distance. That post came out about two months after I had the rough math worked out, because it took a lot of work to explain it decently - and I don’t even think the end result was all that good an explanation! And I still don’t have a post which explains well why that result is interesting.

I think there’s a lot of potential space in the field for people who are good at figuring out what other researchers’ math is saying intuitively, and why it’s interesting, and then communicating that clearly - i.e. the skill of distillation. This post will briefly sketch out what two kinds of distillation roles might look like, what skills are needed, and talk about how one might get started in such a role.

Two Distiller Roles

The two types of distiller role I’ll sketch are:

  • “Independent” distiller: someone who works independently, understanding work published by other researchers and producing distillations of that work.
  • “Adjunct” distiller: someone who works directly with one researcher or a small team, producing regular write-ups of what the person/team is thinking about and why.

These two roles add value in slightly different ways.

An independent distiller’s main value-adds are:

  • Explaining the motivation and intended applications
  • Coming up with new examples
  • Boiling down the “key intuitive story” behind an argument
  • Showing how the intuitive story fits into the context of the intended applications

I expect the ability to come up with novel examples and boil down the core intuitive story behind a bunch of math are the rate-limiting skills here.

Rob Miles is a good example of an existing independent distiller in the field. He makes YouTube videos intuitively explaining various technical results and arguments. Rob’s work is aimed somewhat more at a popular audience than what I have in mind, but it’s nonetheless been useful for people in the field.

I expect an adjunct distiller’s main value-adds are:

  • Writing up explanations, examples, and intuitions, similar to the independent distiller
  • Saving time for the technical researcher/team, allowing more specialization
  • Providing more external visibility/legibility into the research process and motivation
  • Accelerating the research process directly by coming up with good examples and intuitive explanations

I expect finding a researcher/team to work with is the rate-limiting step to this sort of work.

Mark Xu is a good example of an existing adjunct distiller. He’s worked with both Evan Hubinger and Paul Christiano, and has written up decent distillations of some of their thoughts. I believe Mark did this with the aim of later doing technical research himself, rather than mostly being a distiller. That is a pretty good strategy and I expect it to be a common pathway, though naturally I expect people who aim to specialize in distillation long-term will end up better at distillation.

What Kind Of Skills Are Needed?

I expect the key rate-limiting skills are:

  • Ability to independently generate intuitive examples when reading mathematical arguments, or having a mathematical discussion
  • Ability to extract the core intuitive story from a mathematical argument
  • Writing/drawing skills to clearly convey technical intuitions to a wider audience
  • Ability to do most of the work of crossing the communication gap yourself - both so that researchers do not need to spend a lot of effort communicating to you, and so that readers do not need to spend a lot of effort understanding you
  • For the adjunct role, ability to write decent things quickly and frequently without too much perfectionism
  • For the non-adjunct role, ability to do all this relatively independently

How To Get Started

Getting started in an independent distiller role should be pretty straightforward: choose some research, and produce some distillations. It’s inherently a very legible job, so you should pretty quickly have some good example pieces which you could showcase in a grant application (e.g. from the Long Term Future Fund or FTX Future Fund). That said, bear in mind that you may need some practice before you actually start to produce very good distillations.

An adjunct role is more difficult, because you need someone to work with. Obvious advice: just asking people is an underutilized strategy, and works surprisingly well. Be sure to emphasize your intended value-add to the researcher(s). If you want to prove yourself a bit before reaching out, independently distilling some of a researcher’s existing public work is another obvious step. You might also try interviewing a researcher on some part of their work, and then distilling that, in order to get a better feel for what it would be like to work together before actually committing.


33 comments

I want to really signal boost an idea that's come up a bit recently: doing user research on pieces of writing. When writing such distillations, have someone in the target audience read it and sit down next to them (or on a video call) and watch them do so. Try to get them to think out loud. Observe what works and what doesn't. I suspect rather strongly that this is a low hanging fruit.

What posts of yours do you want distilled?

(Ordered by priority)

  1. The Pointers Problem and Variables Don't Represent The Physical World
  2. The three posts on Selection Theorems could generally use some distillation and better marketing; the "selection theorems" name is quite bad, and the empirical aspects should be emphasized more
  3. There's been a few posts and a lot of comments on those posts between myself, Evan, and Abram arguing about the right way to think of "outer" vs "inner" alignment. (My comment on that last linked post is the best current summary of my thoughts.)
  4. How To Think About Overparameterized Models. Also a distillation of the relevant parts of the Mingard et al work would go well with this.
  5. My review of Coherent Decisions
  6. Abstractions as Redundant Information
  7. Anything in the Big Picture of Alignment talks
  8. Generalized Koopman-Pitman-Darmois is probably a very hard one to distill, but it would probably be valuable if someone could explain the argument more intuitively. (Really, the right way to do it is to figure out a proof which explicitly routes through an entropy maximization problem, but that's more a technical goal than a distillation goal.)

More generally, two big categories:

  • I've written a ton of material on general background world models, and a ton of material on alignment, but I've written relatively little explaining how the background world models narrow down the search-for-alignment-progress to the sort of work I'm doing. Or, to put it a different way: a lot of my technical posts could use good explanations of why the results are interesting and how they fit into the big picture.
  • Important stuff is often buried in comment threads. I'm not sure if LW currently has a way to rank a user's comments by Karma, but that would be useful to find such threads.

Finally, since it's Thomas Kwa asking this question: no, I am not going to create bounties for distillations on these right now, because I don't want to deal with the overhead. Fortunately, the target audience for distillations of my work is everyone except me, so people other than me are quite well qualified to set up their own distillation bounties.

I think I weakly disagree with the implication that “distillation” should be thought of as a different category of activity from “original research”. It is in a superficial sense, but a lot of the underlying activities and skills and motivations overlap. For example, original researchers also have the experience of reading something, feeling confused about it, and then eventually feeling less confused about it. They just might not choose to spend the time writing up how they came to be less confused. Conversely, someone trying to understand something for the purpose of pedagogy may notice a mistake in the original, or that the original is outright wrong, which is original research.

I guess if I were writing something-like-this-post, I would frame it as:

  1. I encourage grant-makers to be impressed by people for creating good pedagogy even if it's technically unoriginal. (I suspect that this is already the case.)
  2. I encourage anyone who has the experience of reading something, feeling confused about it, and then eventually feeling less confused about it, to create some piece of pedagogy that would have helped their former selves; for example, this is an excellent type of project for people trying to get into the field.
  3. I encourage active researchers doing original research to also consider whether pausing to create better pedagogy would be a good use of time, even at the expense of slowing down their own novel research progress.
  4. I encourage anyone who feels very confused about something-in-particular to post calls / bounties / whatever for pedagogy on that topic.

(Maybe other things too.)

For my part I've spent much of the last five months on a #3 project, and I think that was the right call for my particular situation—I suspect that I learned more through writing those things than anyone else will by reading them. I also spent the better part of a month on a #2 project, and also found it a good use of time. And the very first thing I did when I decided to learn about the field was spend a few months creating unoriginal pedagogy. It was a great way to learn. :)

I agree, i.e. I also (fairly weakly) disagree with the value of thinking of 'distilling' as a separate thing. Part of me wants to conjecture that it comes from thinking of alignment work predominantly as mathematics or a hard science in which the standard 'unit' is an original theorem or original result which might be poorly written up but can't really be argued against much. But if we think of the area (I'm thinking predominantly about more conceptual/theoretical alignment) as a 'softer', messier, ongoing discourse full of different arguments from different viewpoints and under different assumptions, with counter-arguments, rejoinders, clarifications, retractions etc. that takes place across blogs, papers, talks, theorems, experiments etc. and that all somehow slowly works to produce progress, then it starts to be less clear what this special activity called 'distilling' really is.

Another relevant point, but one which I won't bother trying to expand on much here, is that a research community assimilating - and then eventually building on - complex ideas can take a really long time. 

[At risk of extending into a rant, I also just think the term is a bit off-putting. Sure, I can get the sense of what it means from the word and the way it is used - it's not completely opaque or anything - but I'd not heard it used regularly in this way until I started looking at the alignment forum. What's really so special about alignment that we need to use this word? Do we think we have figured out some new secret activity that is useful for intellectual progress that other fields haven't figured out? Can we not get by using words like "writing" and "teaching" and "explaining"?]

I think I weakly disagree with the implication that “distillation” should be thought of as a different category of activity from “original research”.

(I might be wrong, but) I think there is a relatively large group of people who want to become AI alignment researchers that just wouldn't be good enough to do very effective alignment research, and I think many of those people might be more effective as distillers. (And I think distilling (and teaching for AI safety) as an occupation is currently very neglected.)

Similarly, there may also be people who think they aren't good enough for alignment research, but may be more encouraged to just learn the stuff well and then teach it to others.

I was about to write approximately this, so thank you! To add one point in this direction, I am sceptical about the value of reducing the expectation for researchers to explain what they are doing. My research is in two fields (arithmetic geometry and enumerative geometry). In the first we put a lot of burden on the writer to explain themselves, and in the latter poor and incomplete explanations are standard. This sometimes allows people in the latter field to move faster, but

  • it leaves critical foundational gaps, which we can ignore for a while but which eventually cause a lot of pain;
  • sometimes really critical points are hidden in the details, and we just miss these if we don’t write the details down properly.

Disclaimers:

  • while I think a lot of people working in these fields would agree with me that this distinction exists, not so many will agree that it is generally a bad thing.
  • I’m generally criticising lack of rigour rather than lack of explanation. I am not claiming these necessarily have to go together, but in my experience they very often do.

Looking for "distillers" / happy to pay for this work.

Also, distillation seems like the wrong name. What's often needed seems to be more like dilution & blending - I can often describe the core of an idea in a few sentences, but the inferential steps required from the reader are then too large, or rely on knowledge unknown to many readers.


  • distillation: because the blog post should be shorter than the main paper.
  • blending: because we need to reduce the inferential gap by explaining the prerequisites.
  • dilution: a corollary of the two previous ones.

Bit of a pointless gripe (and not too specific to this post), but I wish we could use the word “pedagogy” instead of “distillation”. Not only is “pedagogy” much more understandable to lots of potential readers, but I don't think “distill” is even capturing a helpful mental image. “Distillation” in chemistry is getting rid of all the excess to make something super-concentrated. But that's often the opposite of good pedagogy! Imagine (1) a dense math proof with almost no English words, (2) an explanation of the same proof with lots of examples and diagrams and stories and intuitions, and explaining the same thing multiple times from multiple perspectives, etc. When I imagine “distillation” in the original (chemistry) sense, to me it invokes a mental image much closer to (1) not (2). But (2) is better pedagogy, and (2) is what we actually want in this context.

(For example, Rob Miles videos are not designed to pack in the maximum possible number of concentrated insights per second of video.)

FYI there are three different words I need, and I don't know what the proper... conjugation? is for pedagogy:

  • Distiller (person who does the job of streamlining/clarifying)
  • Distillation [1] (an instance of a distilled work)
  • Distillation [2] (the general topic of creating distillations. This is the one I think "pedagogy" means)

The first one could be called "a teacher", but, that has different connotations when you're writing things down or whatnot.

  • Umm, popularizer, educator, clear writer, lucid writer, explainer, expositor, blogger, Eliezer-whisperer (=Rob)…?
  • Explanation, Popularization, "X for dummies", "introduction to X", "conceptual introduction to X", "mathematical introduction to X", "introduction to X for economists", "ELI12: X", "textbook", "lecture series", "Voxsplainer", "Blog post on X", "Semitechnical introductory dialog on X"…?
  • Yeah, I guess "pedagogy".


Also "communicator", e.g. "science communicator".

I think distillation can describe the process very well from the right perspective. Imagine an academic paper in physics or math with pages full of formulae. Obviously, the average person isn't going to understand them, so a distiller would take out all the jargon and specialized knowledge to leave behind the main ideas.

Funny timing seeing Scott Alexander's Yudkowsky Contra Christiano On AI Takeoff Speeds post so soon after this post. My god is he good. I wonder what the implications of that are.

My current plan is to go through most of the MIRI dialogues and anything else lying around that I think would be of interest to my readers, at some slow rate where I don't scare off people who don't want to read too much AI stuff. If anyone here feels like something else would be a better use of my time, let me know.

Coming up with good examples strikes me as the sort of thing that is a better fit for crowdsourcing than for an individual to try to think up themself.

In practice, I understand there is a bit of a catch-22 though: a published post is the time when such crowdsourcing is most practical, but you'd probably want to have the good examples included in that published post. I'm not sure what to do about this.

One idea is to post something like "I'm working on a post for X and am looking for good examples of Y." But that's hard because it very well might be hard to explain X without having really written up X (depending on the X). A second idea that I like more is publishing rough drafts, or even more early stage writeups, and getting help with examples there. That's an idea I am a fan of anyway though, forgetting about the benefit of it helping with the generation of examples.

Curated. I think this is a message that's well worth getting out there, and a write-up of a message I find myself telling people often. As more people are interested in joining the Alignment field, I think we should establish that this is a way people can start contributing. A suggestion here is that people can further flesh out LessWrong wiki-tag pages on AI (see the concepts page), and I'd be interested in building further framework on LessWrong to enable distillation work.

This sounds like something that could be done by an organization creating a job for it, which could help with mentorship/connections/motivation/job security relative to expecting people to apply to EAIF/LTFF.

My organization (Rethink Priorities) is currently hiring for research assistants and research fellows (among other roles) and some of their responsibilities will include distillation.

I'd be excited to see more of this happening.

It reminds me of the recent job posting from Abram, Vanessa and Diffractor, which seems to be an adjunct distiller role for Infra-Bayesianism, though they use different terms.

A quadratic funding mechanism (similar to Gitcoin) could make sense for putting up distillation bounties. Quadratic funding (QF) lets a grant-maker put up a pool of matching funds while individual researchers specify how valuable each individual bounty would be to them; then the matching is done via the QF strategy to optimize for aggregate researcher utility. Speaking for myself, I would contribute to a community fund for further distillations, and I would also be more likely to distill.
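To make the matching step concrete, here is a minimal sketch of the standard QF rule: a bounty's ideal total is the square of the sum of the square roots of its individual contributions, and the match is the gap between that total and what was directly contributed. The function name, the bounty labels, and the proportional scaling used when the pool is too small are assumptions for illustration, not a description of how Gitcoin or any particular fund actually implements it.

```python
from math import sqrt


def quadratic_match(contributions, matching_pool):
    """Split a matching pool across bounties using the basic QF rule.

    contributions: dict mapping a bounty name to a list of individual
                   contribution amounts (one entry per researcher).
    matching_pool: total matching funds put up by the grant-maker.
    Returns a dict mapping each bounty name to its matched amount.
    """
    ideal_match = {}
    for bounty, amounts in contributions.items():
        # QF ideal total: (sum of square roots of contributions) squared.
        ideal_total = sum(sqrt(a) for a in amounts) ** 2
        # The subsidy is whatever the direct contributions do not cover.
        ideal_match[bounty] = ideal_total - sum(amounts)

    total_ideal = sum(ideal_match.values())
    if total_ideal <= 0:
        return {bounty: 0.0 for bounty in contributions}

    # Assumption: if the pool cannot cover every ideal match,
    # scale all matches down by the same factor.
    scale = min(1.0, matching_pool / total_ideal)
    return {bounty: match * scale for bounty, match in ideal_match.items()}


# Hypothetical bounties: four researchers each put 1 unit on one bounty,
# while a single researcher puts 4 units on another.
example = quadratic_match(
    {"bounty_a": [1, 1, 1, 1], "bounty_b": [4]},
    matching_pool=6,
)
```

The design point is that many small contributions attract more matching than one large contribution of the same total size, so the pool flows toward distillations that many researchers want rather than ones a single person values highly.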

I find the level of distillation done by Daniel Filan at AXRP to be great. Short and listenable enough for easy access while detailed enough to let you form opinions/directions for further research.

Some IMO valuable targets for distillation: Infra-Bayesianism and recent work on why deep nets converge e.g. 'Gradient Descent Finds Global Minima of Deep Neural Networks', Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers, A Theoretical Analysis of Deep Q-Learning

One way to define such bounties would be via alignment forum karma. You could publish a list of papers to be distilled and define how much you would pay for an alignment forum post with 10, 25, and 50 alignment forum karma on the topic.
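As a rough sketch of what a karma-tiered bounty could look like: the 10/25/50 thresholds come from the suggestion above, but the payout amounts and the function name are purely hypothetical placeholders.

```python
# Karma threshold -> payout. The thresholds follow the comment above;
# the amounts are hypothetical placeholders, not a proposal.
HYPOTHETICAL_TIERS = {10: 100, 25: 300, 50: 800}


def bounty_payout(karma, tiers=HYPOTHETICAL_TIERS):
    """Return the payout for the highest karma threshold the post reached."""
    payout = 0
    for threshold, amount in sorted(tiers.items()):
        if karma >= threshold:
            payout = amount
    return payout
```

For instance, under these placeholder tiers a post with 30 karma has passed the 25 threshold but not the 50 one, so it would earn the middle payout.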

I currently receive funding from LTF Fund for translating AI-Safety-related texts into Russian.

My partner (she's also my editor) is interested in something very similar to what this post describes. Over the last year she has often said something like "Your rationalist guys are definitely very smart, but they are very bad at speaking human language, so no one understands them".

We actually thought that after I finish my grant work in July (by which point she will have edited everything I translated and become well versed in the topic), she would send her own grant application with this plan: she writes easy-to-understand texts about AI-Safety, a hired translator translates them into English (I'm good at English-to-Russian translation, but not vice versa), and I check that the translator doesn't distort the meaning.

She has a lot of experience with writing short texts: she has run a social media group with ~20000 subscribers for 7 years.

Should we really do this, what do you think? 

There's some trickiness around vetting translation work, but pairing it with distillation work seems like it'd make it easier to vet. 

I do anticipate an issue where attempting to simplify something to be more readable loses important nuance, so I think it'd be good to have another person in the loop.

I'm pretty confused that this is as necessary as it is, particularly with writing that involves a lot of math and math notation. I don't understand how people get the insight and motivation necessary to write that kind of thing without explaining what the point of it is or giving examples of how to apply it as part of their expositions.

Your (johnswentworth's) posts don't seem to suffer nearly as much from this, at least from a quick skim of the ones you said on another thread you'd like distilled, but e.g. Infra-Bayesianism seems maybe important (it's about something I was thinking about anyway), but I've "bounced off" the sequence, as apparently have enough others for there to be a job offering distilling it. The same thing happens sometimes when I'm reading academic papers.

I have nothing against rigor and I actually enjoy math, but when writing about models or systems one wants to apply to something that isn't purely mathematical, these are the basic parts I generally see and how I react to them:

  • Motivation. This is great! Tell me what problem you're trying to solve and sketch your solution. People tend to do this only at the beginning of a piece, if they do it at all, which in my opinion is a mistake. I want to know why you care about proving that theorem, and if you don't say I can't always guess.

  • Examples. These are great! If you introduce a construct, show me one. Make it just non-trivial enough that there's a point to using the abstraction at all. Ideally make its components things I can already recognize and understand without the abstraction. Again, there often are some but not enough; any significant point made that isn't illustrated with an example forces me to come up with one on my own or else I just won't understand it.

  • Text description of math. This varies a lot. If it's showing the correspondence between the real problem being solved and the abstractions and notation used, that's fine. Other statements relating mathematical objects to one another are okay if I already have a good grasp of what those objects are (less so if some or all of them were only introduced in the current piece and are non-obvious in terms of the underlying constructs). Proofs written in text tend to frustrate me. They require extreme amounts of thought per sentence because they're written very elliptically (stating "X has property Z" while assuming I remember "X has property Y" and "property Y implies property Z", often when neither of these latter things is stated in the piece at all), and when I do figure out what they're saying, 90% of it is extremely obvious and uninteresting, but the 10% that makes the proof work isn't highlighted in any way. I largely end up skipping them, and since it seems like they're probably one of the hardest parts to write, that feels like a big waste.

  • Symbolic math. There's no other way to be so precise so concisely, but I find myself having to refer back to definitions constantly to figure out what they're saying, and long blocks of nothing but symbols are even more impenetrable than prose proofs; I end up skipping such blocks if there's any way I can figure out what's being said without decoding them. In programming we all know single letter variable and function names make code hard to understand (with a few common exceptions like naming a loop index "i"); I think symbolic math would be much easier to read if it looked more like a computer program.

Now, I do also see writing that's almost all motivation and examples, and that doesn't seem right either. It doesn't create sufficient technical understanding that I could implement the ideas in a computer program, or adapt them to very different cases than the ones presented. But to summarize, some skills you say are important in distilling are:

  • Ability to independently generate intuitive examples when reading mathematical arguments, or having a mathematical discussion
  • Ability to extract the core intuitive story from a mathematical argument

And, well, I honestly don't understand how anyone can even think about math, and particularly not how they can come up with useful math, without having examples and an intuitive story in mind already.

Are there examples or best practices you would recommend for this?

Should the role of a distiller include spotting mistakes? I assume that you'd only want distillers to get to work once you have some confidence that the original claims are correct. 

What you're looking for here is someone in a field called technical writing and/or translation analysis. Technical writers and translation analysts can have different specialties. Someone who is mathematics specialized could do the work you describe.

This is a growing and well-paying field for people with excellent writing skills to work in the tech, engineering, medical, etc fields without having much technical training/degrees.

This is extremely articulate and frankly universally applicable. Distillers are needed at any critical point in a business when you’re trying to understand what you have and what to do with it, aka decision intelligence. Anyone able to effectively and efficiently “distill” the critical, actionable nuggets which drive outcomes is the most valuable resource in a company!

What is the intended difference between a distiller and the field of science communication? 

In practice, "science communicators" tend to be popsci-y; there's often a focus on entertainment over epistemics, and they'll often end up misleading or outright wrong as a result (often accidentally). A distiller's job is more centrally about deeply understanding the ideas themselves, and then communicating the core pieces accurately. Their path-to-impact is through providing useful explanations to researchers, not entertaining laypeople.

I consider myself a fairly good distiller, though fairly bad at math (compared to the average person here). So far I’ve mostly used that skill to write biographies on Wikipedia, but it might be possible for me to work on more math-heavy topics if I either have enough time to learn the math in depth or have a strongly collaborative relationship with someone who already does.