Reflections on Larks’ 2020 AI alignment literature review

by alexflint · 6 min read · 1st Jan 2021 · 8 comments



Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This work was supported by OAK, a monastic community in the Berkeley hills. It could not have been written without the daily love of living in this beautiful community.

Larks has once again evaluated a large fraction of this year’s research output in AI alignment. I am, as always, deeply impressed not just by the volume of his work but by Larks’ willingness to distill from these research summaries a variety of coherent theses on how AI alignment research is evolving and where individual donors might give money. I cannot emphasize enough how much more difficult this is than merely summarizing the entire year’s research output, and summarizing the entire year’s research output is certainly a heroic undertaking on its own!

I’d like to reflect briefly on a few points that came up as I read the post.


The work that I would most like to see funded is technical work that really moves our understanding of how to build beneficial AI systems forward. I will call this “depth”. It is unfortunately very difficult to quickly assess the depth of a given piece of research. Larks touches on this point when he discusses low-quality research:

[...] a considerable amount of low-quality work has been produced. For example, there are a lot of papers which can be accurately summarized as asserting “just use ML to learn ethics”. Furthermore, the conventional peer review system seems to be extremely bad at dealing with this issue.

Yet even among the papers that did get included in this year’s literature review, I suspect that there is a huge variation in depth, and I have no idea how to quickly assess which papers have it. Consider: which of the research outputs from, say, 2012 really moved our understanding of AI safety forward? How about from 2018? My sense is that these are fearsomely difficult questions to answer, even with several years’ hindsight.

Larks wisely does not fall into the trap of merely counting research outputs, or computing any other such simplistic metric. I imagine that he reads the papers and comes to an informed sense of their relative quality without relying on any single explicit metric. My own sense is that this is exactly the right way to do it. Yet the whole conclusion of the literature review does rest critically on this one key question: what is it that constitutes valuable research in the field of AI alignment? My sense is that depth is the most valuable quality on the current margin, and unfortunately it seems to be very difficult either to produce or assess.


I was both impressed and more than a little disturbed by Larks’ “research flywheel” model of success in AI alignment:

My basic model for AI safety success is this:

  1. Identify interesting problems. As a byproduct this draws new people into the field through altruism, nerd-sniping, apparent tractability
  2. Solve interesting problems. As a byproduct this draws new people into the field through credibility and prestige
  3. Repeat

I was impressed because it is actually quite rare to see any thesis whatsoever about how AI alignment might succeed overall, and rarer still to see a thesis distilled to such a point that it can be intelligently critiqued. But I was disturbed because this particular thesis is completely wrong! Increasing the amount of AI alignment research or the number of AI alignment researchers will, I suspect, by default decrease the capacity for anyone to do deep work in the field, just as increasing the number of lines of code in a codebase will, by default, decrease the capacity for anyone to sculpt highly reliable research artifacts from that codebase, or increasing the number of employees in a company will, by default, decrease the capacity for anyone in that company to get important work done.

The basic reason for this is that most humans find it very difficult to ignore noise. It is easy to imagine entering into an unwieldy codebase or company or research field and doing important work while disavowing the temptation to interact with or fix the huge mess growing up in every direction, but it is extremely difficult to actually do this. It is possible to create large companies and large codebases in which important work gets done, but it is not the default outcome of growth. The large codebases and large companies that are prominent in the world today are the extreme success cases in terms of making possible important work, and, in my own direct experience, even these success cases are quite dismal on an absolute scale of allowing important work to happen.

It is not that large codebases and large companies actively prevent important work from getting done (although many do), it is that most humans find it difficult to do such work in the presence of noise. It is not enough for a large company or a large codebase to provide some in-principle workable trajectory by which important work can get done; it is a question of how many humans are actually capable of walking such a path without being constantly overwhelmed by the mess piling up around their feet.

It is not that we should try to limit the size of the AI alignment field forever. The field must grow, it seems, if we are to stand any chance of success. But we should try to walk along a careful and gradual growth trajectory that maximizes the field’s capacity for truly deep research output. While doing this we should, in my view, be clear that among all the possible trajectories that involve growth, most are actively harmful. We should not, in my view, be optimizing directly for growth, but instead for depth, with growth as an unfortunate but necessary by-product.

Policy, strategy, technical

Larks has this to say about publishing policy research in the AI alignment field:

My impression is that policy on most subjects, especially those that are more technical than emotional is generally made by the government and civil servants in consultation with, and being lobbied by, outside experts and interests. Without expert (e.g. top ML researchers in academia and industry) consensus, no useful policy will be enacted. Pushing directly for policy seems if anything likely to hinder expert consensus. Attempts to directly influence the government to regulate AI research seem very adversarial, and risk being pattern-matched to ignorant technophobic opposition to GM foods or other kinds of progress. We don't want the 'us-vs-them' situation that has occurred with climate change, to happen here. AI researchers who are dismissive of safety law, regarding it as an imposition and encumbrance to be endured or evaded, will probably be harder to convince of the need to voluntarily be extra-safe - especially as the regulations may actually be totally ineffective.

The only case I can think of where scientists are relatively happy about punitive safety regulations, nuclear power, is one where many of those initially concerned were scientists themselves, and also had the effect of basically ending any progress in nuclear power (at great cost to climate change). Given this, I actually think policy outreach to the general population is probably negative in expectation.

And he has this to say about publishing strategy research:

Noticeably absent are strategic pieces. I find that a lot of these pieces do not add terribly much incremental value. Additionally, my suspicion is that strategy research is, to a certain extent, produced exogenously by people who are interested / technically involved in the field. This does not apply to technical strategy pieces, about e.g. whether CIRL or Amplification is a more promising approach.

I basically agree with both of these points, which I would summarize as: Direct engagement with AI policymakers is helpful, but there are not many compelling reasons to publish AI policy work, since the main reason to publish such work would be broad outreach, and broad outreach on AI policy is probably harmful at this point due to the risk of setting up an adversarial relationship with AI researchers. Although high-quality strategy research exists, as an empirical observation it is just quite rare to read strategy research that truly moves one’s understanding of the field forward.

My own takeaway from these helpful points is as follows: in order to do beneficial work in general, and particularly in order to do beneficial work within AI alignment, begin by working directly on the very core of the problem, using your current imperfect understanding of what the core of the problem is and how to work on it. In AI alignment, this might be: begin by working, as best you can, on the core challenge of navigating the development of advanced AI. In doing so, you may discover that the core of the problem is actually not where you thought it was, in which case you can shift your efforts, or you may discover some neglected meta-level work, in which case you may then decide whether to undertake that work yourself. But in such a complex landscape, if you don’t begin earnestly investigating the nature of and solution to the core of the problem, then any other work you do is unlikely to be overall beneficial. This is the same “depth” I was trying to point at in the preceding sections.

Scalable uses for money

Larks encodes his conclusions by rotating each letter 13 places forward in the alphabet, in order to discourage us from merely reading his conclusions without engaging directly with the challenging task of formulating our own:

My constant wish is to promote a lively intellect and independent decision-making among readers; hopefully my laying out the facts as I see them above will prove helpful to some readers.
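For readers unfamiliar with the encoding: rot13 simply rotates each ASCII letter 13 places forward, so applying it twice recovers the original text. A minimal sketch in Python (the example string here is my own illustration, not Larks’ actual text):

```python
import codecs

def rot13(text: str) -> str:
    # Rotate each ASCII letter 13 places; rot13 is its own inverse,
    # so the same function both encodes and decodes.
    return codecs.encode(text, "rot13")

print(rot13("Uryyb, jbeyq!"))  # -> Hello, world!
```

Since 13 is half the alphabet, no separate decode step is needed: running the encoded conclusions back through the same transform reveals them.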

I very much admire this ethos, and will do my best not to undo his efforts, although I do want to comment on one general point mentioned in the encoded text. Larks notes that much of the best research is being conducted within large organizations that already have ample funding, and are neither accepting of nor in need of additional funding at this time. This is both heartening and distressing.

It is heartening, of course, to see important research being funded at such a level that in at least several prominent cases further funding by individual donors is literally impossible, and in several additional cases seems to be explicitly un-sought after by the organizations themselves.

But it is also a little distressing that after 20 years of work in AI alignment (counting from the date that MIRI, then the Singularity Institute for Artificial Intelligence, was founded), we have neither a resolution to the AI alignment problem nor any scheme for scalably utilizing funds to find one. What would a scalable scheme for resolving the AI alignment problem look like, exactly? Is depth scalable? If not, then why exactly is that?

These are questions about which I would very much like to have one-on-one conversations. If you would like to have such a conversation with me, please send me a direct message here on lesswrong.


May you find happiness and depth in your work.

May you find a way to live that truly supports you.

May your life and work come together beautifully.

May you bring peace to our troubled world.



8 comments

I agree with this post.

You gave me food for thought. I hadn't thought about your objection to growth (or at least to pushing for growth). I think I disagree with the point about strategy research, since I believe that strategy research can help give a bird's-eye view of the field that is harder to get when exploring.

Yeah so to be clear, I do actually think strategy research is pretty important, I just notice that in practice most of the strategy write-ups that I actually read do not actually enlighten me very much, whereas it's not so uncommon to read technical write-ups that seem to really move our understanding forward. I guess it's more that doing truly useful strategy research is just ultra difficult. I do think that, for example, some of Bostrom's and Yudkowsky's early strategy write-ups were ultra useful and important.

There are two basic ways to increase the number of AI Safety researchers.
1) Take mission-aligned people (usually EA undergraduates) and help them gain the skills.
2) Take a skilled AI researcher and convince them to join the mission.

I think these two types of growth may have very different effects. 

A type 1 new person might take some time to get any good, but will be mission aligned. If that person loses sight of the real problem, I am very optimistic about just reminding them what AI Safety is really about, and they will get back on track. Furthermore, these people already exist, and are already trying to become AI Safety researchers. We can help them, ignore them, or tell them to stop. Ignoring them will produce more noise compared to helping them, since the normal pressure of building academic prestige is currently not very aligned with the mission. So do we support them or tell them to stop? Actively telling people not to try to help with AI Safety seems very bad; it is something I would expect to have bad cultural effects beyond just regulating how many people are doing AI Safety research.

A type 2 person who is converted to AI Safety research because they actually care about the mission is not too dissimilar from a type 1 person, so I will not write more about that.

However, there is another type of type 2 person, who will be attracted to AI Safety as a side effect of AI Safety being cool and interesting. I think there is a risk that these people take over the field and divert the focus completely. I'm not sure how to stop this, though, since it is a direct side effect of gaining respectability, and AI Safety will need respectability. And we can't just work in the shadows until it is the right time, because we don't know the timelines. The best plan I have for keeping global AI Safety research on course is to put as many of "our" people into the field as we can. We have a founders effect advantage, and I expect this to get stronger the more truly mission-aligned people we can put into academia.

I agree with alexflint that there are bad growth trajectories and good growth trajectories. But I don't think the good ones are as hard to hit as they do. I think part of what is wrong is the model of AI Safety as a single company; I don't think this is a good intuition pump. Noise is a thing, but it is much less intrusive than this metaphor suggests. Someone at MIRI told me that, to a first approximation, he doesn't read other people's work, so at least for this person it doesn't matter how much noise is published, and I think this is a normal situation, especially for people interested in deep work.

What mostly keeps people in academia from doing deep work is the pressure to constantly publish.

I think focusing on growth vs. no growth is the wrong question. But I think focusing on deep work is the right question. So let's help people do deep work. Or at least, that is what I aim to do. And I'm also happy to discuss with anyone.

Thank you for this thoughtful comment, Linda -- writing this reply has helped me to clarify my own thinking on growth and depth. My basic sense is this:

If I meet someone who really wants to help out with AI safety, I want to help them to do that, basically without reservation, regardless of their skill, experience, etc. My sense is that we have a huge and growing challenge in navigating the development of advanced AI, and there is just no shortage of work to do, though it can at first be quite difficult to find. So when I meet individuals, I will try to help them find out how to really help out. There is no need for me to judge whether a particular person really wants to help out or not; I simply help them see how they can help out, and those who want to help out will proceed. Those who do not want to help out will not proceed, and that's fine too -- there are plenty of good reasons for a person to not want to dive head-first into AI safety.

But it's different when I consider setting up incentives, which is what @Larks was writing about:

My basic model for AI safety success is this: Identify interesting problems. As a byproduct this draws new people into the field through altruism, nerd-sniping, apparent tractability. Solve interesting problems. As a byproduct this draws new people into the field through credibility and prestige.

I'm quite concerned about "drawing people into the field through credibility and prestige" and even about "drawing people into the field through altruism, nerd-sniping, and apparent tractability". The issue is not the people who genuinely want to help out, whom I consider to be a boon to the field regardless of their skill or experience. The issue is twofold:

  1. Drawing people who are not particularly interested in helping out into the field via incentives (credibility, prestige, etc).
  2. Tempting those who do really want to help out and are already actually helping out to instead pursue incentives (credibility, prestige, etc).

So I'm not skeptical of growth via helping individuals, I'm skeptical of growth via incentives.

Ok, that makes sense. Seems like we are mostly on the same page then. 

I don't have strong opinions on whether drawing in people via prestige is good or bad. I expect it is probably complicated. For example, there might be people who want to work on AI Safety for the right reasons, but are too agreeable to do it unless it reaches some level of acceptability. So I don't know what the effects will be on net. But I think it is an effect we will have to handle, since prestige will be important for other reasons.

On the other hand, there are lots of people who really do want to help, for the right reason. So if growth is the goal, helping these people out seems like just an obvious thing to do. I expect there are ways funders can help out here too. 

I would not update much on the fact that currently most research is produced by existing institutions. It is hard to do good research, and even harder without the colleagues, salary, and other support that come with being part of an org. So I think there is a lot of room for growth, just by helping the people who are already involved and trying.

I very much agree with these two:

On the other hand, there are lots of people who really do want to help, for the right reason. So if growth is the goal, helping these people out seems like just an obvious thing to do

So I think there is a lot of room for growth, by just helping the people who are already involved and trying.

I agree with the thesis but suspect a slightly different mechanism. I don't think people have trouble ignoring noise at an epistemic level - I think people have other reasons for paying lip-service to genre-aligned content independent of epistemic content, and so noise creates a conflict of interest.

This suggests another possible approach: sharpening the incentives to preferentially reward epistemic content, rather than flatly reducing variance, which has other negative side effects (e.g. fragility).