Epistemic status: I haven’t thought about most of this for more than a few hours. Where I quote people directly, that’s exactly the quote; otherwise, assume this is just my possibly flawed understanding from limited interaction with others. Practising thinking well and sharing some advice/observations from others seems like it might be useful.

Thanks to Michael Chen and Viktor Rehnberg for comments.

I recently attended my first EA Global and wanted to follow up with a post. I seem to be falling into a pattern of writing up my thoughts and not publishing them, so this is an attempt to lower my standards around the quality of ideas and write up some points. 

This post is mostly about high-level AI safety stuff. I might write another post on my thoughts related to biosecurity which is something else I thought about heavily at the conference. Many of my 1:1s were about lower-level details of upskilling in alignment, x-risk from biosecurity and people I could assist. 

Why think about Bottlenecks? The Theory of Constraints

The Theory of Constraints (TOC) is an idea I've been fascinated with since reading The Phoenix Project. Central to this theory is the concept of a bottleneck: the point in a process where increasing capacity there, and only there, will increase the flow through the whole process. It's not a complicated model, but I've found it helpful in other domains.
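
As a toy sketch of the idea (my own illustration, not from the book): model a process as a chain of stages, where overall throughput is capped by the slowest stage. Capacity added anywhere except the bottleneck is wasted.

```python
# Toy illustration of the Theory of Constraints: a process is a chain of
# stages, and overall flow is limited by the stage with the least capacity.
def throughput(capacities):
    return min(capacities)

stages = [10, 3, 8]              # units/hour at each stage; the middle stage is the bottleneck

print(throughput(stages))        # 3
print(throughput([10, 3, 16]))   # still 3: upgrading a non-bottleneck stage changed nothing
print(throughput([10, 6, 8]))    # 6: only raising the bottleneck's capacity helped
```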

For alignment, I might describe my understanding of the goal as "robust alignment strategies that might work on near-future AGI". 

Initial Hypothesis: We need more good ideas!

Before attending EAG DC, my central hypothesis was that to make more progress on AI alignment, we needed more good ideas. If this doesn't sound like much of a model to you, I agree; writing it down makes that feel more obvious.

To be generous, the prediction here is that we can scale up many other activities (like researching specific agendas), but that may not help unless we have some game-changing insight, which will only happen if more, possibly exceptional, people get involved with the field after being trained to think well. This opinion would explain many initiatives, but I don't want to ascribe it to others.

By contrast, I didn't have the hypothesis that we should focus more on AI governance. I hadn't investigated AI Governance before the conference. I raise this to highlight that there are possible models which predict bottlenecks such that the same activities (outreach/mentoring/funding) might not be sufficient on their own to avoid humanity's doom. 

If we think that idea quality is bottlenecking alignment, then various current initiatives might work to alleviate it. 

We could:

  • Add people/upskill more people (Refine, SERI MATS, MLAB, AGISF)
  • Help people think better (why Eliezer wrote The Sequences)
  • Help people communicate better, which is instrumentally helpful in thinking better (Alignment Forum, Less Wrong)
  • Fund more research (OpenPhil, LTFF, and others)

However, many other possible bottleneck hypotheses would also predict that these activities make the alignment field more productive. Any bottleneck alleviated by EA having more people could be alleviated by growing the field.

More sophisticated model: The Alignment Field is constrained by being too abstract, and the new researcher generation might be constrained by mentorship

I went to the AI Alignment careers discussion, where Richard Ngo and others advised people eager to get involved with alignment. I had two main takeaways from Richard's comments during that session:

  • We need to take abstract ideas and test them by making them more concrete/implementing them. 
  • We need more experienced alignment researchers to mentor the considerable number of aspiring contributors who could benefit from mentoring.

Other people at EAG DC told me that writing up ideas on LessWrong or the Alignment Forum is good, as is writing distillations, and Richard wasn't the only one to say that people must take the time to test ideas. This might enable the field to reach more consensus about which ideas are leading us towards robust solutions and which aren't.

Placing these concepts in the context of the TOC framing, we can separate two processes (or steps in a more extensive process):

  • The process of getting more people into alignment -> bottlenecked by mentorship. 
  • The process of getting the alignment field to be more productive -> bottlenecked by our ability to test our ideas. 

I think the people/researcher generating process is clearly an input in the "productive research strategy" generating process. However, that's different from what TOC predicts. In the theory of constraints, we would either expect to say "we are bottlenecked by having people" or "we are bottlenecked by having good ideas". The point of a bottleneck is that improvements elsewhere don't help. 

Understanding this, my first inclination is to step away from TOC: both can be true. There is presumably a positive gradient in productive research from both adding more people and improving our thinking. The question, then, is where the gradient is greatest.
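
One way to see the difference between the two framings (a toy model of my own, with made-up numbers): under a strict TOC model, output is the minimum of the inputs and only the scarcer input has a positive gradient; under a smoother model, every input helps at the margin, and the question is which gradient is steeper.

```python
# Two toy models of research output as a function of "people" and "idea quality".
# The functional forms and numbers are invented purely for illustration.

def toc_output(people, ideas):
    # Strict bottleneck model: only the scarcer input matters at the margin.
    return min(people, ideas)

def smooth_output(people, ideas):
    # Smooth model: both inputs always have a positive marginal value.
    return (people ** 0.5) * (ideas ** 0.5)

people, ideas = 9.0, 4.0   # ideas are the scarce input here
eps = 1e-6

# Finite-difference marginal value of each input under each model.
toc_grad_people = (toc_output(people + eps, ideas) - toc_output(people, ideas)) / eps
toc_grad_ideas = (toc_output(people, ideas + eps) - toc_output(people, ideas)) / eps
print(toc_grad_people, toc_grad_ideas)  # ~0 and ~1: only the bottleneck input helps

smooth_grad_people = (smooth_output(people + eps, ideas) - smooth_output(people, ideas)) / eps
smooth_grad_ideas = (smooth_output(people, ideas + eps) - smooth_output(people, ideas)) / eps
print(smooth_grad_people < smooth_grad_ideas)  # both positive, but the scarce input is steeper
```

Under the min model, "we are bottlenecked by X" is a sharp, exclusive claim; under the smooth model, "mentorship and idea-testing both matter" is coherent, and prioritisation becomes a question of comparative gradients.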

I think Richard's point here that we need to test abstract ideas by finding more concrete ways to apply them seems like a really good candidate for better thinking. A model behind that statement might say:

  • We worked on conceptual stuff for a long time
  • There's probably some free energy in taking that stuff and applying it now that LLMs are getting really powerful*
  • Let's work on that, and new researchers might find that doing that concretization work is good for a bunch of reasons (helps them learn, familiarizes them with existing ideas). 

*A commenter asked me to provide an example here and I’m nervous about picking the wrong one given my current level of understanding. Richard suggested Jacob Steinhardt and Dan Hendrycks might be good examples of people who have done this. I aim to ask them for their thoughts in the near future. 

AI Governance: Are we adequately sceptical?

I should write about this, since the AI governance discussions presented new information which I'm still processing. I hadn't considered AI governance as a topic before EAG DC. I was aware of the AGISF governance track but read the readings from the technical track myself.

Anyway, on day 1 or 2 of the conference, someone showed me this message from Eliezer in the "ai-governance-and-policy" Slack channel:

Hi. I'm Eliezer Yudkowsky. I've been working in AGI alignment since 2001 and framed most of the conversation. My model of how AI policy works is that everyone in the field is there because they don't understand which technical problems are hard, or which political problems are impossible, or both.

If you're in AI policy, have real-world experience in politics, and you're under the impression that AGI ruin is solvable and that we're not all going to die, it's probably because you have a bad model of the technical side of the problem. Consider talking to me about that.

If you're in AI policy, and you don't have real-world experience in politics to tell you what's not inside the Overton Window and which bright-eyed dreams of government policy don't work out in real life, i.e, no we're not going to have a massively multilateral international agreement to monitor all GPU usage and prevent all AGI projects as is actually effective and gets enforced, please leave the AI policy field and go get some real-world experience at miserably failing to do something that ought to be vastly simpler and easier, like a universal and effective ban on gain-of-function research in biology.

It had mixed reviews (mainly in emojis), and lots of much more qualified people replied. The first reply asked what the post aimed to achieve. Eliezer responded:

Get people with political experience to talk to me about the technical landscape so they know why the problem is hard. Get political optimists to talk to people with political experience though I expect this will somehow fail to reach them. Get people with political experience to be much louder about what you cannot do in real life, so that discussion isn't dominated by optimists.

While you could say this is a bit blunt, the points made here are valuable. I had several conversations about this post, and in several of them, I ended up defending the governance-cynical perspective. 

Here are my thoughts:

  • In a world where we somehow survive AGI, we probably were sufficiently cynical about naive strategies that we left room for promising strategies to rise above the fray. We might also have been careful to refrain from investing in band-aid measures or efforts unlikely to succeed, conserving our energy.
  • The previous point in probabilistic terms: P(Optimism | good model) ≈ 0, P(Optimism | bad model) ~ U(0.3, 1) (or something like that) in the current state of affairs. Maybe there will eventually be a reason for people with good models to think we're not all going to die, but if everyone keeps saying they're confident we're not all going to die now, then if/when we get that evidence it won't look like a big update. That will make it much harder to redirect resources if/when it happens.
  • Eliezer offers to talk about his ideas with people. It's not a mic-drop. It's a mic-pickup. This post is the first time I've seen a leader with controversial ideas in any other context offer to engage directly with a community. 
  • Interestingly, Eliezer uses as an example of an easier, more tractable political task, "a universal and effective ban on gain-of-function research in biology" which I have huge priors on being super hard and maybe impossible. Maybe some Dunning-Kruger calibration could be invoked here. I have no idea how hard effective AI governance would be, but it is more challenging than gain-of-function governance, of which I am very sceptical. 
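
To make the probabilistic bullet above concrete (the specific numbers are invented; the original only gives rough distributions): if optimism is common under a bad model and rare under a good one, then Bayes' rule says that hearing an optimistic statement should barely move you towards "this person has a good model".

```python
# Toy Bayesian reading of "P(Optimism | good model) ~ 0, P(Optimism | bad model) ~ U(0.3, 1)".
# All numbers below are invented for illustration.

p_good = 0.5                 # prior that a given speaker has a good model
p_opt_given_good = 0.02      # optimism is rare under a good model (standing in for "~0")
p_opt_given_bad = 0.65       # mean of U(0.3, 1): optimism is common under a bad model

# Bayes' rule: P(good model | observed optimism)
p_opt = p_good * p_opt_given_good + (1 - p_good) * p_opt_given_bad
p_good_given_opt = p_good * p_opt_given_good / p_opt

print(round(p_good_given_opt, 3))  # ~0.03: optimism alone is weak evidence of a good model
```

This is the mechanism behind the "it won't look like a big update" worry: if optimistic statements are already everywhere, a genuinely well-grounded optimistic statement is nearly indistinguishable from the noise.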

I went to one of the governance discussion sessions and thought some proposals sounded valuable. Much like how it seems we have yet to find any technical AI alignment strategies that feel adequately robust, I didn't think the governance strategies sounded particularly robust. On the other hand, we may need to walk before we can run. 

For example, track the GPUs via the internet. Great. We just hooked up our AGI to the internet. 

Another example suggestion was "only let responsible actors have huge numbers of GPUs". Who are the responsible actors? If it's not a company meeting standards that most companies haven’t met yet, we're probably toast. 

Until we have either a very different governance landscape or much more actionable technical solutions, it’s hard to see AI governance as hugely critical. However, Richard Ngo has clearly thought a lot about alignment and he’s spending time there, so I’ll try to keep an open mind until I can pass an ideological Turing test.

Updates

In concrete terms, I think my updates look something like this:

The TOC model seems less useful than I thought. Its appeal may come from how clean problems look when they can be simplified that way, which means it's excellent when it applies, but I wonder whether it applies here.

  • Being bottlenecked by mentors feels real. Getting people mentorships in academia or industry seems like something worth doing to speed up alignment community growth.
  • Being bottlenecked by abstract ideas feels less real. Maybe finding good ways to make abstract ideas concrete is a consequence of having good ideas which people are aiming for anyway? If they aren’t aiming for that, knowing this would probably be an update for me. 

I didn't expect to evaluate governance because I'd given it no thought. I have updated heavily towards "people think there might be stuff there" but also mildly away from governance being valuable.

In the TOC framing, I suspect the real bottlenecks are far upstream of governance bottlenecks, but governance may become the constraint eventually. Thus, depending on the investment trade-offs, I could be convinced of varying attitudes toward AI governance work. I’m personally not running towards it until I see it as clearly a bottleneck, but I’d like to better understand the arguments for/against EA investing in AI governance.
