Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

You have one job: Solving problems. You have multiple tools. Maybe you use code as a tool to solve some problems. Maybe you use design for others. Maybe you use good communication and negotiation skills.

Mike Acton, How much time should I spend coding versus managing?

If you seek tranquility, do less. Or, more accurately, do what’s essential.

Marcus Aurelius, Meditations, Book 4.24


This post is part of the work done at Conjecture.

Refine, the alignment research incubator we are running at Conjecture, finished its first cohort a few weeks ago. So now is a good time to take stock, share what we’ve learned, and discuss its future.

Let’s get this out of the way first: we are not planning any new cohort in the foreseeable future. There are multiple reasons for this, which I’ll expand on in this post. But to summarize:

  • Running Refine in a way that would fully aim at the stated target would require more effort
  • SERI MATS is doing a great job of scaling conceptual alignment research, and seems open to integrating some of the ideas behind Refine
  • In my view, the work we’re doing in Conjecture’s epistemology team is far more fundamental and neglected than field-building, at least in the current climate.

Now for the details.

The Target

The key idea behind Refine was to create more conceptual alignment researchers with their own radically different agendas, rather than new researchers following established approaches. In other words, to create more researchers like John, Paul, Vanessa, Evan, Steve, and the others.

We operationalized this goal by looking for relentlessly resourceful thinkers whose shapes of mind are unorthodox for the alignment community.

The Result

Now that the first cohort is over, how well have we hit this target? Out of 5 participants:

  • 2 are pursuing their own research bets, though these are not radically different from established approaches
  • 1 is still building theirs
  • 1 has found a neglected field-building opportunity
  • 1 feels like they still need to upskill before working directly on alignment.

Based only on The Target above, this is 0/5. 

Of course, that doesn’t mean the program didn’t have positive outcomes and externalities! On the contrary, I’m really happy with how a lot of things turned out, and I’ve heard from all participants that they got a lot out of Refine. Non-negligible accomplishments include:

  • Feedback from multiple alignment researchers that Refine participants had a deep model of the alignment problem at the end of the program.[1]
  • All Refine participants improved their productivity, some in writing and others in iterating on ideas.
  • All Refine participants met and talked with many alignment researchers and newcomers like them, considerably expanding their network and understanding of the alignment space.
  • Participants posted around 25 posts in total on the Alignment Forum, some of which I find exciting.
  • I got a crash course in management that helped me upskill quickly.
  • We had a lot of great moments and support from each other.
  • In our exit survey, all participants said they would highly recommend the program, and that it was more counterfactually useful than what they would have done instead by default.
  • I expect most, if not all, participants to make relevant contributions to the field.

None of these are irrelevant. Yet if we focus on the original metric, the pilot of Refine failed. Having reflected on this, I have some thoughts on how we could have better aimed at this target (whether it is the correct target is a question for a later section).

It all amounts to a lack of optimization.

Failing to Optimize

The first place where we failed to optimize for wildly different research agendas was in the selection population. Given where we advertised (various EA and rationalist websites, Slacks, and Discords), we drew a crowd that was homogeneous along many dimensions. There was no way we were going to end up with a linguist or a sociologist, for example. That would have required more targeted outreach.

This failure mode is shared by all training programs I know about: even PIBBSS, which successfully brought together a more diverse cohort, had trouble with the fields most different from alignment, like the social sciences.

Our second lack of optimization came from the selection process itself. If you want to create independent conceptual researchers who work on the problem right after your program, you need to push really hard for the following traits:

  • Wants to work on conceptual alignment ASAP
  • Can tolerate the emotional and material difficulties of independent research
  • Is able to generate their own ideas
  • Is able to make mistakes and update

Looking back, most of the participants in the first cohort scored well along these lines, but each of them has at least one of these traits on which they need to improve.

Last but not least, the process within Refine itself could have focused better on guiding participants to build a gears-level model of alignment. What we ended up doing was mostly discussing Unbounded Atomic Optimization and Epistemological Vigilance, and providing feedback on ideas. In hindsight, I see more explicit exercises (like building a theory of change), an early focus on poking as many holes as possible in models of alignment, and a sweeping tour of the state of the art as necessary first steps to produce worthwhile conceptual alignment research quickly.

In the end, all participants of the first cohort built a deep model of the alignment problem, but a better program structure could have accelerated this. And with such a deep gears-level model from the start, all the mentoring aimed at pushing ideas towards their most relevant form for alignment would have been vastly more effective, as there would have been significantly less “translation effort” on the mentor's side.

The Right Target?

Note that the above assumes that Refine’s original goal, creating more conceptual alignment researchers with their own radically different agendas, was the right one.

But is it really? Even if it is a good one, is it the most important one, or the most crucial one for solving alignment?

I have updated toward no. Or rather, I have updated toward being suspicious of targets that look as instrumental as this one.

Creating new research directions and new researchers sidelines the key difficulty in solving alignment: figuring out what concretely needs to be done to solve alignment, and which profiles are best suited for such endeavours. Instead of tackling these hard questions, you delegate them to the future, to the next generation.

On a problem with longer timelines, this might be the right move: let the compound interest do the work. Even with short timelines, if I had no ideas and no plans for addressing these hard questions, passing the buck might have been the best decision.

But I have an angle and a plan. Figuring out how to tackle alignment, why it is hard, and how to deal with these difficulties is literally the task of my epistemology team at Conjecture. Under these conditions, my spending that much time on field-building seems like a bad bet: I’m doing a worse job than most field-builders I know (only really contributing my own idiosyncratic ideas, which can be shared anyway) while setting aside an angle of attack on the problem that is completely neglected and appears promising to me and to Conjecture.

I’m excited to see SERI MATS and other programs step up to train new alignment researchers, and will continue to encourage them and give them feedback. But my personal arena is elsewhere.

  1. ^

    Note that some Refine participants were already working in alignment. Also, there was some negative feedback from alignment researchers too, but given the baseline negativity of the field, positive comments are particularly strong sources of evidence.

Akash

Thank you for writing this post, Adam! Looking forward to seeing what you and your epistemology team produce in the months ahead.

SERI MATS is doing a great job of scaling conceptual alignment research, and seem open to integrate some of the ideas behind Refine

I'm a big fan of SERI MATS. But my impression was that SERI MATS had a rather different pedagogy/structure (compared to Refine). In particular: 

  1. SERI MATS has an "apprenticeship model" (every mentee is matched with one mentor), whereas Refine mentees didn't have mentors.
  2. Refine was optimizing for people who could come up with "their own radically different agendas", whereas SERI MATS doesn't emphasize this. (Some mentors may encourage some of their mentees to think about new agendas, but my impression is that this varies a lot mentor-to-mentor, and it's not as baked into the overall culture). 
  3. SERI MATS has (thus far) nearly-exclusively recruited from the EA/rationality/AIS communities. Seems like Refine also did this, though I imagine that "Refine 2.0" would be more willing to recruit from outside these communities. (I'm not sure what SERI MATS's stance is, but my impression is that their selection criteria heavily favor people who have existing work in AIS or existing connections. This of course makes sense, because past contributions/endorsements are a useful signal, unless the program is explicitly designed to go for oddball/weird/new/uncorrelated ideas). 

Two questions for you: 

  1. Are there any particular lessons/ideas from Refine that you expect (or hope) SERI MATS to incorporate?
  2. Do you think there's now a hole in the space that someone should consider filling (by making Refine 2.0), or do you expect that much of the value of Refine will be covered by SERI MATS [and other programs]?

Thanks for the kind words!

  1. Are there any particular lessons/ideas from Refine that you expect (or hope) SERI MATS to incorporate?

I have shared some of my models related to epistemology and key questions with the MATS organizers, and I think they're supposed to be integrated into one of the future programs. Mostly these concern realizing the importance of productive mistakes in science (which naturally pushes back a bit against the mentoring aspect of MATS) and understanding how much less "clean" most scientific progress has actually been historically (with a basic reading list from the history of science).

My impression is that they are also now trying to give participants a broader perspective on the field, in addition to the specific frame of their mentor, and a bunch of the lessons from Refine about how to build a good model of the alignment problem apply there.

On a more general level, I've had enough discussions with them that I expect they would naturally ask me for feedback if they thought of something that seemed Refine-shaped or similar.

2. Do you think there's now a hole in the space that someone should consider filling (by making Refine 2.0), or do you expect that much of the value of Refine will be covered by SERI MATS [and other programs]?

Hmm, intuitively the main value from Refine that I don't expect to be covered by future MATS would come from reaching out to very different profiles. There's a non-negligible chance that PIBBSS manages to make that work though, so it's not clear that this is a problem.

Note that this is also part of why Refine feels less useful: when I conceived of it, most of these programs either didn't exist or were not well-established. Part of the frustration came from having nothing IRL for non-Americans to join, and no program spending a significant amount of time on conceptual alignment, both of which MATS and PIBBSS (in addition to other programs like ARENA) are now fixing. Which I think is great!

FWIW my experience of MATS 0.1 (i.e. the first run/pilot 2021-22) was that it was more open-ended and diversity-focused than subsequent MATS, which has been more apprenticeship-focused. That was helpful for me at the time, but I don't know if it was ever the intention per se, and I agree that the focus of MATS now is different. I haven't thought about it long enough to decide if this is good or bad.

Hey Adam, thanks for running Refine and writing this up. 

Out of curiosity, do you (or anyone else) know if there are statistics for previous SERI MATS cohorts/other programs designed to generate conceptual alignment researchers? 

Thanks for the kind words!

I'm not aware of any such statistics, but I'm guessing that MATS organizers might have some.