

Thanks for your comment, Koen. Two quick clarifications:

  1. In the event that we receive a high number of submissions, the undergrads and grad students will screen submissions. Submissions above a certain cutoff will be sent to our (senior) panel of judges.

  2. People who submit promising 500-word submissions will (often) be asked to submit longer responses. The 500-word abstract is meant to save people time (they get feedback on the 500-word idea before they spend a bunch of time formalizing things, running experiments, etc.)

Two questions for you:

  1. What do you think are the strongest proposals for corrigibility? Would love to see links to the papers/proofs.

  2. Can you email us at akash@alignmentawards.com and olivia@alignmentawards.com with some more information about you, your AIS background, and what kinds of submissions you’d be interested in judging? We’ll review this with our advisors and get back to you (and I appreciate you volunteering to judge!)


Thank you for writing this post, Adam! Looking forward to seeing what you and your epistemology team produce in the months ahead.

SERI MATS is doing a great job of scaling conceptual alignment research, and seem open to integrate some of the ideas behind Refine

I'm a big fan of SERI MATS. But my impression was that SERI MATS had a rather different pedagogy/structure (compared to Refine). In particular: 

  1. SERI MATS has an "apprenticeship model" (every mentee is matched with one mentor), whereas Refine mentees didn't have mentors.
  2. Refine was optimizing for people who could come up with "their own radically different agendas", whereas SERI MATS doesn't emphasize this. (Some mentors may encourage some of their mentees to think about new agendas, but my impression is that this varies a lot mentor-to-mentor, and it's not as baked into the overall culture). 
  3. SERI MATS has (thus far) nearly-exclusively recruited from the EA/rationality/AIS communities. Seems like Refine also did this, though I imagine that "Refine 2.0" would be more willing to recruit from outside these communities. (I'm not sure what SERI MATS's stance is, but my impression is that their selection criteria heavily favor people who have existing work in AIS or existing connections. This of course makes sense, because past contributions/endorsements are a useful signal, unless the program is explicitly designed to go for oddball/weird/new/uncorrelated ideas). 

Two questions for you: 

  1. Are there any particular lessons/ideas from Refine that you expect (or hope) SERI MATS to incorporate?
  2. Do you think there's now a hole in the space that someone should consider filling (by making Refine 2.0), or do you expect that much of the value of Refine will be covered by SERI MATS [and other programs]?

Hastily written; may edit later

Thanks for mentioning this, Jan! We'd be happy to hear suggestions for additional judges. Feel free to email us at akash@alignmentawards.com and olivia@alignmentawards.com.  

Some additional thoughts:

  1. We chose judges primarily based on their expertise and (our perception of) their ability to evaluate submissions about goal misgeneralization and corrigibility. Lauro, Richard, Nate, and John are some of the few researchers who have thought substantially about these problems. In particular, Lauro first-authored the first paper about goal misgeneralization and Nate first-authored a foundational paper about corrigibility.
  2. We think the judges do have some reasonable credentials (e.g., Richard works at OpenAI, Lauro is a PhD student at the University of Cambridge, Nate Soares is the Executive Director of a research organization & he has an h-index of 12, as well as 500+ citations). I think the contest meets the bar of "having reasonably well-credentialed judges" but doesn't meet the bar of "having extremely well-credentialed judges" (e.g., well-established professors with thousands of citations). I think that's fine.
  3. We got feedback from several ML people before launching. We didn't get feedback that this looks "extremely weird" (though I'll note that research competitions in general are pretty unusual). 
  4. I think it's plausible that some people will find this extremely weird (especially people who judge things primarily based on the cumulative prestige of the associated parties & don't think that OpenAI/500 citations/Cambridge are enough), but I don't expect this to be a common reaction.

Some clarifications + quick thoughts on Sam’s points:

  1. The contest isn’t aimed primarily/exclusively at established ML researchers (though we are excited to receive submissions from any ML researchers who wish to participate). 
  2. We didn’t optimize our contest to attract established researchers. Our contests are optimized to take questions that we think are at the core of alignment research and present them in a (somewhat less vague) format that gets more people to think about them.
  3. We’re excited that other groups are running contests that are designed to attract established researchers & present different research questions. 
  4. All else equal, we think that precise/quantifiable grading criteria & a diverse panel of reviewers are preferable. However, in our view, many of the core problems in alignment (including goal misgeneralization and corrigibility) have not been sufficiently well-operationalized to have precise/quantifiable grading criteria at this stage.

Thanks, Lawrence! Responses below:

  1. You can submit work that you're also planning on submitting to ML conferences/journals. Feel free to email us if you have any specific constraints (e.g., as noted on the site, we plan to make proposals & winners public by default).
  2. People who are already working on alignment are able (and encouraged!) to submit. 

Thanks for catching this, Lawrence! You're right-- we accidentally had an old version of the official rules. Just updated it with the new version. In general, I'd trust the text on the website (but definitely feel free to let us know if you notice any other inconsistencies).

As for your specific questions:

  1. 500 words
  2. If we get a lot of submissions, submissions will initially be screened by junior alignment researchers (undergrads & grad students) and then passed on to our senior judges (like the ones listed).
  3. Deadline is March 1. 
  4. No limit on the number of submissions per individual/team.

If you notice anything else, feel free to email us at akash@alignmentawards.com and olivia@alignmentawards.com (probably better to communicate there than via LW comments). 

Thanks, Zac! Your description adds some useful details and citations.

I'd like the one-sentence descriptions to be pretty jargon-free, and I currently don't see how the original one misses the point. 

I've made some minor edits to acknowledge that the purpose of building larger models is broader than "tackle problems", and I've also linked people to your comment where they can find some useful citations & details. (Edits in bold)

Anthropic: Let’s interpret large language models by understanding which parts of neural networks are involved with certain types of cognition, and let’s build larger language models to tackle problems, test methods, and understand phenomena that will emerge as we get closer to AGI (see Zac's comment for some details & citations).

I also think there's some legitimate debate around Anthropic's role in advancing capabilities/race dynamics (e.g., I encourage people to see Thomas's comments here). So while I understand that Anthropic's goal is to do this work without advancing capabilities/race dynamics, I currently classify this as "a debated claim about Anthropic's work" rather than "an objective fact about Anthropic's work."

I'm interested in the policy work & would love to learn more about it. Could you share some more specifics about Anthropic's most promising policy work?

Thanks for writing this, Jacob!

I wonder if you (or other members of Lightcone) have any advice on how to hire/get contractors.

It seems to me like many orgs (even ones with good internal practices) fail because they hired too quickly or too slowly or not-the-right-people or not-the-right-mix-of-people. Any thoughts?

(And no worries if you’re like “cool question but i don’t have any quick answers”)

+1 on many of the projects in my list requiring a team with really good ops and (ideally) people with experience beyond college/academia.

(I hope I didn’t give off the impression that anyone can or should start those.)

Great question! I agree that the mentorship bottleneck is important & any new programs will have to have a plan for how to deal with it. I have four specific thoughts:

First, I'd be excited to see more programs that experiment with different styles of mentorship/training. MATS has an apprenticeship model (each mentee gets assigned to a mentor). There are lots of other models out there (e.g., Refine), and some of these involve a lower need for mentors.

Second, I would be surprised if any single program was able to fully tap into the pool of possible mentors. My impression is that there are a lot of "medium-experience" alignment researchers who could already take mentees. (In graduate school, it's common for experienced graduate students to mentor undergraduates, for instance).

Third, different programs can tap into different target audiences. Ex: PIBBSS and Philosophy Fellowship.

Fourth, I (tentatively/not with a ton of confidence) think that it's desirable to have more competition. I'm generally worried about people thinking "oh X program exists, so I shouldn't do anything in that space." It's trickier with mentorship programs (because of the mentor bottleneck), but I wouldn't be too surprised if some mentors were open to experimenting with new programs. I also wouldn't be surprised if, say, 5 years from now, there was an entirely different program that was the dominant player in the alignment mentorship space. (Or if MATS was still the dominant player but it looked extremely different). 

(Thanks for your comment by the way. I think this nudged me to be more specific with what I meant, and point #4 is the only one that's explicitly about direct competition). 
