Thanks for your comment, Koen. Two quick clarifications:
If we receive a high number of submissions, the undergrads and grad students will screen them. Submissions above a certain cutoff will be sent to our (senior) panel of judges.
People who submit promising 500-word abstracts will (often) be asked to submit longer responses. The 500-word abstract is meant to save people time: they get feedback on the idea before they spend a bunch of time formalizing things, running experiments, etc.
Two questions for you:
What do you think are the strongest proposals for corrigibility? Would love to see links to the papers/proofs.
Can you email us at firstname.lastname@example.org and email@example.com with some more information about you, your AIS background, and what kinds of submissions you’d be interested in judging? We’ll review this with our advisors and get back to you (and I appreciate you volunteering to judge!)
Thank you for writing this post, Adam! Looking forward to seeing what you and your epistemology team produce in the months ahead.
SERI MATS is doing a great job of scaling conceptual alignment research, and seems open to integrating some of the ideas behind Refine.
I'm a big fan of SERI MATS. But my impression was that SERI MATS had a rather different pedagogy/structure (compared to Refine). In particular:
Hastily written; may edit later
Thanks for mentioning this, Jan! We'd be happy to hear suggestions for additional judges. Feel free to email us at firstname.lastname@example.org and email@example.com.
Some additional thoughts:
Some clarifications + quick thoughts on Sam’s points:
Thanks, Lawrence! Responses below:
Thanks for catching this, Lawrence! You're right: we accidentally had an old version of the official rules, and we've just updated it with the new version. In general, I'd trust the text on the website (but definitely feel free to let us know if you notice any other inconsistencies).
As for your specific questions:
If you notice anything else, feel free to email us at firstname.lastname@example.org and email@example.com (probably better to communicate there than via LW comments).
Thanks, Zac! Your description adds some useful details and citations.
I'd like the one-sentence descriptions to be pretty jargon-free, and I currently don't see how the original one misses the point.
I've made some minor edits to acknowledge that the purpose of building larger models is broader than "tackle problems", and I've also linked people to your comment where they can find some useful citations & details. (Edits in bold)
Anthropic: Let’s interpret large language models by understanding which parts of neural networks are involved with certain types of cognition, and let’s build larger language models to tackle problems, test methods, and understand phenomena that will emerge as we get closer to AGI (see Zac's comment for some details & citations).
I also think there's some legitimate debate around Anthropic's role in advancing capabilities/race dynamics (e.g., I encourage people to see Thomas's comments here). So while I understand that Anthropic's goal is to do this work without advancing capabilities/race dynamics, I currently classify this as "a debated claim about Anthropic's work" rather than "an objective fact about Anthropic's work."
I'm interested in the policy work & would love to learn more about it. Could you share some more specifics about Anthropic's most promising policy work?
Thanks for writing this, Jacob!
I wonder if you (or other members of Lightcone) have any advice on how to hire/get contractors.
It seems to me like many orgs (even ones with good internal practices) fail because they hired too quickly or too slowly or not-the-right-people or not-the-right-mix-of-people. Any thoughts?
(And no worries if you’re like “cool question but I don’t have any quick answers”.)
+1 on many of the projects in my list requiring a team with really good ops and (ideally) people with experience beyond college/academia.
(I hope I didn’t give off the impression that anyone can or should start those.)
Great question! I agree that the mentorship bottleneck is important & any new programs will have to have a plan for how to deal with it. I have four specific thoughts:
First, I'd be excited to see more programs that experiment with different styles of mentorship/training. MATS has an apprenticeship model (each mentee gets assigned to a mentor). There are lots of other models out there (e.g., Refine), and some of these involve a lower need for mentors.
Second, I would be surprised if any single program were able to fully tap into the pool of possible mentors. My impression is that there are a lot of "medium-experience" alignment researchers who could already take mentees. (In graduate school, for instance, it's common for experienced graduate students to mentor undergraduates.)
Third, different programs can tap into different target audiences (e.g., PIBBSS and the Philosophy Fellowship).
Fourth, I (tentatively/not with a ton of confidence) think that it's desirable to have more competition. I'm generally worried about people thinking "oh, X program exists, so I shouldn't do anything in that space." It's trickier with mentorship programs (because of the mentor bottleneck), but I wouldn't be too surprised if some mentors were open to experimenting with new programs. I also wouldn't be surprised if, say, 5 years from now, there were an entirely different program that was the dominant player in the alignment mentorship space. (Or if MATS were still the dominant player but it looked extremely different.)
(Thanks for your comment by the way. I think this nudged me to be more specific with what I meant, and point #4 is the only one that's explicitly about direct competition).