Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

(Cross-post from ai-alignment.com)

I’m now working full-time on the Alignment Research Center (ARC), a new non-profit focused on intent alignment research.

I left OpenAI at the end of January and I’ve spent the last few months planning, doing some theoretical research, doing some logistical set-up, and taking time off.

For now it’s just me, focusing on theoretical research. I’m currently feeling pretty optimistic about this work: I think there’s a good chance that it will yield big alignment improvements within the next few years, and a good chance that those improvements will be integrated into practice at leading ML labs.

My current goal is to build a small team working productively on theory. I’m not yet sure how we’ll approach hiring, but if you’re potentially interested in joining you can fill out this tiny form to get notified when we’re ready.

Over the medium term (and maybe starting quite soon) I also expect to implement and study techniques that emerge from theoretical work, to help ML labs adopt alignment techniques, and to work on alignment forecasting and strategy.

Ben Pace (3y)

Best of skill to you.

This is so great! I always hate wishing people luck when I trust in their competence to mostly deal with bad luck and leverage good luck. I'll use that one now.

adamShimi (3y)

Sounds really exciting! I'm wondering which kind of theoretical computer science you have in mind specifically. Which part of it do you think has the most uses for alignment? (Still trying to find a way to use my PhD in the theory of distributed computing for something alignment-related ^^)

What do you see as central examples of work on "alignment forecasting"? I'm unclear on what that means.

Congrats!

I'll procrastinate from thesis-writing to fill out the form :)

Ben Pace (3y)

You're gonna get back to thesis writing quickly, it's a very short form.