Introducing the AI Alignment Forum (FAQ)

habryka; Ben Pace; Raemon; jimrandomh

After a few months of open beta, the AI Alignment Forum is ready to launch. It is a new website built by the team behind LessWrong 2.0, to help create a new hub for technical AI Alignment research and discussion. This is an in-progress FAQ about the new Forum.

What are the five most important highlights about the AI Alignment Forum in this FAQ?

The vision for the forum is of a single online hub for alignment researchers to have conversations about all ideas in the field...
...while also providing a better onboarding experience for people getting involved with alignment research than exists currently.
There are three new sequences focusing on some of the major approaches to alignment, which will update daily for the coming 6-8 weeks.

Embedded Agency, written by Scott Garrabrant and Abram Demski of MIRI
Iterated Amplification, written and compiled by Paul Christiano of OpenAI
Value Learning, written and compiled by Rohin Shah of CHAI

For non-members and future researchers, the place to interact with the content is LessWrong.com, where all Forum content will be crossposted.
The site will continue to be improved in the long term, as the team comes to better understand the needs and goals of researchers.

What is the purpose of the AI Alignment Forum?

Our first priority is obviously to avert catastrophic outcomes from unaligned Artificial Intelligence. We think the best way to achieve this at the margin is to build an online hub for AI Alignment research, which both allows the existing top researchers in the field to talk about cutting-edge ideas and approaches, and supports the onboarding of new researchers and contributors.

We think that to solve the AI Alignment problem, the field of AI Alignment research needs to be able to effectively coordinate a large number of researchers from a large number of organisations, with significantly different approaches. Two decades ago we might have invested heavily in the development of a conference or a journal, but with the advent of the internet, an online forum, with its ability to support much faster and more comprehensive forms of peer review, seemed to us like a more promising way to help the field form a good set of standards and methodologies.

Who is the AI Alignment Forum for?

There exists an interconnected community of Alignment researchers in industry, academia, and elsewhere, who have spent many years thinking carefully about a variety of approaches to alignment. Such research receives institutional support from organisations including FHI, CHAI, DeepMind, OpenAI, MIRI, Open Philanthropy, and others. The Forum membership currently consists of researchers at these organisations and their respective collaborators.

The Forum is also intended to be a way for people not connected to these institutions, either professionally or socially, to interact with and contribute to cutting-edge research. There have been many such individuals on LessWrong, and that is currently the best place for such people to start contributing, get feedback, and skill up in this domain.

There are about 50-100 members of the Forum. These folks will be able to post and comment on the Forum, and this group will not grow in size quickly.

Why do we need another website for alignment research?

There are many places online that host research on the alignment problem, such as the OpenAI blog, the DeepMind Safety Research blog, the Intelligent Agent Foundations Forum, AI-Alignment.com, and of course LessWrong.com.

But none of these spaces are set up to host discussion amongst the 50-100 people working in the field. And those that do host discussion have unclear assumptions about what’s common knowledge.

What type of content is appropriate for this Forum?

As a rule of thumb, if a thought is something you’d bring up when talking to someone at a research workshop or a colleague in your lab, it’s also a welcome comment or post here.

If you’d like a sense of what other Forum members are interested in, here’s some quick data on the high-level types of content they want to see, taken from a survey we gave to invitees to the open beta (n = 34).

The responses were on a 1-5 scale, which represented “If I see 1 post per day, I want to see this type of content…”: (1) once per year, (2) once per 3-4 months, (3) once per 1-2 months, (4) once per 1-2 weeks, (5) a third of all posts that I see.

Here were the types of content asked about, and the mean response:

New theory-oriented alignment research typical of MIRI or CHAI: 4.4 / 5
New ML-oriented alignment research typical of OpenAI or DeepMind's safety teams: 4.2 / 5
New formal or nearly-formal discussion of intellectually interesting topics that look questionably/ambiguously/peripherally alignment-related: 3.5 / 5
High-quality informal discussion of alignment research methodology and background assumptions, what's needed for progress on different agendas, why people are pursuing this or that agenda, etc: 4.1 / 5
Attempts to more clearly package/explain/summarise previously discussed alignment research: 3.7 / 5
New technical ideas that are clearly not alignment-related but are likely to be intellectually interesting to forum regulars: 2.2 / 5
High-quality informal discussion of very core background questions about advanced AI systems: 3.3 / 5
Typical AGI forecasting research/discussion that isn't obviously unusually relevant to AGI alignment work: 2.2 / 5

Related data: Integrated over all 34 respondents, their self-predictions come to 3.2 comments and 0.99 posts per day. We’ll report on everyone’s self-accuracy in a year ;)

What are the three new sequences I've been hearing about?

We have been coordinating with AI alignment researchers to create three new sequences of posts that we hope can serve as introductions to some of the most important core ideas in AI Alignment. The three new sequences will be:

Embedded Agency, written by Scott Garrabrant and Abram Demski of MIRI
Iterated Amplification, written and compiled by Paul Christiano of OpenAI
Value Learning, written and compiled by Rohin Shah of CHAI

Over the next few weeks, we will be releasing about one post per day from these sequences, starting with the first post in the Embedded Agency sequence.

If you are interested in learning about AI alignment, you're very welcome to ask questions and discuss the content in the comment sections. And if you are already familiar with a lot of the core ideas, then we would greatly appreciate feedback on the sequences as we publish them. We hope that these sequences can be a major part of how new people get involved in AI alignment research, and so we care a lot about their quality and clarity.

In what way is it easier for potential future Alignment researchers to get involved?

Most scientific fields have to balance the need for high-context discussion among specialists against public discussion, which allows the broader dissemination of new ideas, the onboarding of new members, and the opportunity for potential new researchers to prove themselves. We tried to design a system that still allows newcomers to participate and learn, while giving established researchers the space to have high-level discussions with other researchers.

To do that, we integrated the new AI Alignment Forum closely with the existing LessWrong platform: you can find and comment on all AI Alignment Forum content on LessWrong, and mods can move your comments and posts to the AI Alignment Forum for further engagement by the researchers. For details on the exact setup, see the question on that below.

We hope that this will result in a system in which cutting-edge research and discussion can happen, while new good ideas and participants can get noticed and rewarded for their contributions.

If you’ve been interested in doing alignment research, then we think one of the best ways to do that right now is to comment on AI Alignment Forum posts on LessWrong, and check out the new content we’ll be rolling out.

What is the exact setup with content on LessWrong?

Here are the details:

Automatic Crossposting - Any new post or comment on the new AI Alignment Forum is automatically cross-posted to LessWrong.com. Accounts are also shared between the two platforms.
Content Promotion - Any comment or post on LessWrong can be promoted by members of the AI Alignment Forum from LessWrong to the AI Alignment Forum.
Separate Reputation - The reputation systems for LessWrong and the AI Alignment Forum are separate. On LessWrong you can see two reputation scores for each user: a primary karma score combining karma from both sites, and a secondary karma score specific to the AI Alignment Forum. On the AI Alignment Forum, you will see only users' AI Alignment karma.
Content Ownership - If a comment or post of yours is promoted to the AI Alignment Forum, you will continue to have full ownership of the content, and you’ll be able to respond directly to all comments by members on your content.
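The four rules above can be sketched as a small data model. This is only an illustrative sketch, not the site's actual implementation: the names `User`, `Post`, `publish`, `promote`, and `visible_karma` are hypothetical, and how votes feed into the two karma scores is left out because the description above does not specify it.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # One account is shared between both sites (hypothetical model).
    name: str
    is_forum_member: bool = False
    lw_karma: int = 0  # primary score: karma from both sites combined
    af_karma: int = 0  # secondary score: AI Alignment Forum karma only


@dataclass
class Post:
    author: User
    title: str
    sites: set = field(default_factory=set)


def publish(author, title, on_forum=False):
    """Create a post; Forum posts are automatically cross-posted to LessWrong."""
    if on_forum and not author.is_forum_member:
        raise PermissionError("only Forum members can post on the Forum directly")
    sites = {"lesswrong", "alignmentforum"} if on_forum else {"lesswrong"}
    return Post(author, title, sites)


def promote(post, promoter):
    """A Forum member promotes LessWrong content; the author keeps ownership."""
    if not promoter.is_forum_member:
        raise PermissionError("only Forum members can promote content")
    post.sites.add("alignmentforum")


def visible_karma(user, site):
    """Return the karma scores displayed for a user on the given site."""
    if site == "lesswrong":
        return {"karma": user.lw_karma, "alignment_karma": user.af_karma}
    return {"alignment_karma": user.af_karma}
```

In this sketch, a non-member's post starts out on LessWrong only, and calling `promote` with a Forum member adds it to the Forum without changing `post.author`.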

The AI Alignment Forum survey (sent to all beta invitees) received 34 submissions. One question asked whether the integration with LW would lead to the person contributing more or less to the AI Alignment Forum (on a range from 0 to 6). The mean response was 3.7, the median was 3, and there was only one response below 3 (where 3 represented ‘doesn’t matter’).

How do new members get added to the Forum?

There are about 50-100 members of the AI Alignment Forum, and while the number will grow, it will grow rarely and slowly.

We’re talking with the alignment researchers at CHAI, DeepMind, OpenAI, and MIRI, and will be bringing on a moderator with invite-power from each of those organisations. They will naturally have a much better sense of the field, and of the researchers in their orgs, than we, the site designers, do. We’ll edit this post to include them once they’re confirmed.

On alignmentforum.org, a small application form is available in the top right corner (after you have created an account). If you’re a regular contributor on LessWrong and want to point us to some of your best work, or if perhaps you’re a full-time researcher in an adjacent field and would like to participate in the Forum research discussion, you’re welcome to use that to let us know who you are and what research you have done.

Who is running this project?

The AI Alignment Forum development team consists of Oliver Habryka, Ben Pace, Raymond Arnold, and Jim Babcock. We're in conversation with alignment researchers from DeepMind, OpenAI, MIRI and CHAI to confirm moderators from those organisations.

We would like to thank BERI, EA Grants, Nick Beckstead, Matt Wage and Eric Rogstad for the support that led to this Forum being built.

Can I use LaTeX?

Yes! You can use LaTeX in posts and comments with Cmd+4 / Ctrl+4.

Also, if you go into your user settings and switch to the markdown editor, you can just copy-paste LaTeX into a post/comment and it will render when you submit with no further work.
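As an illustration of the kind of source you might paste into the markdown editor, here is a small snippet. The dollar-sign delimiters are an assumption about what the editor recognises, and the formula is only a placeholder, not taken from any Forum post.

```latex
The expected utility of an action $a$ is
$$\mathbb{E}[U \mid a] = \sum_{s} P(s \mid a)\, U(s).$$
```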

(Talk to us in intercom if you run into any problems.)

I have a different question.

Use the comment section below. Alternatively, use intercom (bottom right corner).
