Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth
Introduction
When trying to tackle a hard problem, a generally effective opening tactic is to Hold Off On Proposing Solutions: to fully discuss the problem and its different facets before committing to an answer. This is intended to prevent you from anchoring on a particular pet solution and (if you're lucky) to gather enough evidence that you can see what a Real Solution would look like. We wanted to directly tackle the hardest part of the alignment problem and make progress towards a Real Solution, so when we had to choose a project for SERI MATS, we began by arguing in a Google doc about what the core problem actually is. This post is a cleaned-up version of that doc.
The Technical Alignment Problem
The overall problem of alignment is the problem of ensuring that an Artificial General Intelligence with potentially superhuman capabilities does not use those capabilities to do things that humanity would not want. There are many reasons this might happen, such as convergent instrumental goals or the orthogonality thesis.
Layout
In each section below we make a different case for what the "core of the alignment problem" is. It's possible we misused some terminology when naming each section.
The document is laid out as follows: we have two overarching framings on alignment, Outer Alignment and Inner Alignment, each of which is then broken down further into subproblems. Some of these specific problems are quite broad and cut across both Outer and Inner Alignment; we've tried to put each problem in the section we think fits best (and when neither fits, collected them in an Other category), though reasonable people may disagree with our classifications. In each section, we've laid out some cruxes: statements that support that frame on the core of the alignment problem. These cruxes are neither necessary nor sufficient conditions for a problem to be central.
Frames on Outer Alignment