How might we solve the alignment problem?

Oct 30, 2024 by Joe Carlsmith

This is a four-part series of posts about how we might solve the alignment problem. It builds on my previous post, here, about what it would even be to solve the alignment problem; and, to some extent, on this post outlining a framework for thinking about the incentives at stake in AI power-seeking.

1. How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
2. Motivation control
3. Option control
4. Incentive design and capability elicitation