Alignment Hot Take Advent Calendar
by Charlie Steiner
Take 1: We're not going to reverse-engineer the AI.
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.
Take 3: No indescribable heavenworlds.
Take 4: One problem with natural abstractions is there's too many of them.
Take 5: Another problem for natural abstractions is laziness.
Take 6: CAIS is actually Orwellian.
Take 7: You should talk about "the human's utility function" less.
Take 8: Queer the inner/outer alignment dichotomy.
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.
Take 11: "Aligning language models" should be weirder.
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Take 13: RLHF bad, conditioning good.
Take 14: Corrigibility isn't that great.