LESSWRONG
Alignment Hot Take Advent Calendar
Take 1: We're not going to reverse-engineer the AI.
Charlie Steiner
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.
Take 3: No indescribable heavenworlds.
Take 4: One problem with natural abstractions is there's too many of them.
Take 5: Another problem for natural abstractions is laziness.
Take 6: CAIS is actually Orwellian.
Take 7: You should talk about "the human's utility function" less.
Take 8: Queer the inner/outer alignment dichotomy.
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.
Take 11: "Aligning language models" should be weirder.
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Take 13: RLHF bad, conditioning good.
Take 14: Corrigibility isn't that great.