Alignment Hot Take Advent Calendar
Posts in this sequence, by Charlie Steiner:

Take 1: We're not going to reverse-engineer the AI.
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.
Take 3: No indescribable heavenworlds.
Take 4: One problem with natural abstractions is there's too many of them.
Take 5: Another problem for natural abstractions is laziness.
Take 6: CAIS is actually Orwellian.
Take 7: You should talk about "the human's utility function" less.
Take 8: Queer the inner/outer alignment dichotomy.
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.
Take 11: "Aligning language models" should be weirder.
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Take 13: RLHF bad, conditioning good.
Take 14: Corrigibility isn't that great.