Alignment Hot Take Advent Calendar

Dec 01, 2022 by Charlie Steiner
Take 1: We're not going to reverse-engineer the AI.
Take 2: Building tools to help build FAI is a legitimate strategy, but it's dual-use.
Take 3: No indescribable heavenworlds.
Take 4: One problem with natural abstractions is there's too many of them.
Take 5: Another problem for natural abstractions is laziness.
Take 6: CAIS is actually Orwellian.
Take 7: You should talk about "the human's utility function" less.
Take 8: Queer the inner/outer alignment dichotomy.
Take 9: No, RLHF/IDA/debate doesn't solve outer alignment.
Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.
Take 11: "Aligning language models" should be weirder.
Take 12: RLHF's use is evidence that orgs will jam RL at real-world problems.
Take 13: RLHF bad, conditioning good.
Take 14: Corrigibility isn't that great.