Alignment Stream of Thought

Mar 27, 2022 by leogao

Epistemic status: statements dreamed up by the utterly Deranged

This sequence contains posts that are lower effort than my usual ones: instead of thinking things all the way through and then posting something polished, I post rough, in-progress ideas as I think about them. I'm trying this because I noticed that I had lots of interesting thoughts I didn't want to share because I hadn't totally figured them out yet, and that the process of writing things up and posting them often helps me make progress.

Anything in this sequence is at even greater risk than usual of being obsoleted or unendorsed later down the road. It will also be more difficult to follow than usual, because I'm not putting as much effort as usual into explaining background.

I am hoping to eventually do a distillation of the important insights of this sequence into more legible post(s) once I'm less confused.

[ASoT] Observations about ELK
[ASoT] Some ways ELK could still be solvable in practice
[ASoT] Searching for consequentialist structure
[ASoT] Some thoughts about deceptive mesaoptimization
[ASoT] Some thoughts about LM monologue limitations and ELK
[ASoT] Some thoughts about imperfect world modeling
[ASoT] Consequentialist models as a superset of mesaoptimizers
Humans Reflecting on HRH
Towards deconfusing wireheading and reward maximization
[ASoT] Some thoughts on human abstractions