74
Best of Alignment Forum IV
228 · ARC's first technical report: Eliciting Latent Knowledge · paulfchristiano, Mark Xu, Ajeya Cotra · 4y · 90
229 · Fun with +12 OOMs of Compute · Daniel Kokotajlo · 5y · 86
261 · Ngo and Yudkowsky on alignment difficulty · Eliezer Yudkowsky, Richard_Ngo · 4y · 152
250 · Another (outer) alignment failure story · paulfchristiano · 5y · 39
285 · What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) · Andrew_Critch · 5y · 65
261 · The Plan · johnswentworth · 4y · 78
149 · Finite Factored Sets · Scott Garrabrant · 4y · 97
133 · Selection Theorems: A Program For Understanding Agents · johnswentworth · 4y · 28
161 · My research methodology · paulfchristiano · 5y · 38
261 · larger language models may disappoint you [or, an eternally unfinished draft] · nostalgebraist · 4y · 31
139 · Comments on Carlsmith's “Is power-seeking AI an existential risk?” · So8res · 4y · 15
299 · EfficientZero: How It Works · 1a3orn · 4y · 50
167 · Worst-case thinking in AI alignment · Buck · 4y · 18