A Moderate Update to your Artificial Priors
ARC's first technical report: Eliciting Latent Knowledge (Ω), by paulfchristiano, Mark Xu, Ajeya Cotra. 3y, 225 karma, 90 comments
Fun with +12 OOMs of Compute (Ω), by Daniel Kokotajlo. 4y, 224 karma, 86 comments
What 2026 looks like (Ω), by Daniel Kokotajlo. 3y, 519 karma, 155 comments
Ngo and Yudkowsky on alignment difficulty (Ω), by Eliezer Yudkowsky, Richard_Ngo. 3y, 253 karma, 151 comments
Another (outer) alignment failure story (Ω), by paulfchristiano. 4y, 244 karma, 38 comments
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) (Ω), by Andrew_Critch. 4y, 278 karma, 65 comments
The Plan (Ω), by johnswentworth. 3y, 254 karma, 78 comments
Finite Factored Sets (Ω), by Scott Garrabrant. 3y, 148 karma, 95 comments
Selection Theorems: A Program For Understanding Agents (Ω), by johnswentworth. 3y, 123 karma, 28 comments
My research methodology (Ω), by paulfchristiano. 4y, 159 karma, 38 comments
larger language models may disappoint you [or, an eternally unfinished draft] (Ω), by nostalgebraist. 3y, 260 karma, 31 comments
Comments on Carlsmith's “Is power-seeking AI an existential risk?” (Ω), by So8res. 3y, 138 karma, 15 comments
EfficientZero: How It Works (Ω), by 1a3orn. 3y, 297 karma, 50 comments
Specializing in Problems We Don't Understand, by johnswentworth. 4y, 173 karma, 29 comments