LESSWRONG
Books of LessWrong
A Moderate Update to your Artificial Priors
225
ARC's first technical report: Eliciting Latent Knowledge
Ω
paulfchristiano, Mark Xu, Ajeya Cotra
2y
Ω
90
224
Fun with +12 OOMs of Compute
Ω
Daniel Kokotajlo
3y
Ω
86
473
What 2026 looks like
Ω
Daniel Kokotajlo
3y
Ω
150
250
Ngo and Yudkowsky on alignment difficulty
Ω
Eliezer Yudkowsky, Richard_Ngo
2y
Ω
148
241
Another (outer) alignment failure story
Ω
paulfchristiano
3y
Ω
38
271
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
Ω
Andrew_Critch
3y
Ω
64
254
The Plan
Ω
johnswentworth
2y
Ω
78
146
Finite Factored Sets
Ω
Scott Garrabrant
3y
Ω
95
123
Selection Theorems: A Program For Understanding Agents
Ω
johnswentworth
3y
Ω
28
159
My research methodology
Ω
paulfchristiano
3y
Ω
38
254
larger language models may disappoint you [or, an eternally unfinished draft]
Ω
nostalgebraist
2y
Ω
31
138
Comments on Carlsmith's “Is power-seeking AI an existential risk?”
Ω
So8res
2y
Ω
14
292
EfficientZero: How It Works
Ω
1a3orn
2y
Ω
50
159
Specializing in Problems We Don't Understand
johnswentworth
3y
29