Books of LessWrong
A Moderate Update to your Artificial Priors
228 karma | ARC's first technical report: Eliciting Latent Knowledge (Ω) | paulfchristiano, Mark Xu, Ajeya Cotra | 4y | 90 comments
229 karma | Fun with +12 OOMs of Compute (Ω) | Daniel Kokotajlo | 5y | 86 comments
589 karma | What 2026 looks like (Ω) | Daniel Kokotajlo | 4y | 166 comments
261 karma | Ngo and Yudkowsky on alignment difficulty (Ω) | Eliezer Yudkowsky, Richard_Ngo | 4y | 152 comments
250 karma | Another (outer) alignment failure story (Ω) | paulfchristiano | 5y | 39 comments
285 karma | What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) (Ω) | Andrew_Critch | 5y | 65 comments
261 karma | The Plan (Ω) | johnswentworth | 4y | 78 comments
150 karma | Finite Factored Sets (Ω) | Scott Garrabrant | 4y | 97 comments
133 karma | Selection Theorems: A Program For Understanding Agents (Ω) | johnswentworth | 4y | 28 comments
161 karma | My research methodology (Ω) | paulfchristiano | 5y | 38 comments
261 karma | larger language models may disappoint you [or, an eternally unfinished draft] (Ω) | nostalgebraist | 4y | 31 comments
139 karma | Comments on Carlsmith's “Is power-seeking AI an existential risk?” (Ω) | So8res | 4y | 15 comments
299 karma | EfficientZero: How It Works (Ω) | 1a3orn | 4y | 50 comments
182 karma | Specializing in Problems We Don't Understand | johnswentworth | 5y | 29 comments