Best of Alignment Forum IV

Aug 17, 2023 by Raemon
228 ARC's first technical report: Eliciting Latent Knowledge
Ω
paulfchristiano, Mark Xu, Ajeya Cotra
4y
Ω
90
229 Fun with +12 OOMs of Compute
Ω
Daniel Kokotajlo
5y
Ω
86
261 Ngo and Yudkowsky on alignment difficulty
Ω
Eliezer Yudkowsky, Richard_Ngo
4y
Ω
152
250 Another (outer) alignment failure story
Ω
paulfchristiano
5y
Ω
39
285 What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
Ω
Andrew_Critch
5y
Ω
65
261 The Plan
Ω
johnswentworth
4y
Ω
78
149 Finite Factored Sets
Ω
Scott Garrabrant
4y
Ω
97
133 Selection Theorems: A Program For Understanding Agents
Ω
johnswentworth
4y
Ω
28
161 My research methodology
Ω
paulfchristiano
5y
Ω
38
261 larger language models may disappoint you [or, an eternally unfinished draft]
Ω
nostalgebraist
4y
Ω
31
139 Comments on Carlsmith's “Is power-seeking AI an existential risk?”
Ω
So8res
4y
Ω
15
299 EfficientZero: How It Works
Ω
1a3orn
4y
Ω
50
167 Worst-case thinking in AI alignment
Ω
Buck
4y
Ω
18