Best of Alignment Forum IV — LessWrong

Aug 17, 2023 by Raemon

ARC's first technical report: Eliciting Latent Knowledge

paulfchristiano, Mark Xu, Ajeya Cotra

Fun with +12 OOMs of Compute

Daniel Kokotajlo

Ngo and Yudkowsky on alignment difficulty

Eliezer Yudkowsky, Richard_Ngo

Another (outer) alignment failure story

paulfchristiano

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Finite Factored Sets

Scott Garrabrant

Selection Theorems: A Program For Understanding Agents

My research methodology

paulfchristiano

larger language models may disappoint you [or, an eternally unfinished draft]

Comments on Carlsmith's “Is power-seeking AI an existential risk?”

EfficientZero: How It Works

Worst-case thinking in AI alignment
