A Moderate Update to your Artificial Priors — LessWrong

Jan 03, 2024 by habryka

231 ARC's first technical report: Eliciting Latent Knowledge

paulfchristiano, Mark Xu, Ajeya Cotra

5y

90

232 Fun with +12 OOMs of Compute

Daniel Kokotajlo

5y

86

631 What 2026 looks like

Daniel Kokotajlo

5y

174

266 Ngo and Yudkowsky on alignment difficulty

Eliezer Yudkowsky, Richard_Ngo

5y

153

254 Another (outer) alignment failure story

paulfchristiano

5y

39

299 What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

5y

65

5y

78

157 Finite Factored Sets

Scott Garrabrant

5y

97

136 Selection Theorems: A Program For Understanding Agents

5y

29

162 My research methodology

paulfchristiano

5y

38

261 larger language models may disappoint you [or, an eternally unfinished draft]

5y

31

139 Comments on Carlsmith's “Is power-seeking AI an existential risk?”

5y

15

307 EfficientZero: How It Works

5y

50

190 Specializing in Problems We Don't Understand

5y

29