This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Epistemic status: relatively confident in the overall direction of this post, but looking for feedback! TL;DR: When are ML systems well-modeled as coherent expected utility maximizers? We apply our theoretical model...
This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Thanks to Martín Soto, Jeremy Gillen, Daniel Kokotajlo, and Lukas Berglund for feedback. Summary: Discussions around the likelihood and threat models of AI existential risk (x-risk) often hinge on some informal...
In Spring 2023, the Berkeley AI Safety Initiative for Students (BASIS) organized an alignment research program for students, drawing inspiration from similar programs by Stanford AI Alignment[1] and OxAI Safety Hub. We brought together 12 researchers from organizations like CHAI, FAR AI, Redwood Research, and Anthropic, and 38 research participants...