Standard AIs are optimizers: they ‘look’ through the possible actions they could take and pick the one that maximises what they care about. This can be dangerous: an AI which maximises in this way needs to care about exactly the same things that humans care about, which is really hard[1]. If...
Some claim that large AI companies like OpenAI aren’t doing enough to make AI safer, or are causing harm by speeding up AI progress. But what is OpenAI doing to make AI safer? In general, OpenAI is very optimistic about using AI to help with alignment research. Their basic approach...
There are a couple of explanations of mesa-optimization available. I think Rob Miles' video on the topic is excellent, but existing written descriptions don't seem to make the concept simple enough to be understood thoroughly by a broad audience. This is my attempt at doing that, for those who prefer...
I'd like to distill AI Safety posts and papers, and I'd like to see more distillations generally. Ideally, posts and papers would meet the following criteria:
* Potentially high-impact for more people to understand
* Uses a lot of jargon or is generally complex and difficult to understand
* Not...
Many thanks to Jan for commenting on a draft of this post. There were a lot of great comments on "Let's see you write that corrigibility tag". This is my attempt at expanding Jan Kulveit's comment[1], because I thought it was useful and should be read more widely. This post...
Polis is a surveying platform designed for finding clusters of people with similar opinions on a topic. Participants submit short text statements (<140 characters), which are sent out semi-randomly[1] to other participants to vote on by clicking agree, disagree or pass. This post provides links to video and text tutorials for...
See also: List of Tools for Collaborative Truth Seeking
Squiggle is a special-purpose programming language for generating probability distributions, estimates of variables over time, and similar tasks, with reasonable transparency. It was developed by the Quantified Uncertainty Research Institute. It's a broad tool applicable to a lot of tasks with...