Crossposted from my Substack. Intuitively, simpler theories are better, all else equal. It also seems like finding a way to justify assigning higher prior probability to simpler theories is one of the more promising ways of approaching the problem of induction. In some places, Solomonoff induction (SI) seems to be...
Crossposted from my Substack. Rational choice theory is commonly thought of as being about what to do in light of our beliefs and preferences. But our beliefs and preferences come from somewhere. I would say that we believe and prefer things for reasons. My evidence gives me reason to believe...
Applications are now open for the Cooperative AI Summer School, which will take place from 9th to 13th July 2025 in Marlow, near London! Designed for students and early-career professionals in AI, computer science, and related disciplines—such as sociology and economics—the summer school offers a firm grounding in the emerging...
In our jobs as AI safety researchers, we think a lot about what it means to have reasonable beliefs and to make good decisions. This matters because we want to understand how powerful AI systems might behave. It also matters because we ourselves need to know how to make good...
Summary: Agents might fail to trade peacefully in high-stakes negotiations. Such bargaining failures can have catastrophic consequences, including great power conflicts and AI flash wars. This post is a distillation of DiGiovanni et al. (2024) (DCM), whose central result is that agents that are sufficiently transparent to each other have...
Which forms of misalignment might result in particularly bad outcomes? And to what extent can we prevent them even if we fail at intent alignment? We define spite as a terminal preference for frustrating others’ preferences, at least under some conditions. Reducing the chances that an AI system is spiteful...
Summary: Bounded agents might be unaware of possibilities relevant to their decision-making. That is, they may not just be uncertain, but fail to conceive of some relevant hypotheses entirely. What's more, commitment races might pressure early AGIs into adopting an updateless policy from a position of limited awareness. What happens...