Context: I wrote the two memos below in mid-2024, while still at OpenAI. They were intended to convey some core aspects of misalignment threat models to OpenAI researchers. When I left in late 2024, I got permission to "take them with me", but didn't get around to posting them until...
This post contains some rough reflections on the alignment community trying to make its ontology legible to the mainstream ML community, and the lessons we should take from that experience. Historically, it was difficult for the alignment community to engage with the ML community because the alignment community was using...
Three theories of higher education Getting an undergraduate degree is very costly. In America, the direct financial cost of attending a private university is typically in the hundreds of thousands of dollars. Even when tuition is cheap (or covered by scholarships), forgoing three to four years of salary and career...
Which alignment target? Suppose you’re an AI company or government, and you want to figure out what values to align your AI to. Here are three options, and some of their downsides: AIs that are aligned to a set of consequentialist values are incentivized to acquire power to pursue those...
Much of my thinking over the last year has focused on understanding the concept of "distributed agents", as opposed to the "centralized agents" that the existing paradigm of expected utility maximization describes. One way of describing the difference is in terms of how autonomous their subagents are. Another is that...
I'd like to reframe our understanding of the goals of intelligent agents to be in terms of goal-models rather than utility functions. By a goal-model I mean the same type of thing as a world-model, only representing how you want the world to be, not how you think the world...
It’s been eight months since I released my last story, so you could be forgiven for thinking that I’d given up on writing fiction. In fact, it’s the opposite. I’m excited to announce that I’m releasing my first fiction collection—The Gentle Romance: Stories of AI and Humanity—with Encour Press in...