In this post I’ll sketch out an informal model of intelligent agents as webs of beliefs (or belief webs for short). The belief webs framework pulls together ideas from active inference, agent foundations and machine learning. In doing so it aims to unify beliefs, goals and actions as three facets...
In my post “Why I’m not a Bayesian”, I argued that the Bayesian approach of assigning credences to propositions with binary truth values only works in simple and restricted domains. Instead, I claimed, a better approach to epistemology is to assign degrees of truth to models of the world. This...
Many people in my intellectual circles use economic abstractions as one of their main tools for reasoning about the world. However, this often leads them to overlook how interventions which promote economic efficiency undermine people’s ability to maintain sociopolitical autonomy. By “autonomy” I roughly mean a lack of reliance on...
Context: I wrote the two memos below in mid-2024, while still at OpenAI. They were intended to convey some core aspects of misalignment threat models to OpenAI researchers. When I left in late 2024, I got permission to "take them with me", but didn't get around to posting them until...
This post contains some rough reflections on the alignment community trying to make its ontology legible to the mainstream ML community, and the lessons we should take from that experience. Historically, it was difficult for the alignment community to engage with the ML community because the alignment community was using...
Three theories of higher education
Getting an undergraduate degree is very costly. In America, the direct financial cost of attending a private university is typically in the hundreds of thousands of dollars. Even when tuition is cheap (or covered by scholarships), forgoing three to four years of salary and career...
Which alignment target?
Suppose you’re an AI company or government, and you want to figure out what values to align your AI to. Here are three options, and some of their downsides: AIs that are aligned to a set of consequentialist values are incentivized to acquire power to pursue those...