0. Intro Due to Claude’s Constitution and OpenAI’s model spec, the issue of AI character has started getting more attention, particularly concerning whether we want AI systems to be “obedient” or “ethical”.[1] But we think it still gets nowhere near enough attention. AI character (e.g. how obedient, honest, cooperative, or altruistic AIs...
Short summary A moral public good is something many people want to exist for moral reasons—for example, people might value poverty reduction in distant countries or an end to factory farming. If future people care somewhat about moral public goods, but care more about idiosyncratic selfish goods, then there may...
Introduction Forethought and AI Futures Project have highlighted the risk that a malicious actor could use data poisoning to instill secret loyalties into advanced AI systems, and then seize power. This piece gives my view on what ML research could prevent this from happening. Threat model The basic threat model...
This post is speculative and tentative. I’m exploring new ideas and giving my best guess; the conclusions are lightly held. Summary Bostrom (2014) says that an actor has a “decisive strategic advantage” if it obtains “a level of technological and other advantages sufficient to enable it to achieve complete world...
Epistemic status: very rough! Spent a couple of days reading the Gradual Disempowerment paper and thinking about my view on it. Won’t spend longer on this, so am sharing rough notes as is. Summary * I won’t summarise the paper here! If you’re not familiar with it, I recommend reading...
AI systems may soon fully automate AI R&D. Daniel Eth and I have argued that this could precipitate a software intelligence explosion – a period of rapid AI progress due to AI improving AI algorithms and data. But we never addressed a crucial question: how big would a software intelligence...