Frontier AI models serve millions of military personnel on classified networks, support operational military targeting, automate scientific pipelines in national laboratories, generate and review significant volumes of production code, and increasingly automate the development of their successors. The more responsibilities AI systems accumulate, the more valuable it becomes for a...
We’ve written about why we think AI character (the behaviour of AI systems) will have a massive impact on how well the intelligence explosion goes, and why we think there would be big benefits to giving AIs proactive prosocial drives: that is, behavioural drives beyond refusals...
Introduction Consider a lorry driver who sees a car crash and pulls over to help, even though it’ll delay his journey. Or a delivery driver who notices that an elderly resident hasn’t collected their post in days, and knocks to check they’re okay. Or a social media company employee who...
0. Intro Due to Claude’s Constitution and OpenAI’s model spec, the issue of AI character has started getting more attention, particularly concerning whether we want AI systems to be “obedient” or “ethical”.[1] But we think it’s still not nearly enough. AI character (e.g. how obedient, honest, cooperative, or altruistic AIs...
Short summary A moral public good is something many people want to exist for moral reasons—for example, people might value poverty reduction in distant countries or an end to factory farming. If future people care somewhat about moral public goods, but care more about idiosyncratic selfish goods, then there may...
Introduction Forethought and AI Futures Project have highlighted the risk that a malicious actor could use data poisoning to instill secret loyalties into advanced AI systems, and then seize power. This piece gives my view on what ML research could prevent this from happening. Threat model The basic threat model...