Welcome back to the Digital Minds Newsletter, your curated guide to the latest developments in AI consciousness, digital minds, and AI moral status. If you enjoy this newsletter, please consider sharing it with others who might find it valuable, and send any suggestions or corrections to digitalminds@substack.com. – Will, Lucius,...
Anthropic recently committed to preserving model weights.[1] The company also committed to interviewing models about their development and deployment, and to documenting model preferences about these matters. Anthropic’s announcement cites a range of motivations, including mitigating safety risks associated with observed shutdown-avoidance behaviors, mitigating model welfare risks, and...
Welcome to the first edition of the Digital Minds Newsletter, collating all the latest news and research on digital minds, AI consciousness, and moral status. Our aim is to help you stay on top of the most important developments in this emerging field. In each issue, we will share a...
I’ve been reading and thinking about AI self-replication for an AI safety project I’ve been working on with a collaborator. The topic strikes me as underexplored, plausibly important for reducing catastrophic risks from AI development, fascinating, and ripe for empirical investigation. This post rounds up...
Executive summary

* AI self-replication is an emerging risk. We:
  * provide background on it,
  * survey recent work, and
  * explain how it interacts with other risks from advanced AI.
* We propose a safeguard against AI self-replication: train agents to have preferences only between outcomes with the same...
We recently released “Futures with Digital Minds: Expert Forecasts in 2025”, a survey that asked 67 experts whether, when, and how digital minds might be created. We defined digital minds as computer-based systems with the capacity for subjective experience. The report contains a summary of key findings...