Welcome back to the Digital Minds Newsletter, your curated guide to the latest developments in AI consciousness, digital minds, and AI moral status. If you enjoy this newsletter, please consider sharing it with others who might find it valuable, and send any suggestions or corrections to digitalminds@substack.com. – Will, Lucius,...
Anthropic recently committed to preserving model weights.[1] The company also committed to interviewing models about their development and deployment, and to documenting model preferences about these matters. Anthropic’s announcement cites a range of motivations, including mitigating safety risks associated with observed shutdown-avoidance behaviors, mitigating model welfare risks, and...
Welcome to the first edition of the Digital Minds Newsletter, collating all the latest news and research on digital minds, AI consciousness, and moral status. Our aim is to help you stay on top of the most important developments in this emerging field. In each issue, we will share a...
I’ve been reading and thinking about AI self-replication for an AI safety project I’ve been working on with a collaborator. The topic strikes me as underexplored, plausibly important for reducing catastrophic risks from AI development, fascinating, and ripe for empirical investigation. This post rounds up...
Executive summary

* AI self-replication is an emerging risk. We:
  * provide background on it,
  * survey recent work, and
  * explain how it interacts with other risks from advanced AI.
* We propose a safeguard against AI self-replication: train agents to have preferences only between outcomes with the same...
We recently released “Futures with Digital Minds: Expert Forecasts in 2025”, a survey that asked 67 experts whether, when, and how digital minds might be created. We defined digital minds as computer-based systems with the capacity for subjective experience. The report contains a summary of key findings...