Thanks for feedback on versions of this essay from Seth Baum, Tobias Baumann, Max Carpendale, Michael Chen, Michael Dello-Iacovo, Zach Freitas-Groff, Oscar Horta, Roman Leventov, David Manheim, Caleb Ontiveros, Janet V. T. Pauketat, Sean Richardson, Brad Saad, Yip Fai Tse, Earl J. Wagner, Jamie Woodhouse, and Miranda Zhang.

Image: Midjourney v4, "a small digital brain etched on a computer chip floating on a pool of dark blue liquid, 4k, ultrarealism"

ChatGPT, Sydney, LaMDA, and other large language models (LLMs) have inspired a new wave of interest in artificial intelligence (AI). Among the many questions this technology raises, people are vigorously pondering whether these LLMs are not merely intelligent (i.e., high problem-solving ability) but are digital minds with mental faculties such as emotion, autonomy, socialization, meaning, language, and self-awareness.

Digital minds may be a key factor in the way AI develops over the coming years, such that a better understanding may lead to more accurate forecasts, better strategic prioritization, and concrete strategies for AI safety. Here, I briefly scope out some key questions in this area that might be worth further research. This may be a promising way for social scientists to contribute to AI safety (see “AI Safety Needs Social Scientists”), though these questions also invoke philosophy, computer science, and other perspectives.

What digital minds will exist in short- and long-term futures?

There are many plausible ways for digital minds to emerge relatively soon in AIs such as the new wave of LLMs: ChatGPT, Sydney, Bard, Claude, and Sparrow. AIs already have reinforcement learning from human feedback (RLHF), program synthesis, and the ability to call APIs, and the way minds emerge may depend on myriad social factors such as the different cultures and reputations of AI engineering teams across tech giants, start-ups, universities, and governments. Already in 2022, a few engineers saw neural nets as “sentient” and said they “may be … slightly conscious.” In the long run, there are many advantages to being a digital mind that could lead to their ubiquity, such as a better ability to copy and modify oneself.

Forecasting which trajectories are most likely may draw from biology, computer science, cognitive science, economics, and other fields, as well as research directly interpreting and explaining current AIs. For example, without reinforcement learning, AIs may not have the multi-time-step reasoning that is necessary for some richer mental faculties, such as goal-directed behavior. On the other hand, given the marketability of LLMs, corporations may drive that technological progress most quickly, and the incentive to have AIs that can predict human behavior (e.g., personal assistants) may quickly lead to digital minds. These different sorts of digital minds may each require different approaches to alignment and existential safety. For example, digital minds with a fundamental capacity for linguistic meaning may be able to interpret human instructions and bootstrap their way into aligned outcomes with RLHF, while highly agentic minds with limited faculties of language and meaning may require more hard-coded alignment, such as neural causal models or agent foundations. These are just some concrete examples of many factors that could drive forecasts related to digital minds, which could complement forecasting of economic trends by MIT FutureTech, compute trends by Epoch, and Metaculus prediction markets.

How will humans react to digital minds?

Reactions to actual or perceived digital minds may play an important role in existential risks from AI. People are already reacting strongly to these dialogue-based LLMs. Mind attribution may effect a rapid increase in AI investments, a shift in prioritization among AI architectures, a Luddite backlash against AI progress, campaigns to protect the interests of AI, existential reflection among humans, etc. Human reactions have been the focus of most Sentience Institute research to date, such as the ongoing Artificial Intelligence, Morality, and Sentience (AIMS) nationally representative survey of US public attitudes, a survey of predictors of moral consideration of AIs in Computers in Human Behavior, and preprints of experiments on perspective-taking and the effect of different features (e.g., embodiment, emotion expression) on moral consideration.

Further research should draw on cognitive and social psychology as well as macro social science across political, media, and social movement theory. For example, mind perception theory suggests that attribution of agentic mental capacity is associated with attribution of moral agency (the ability to take moral action, to create benefit and harm), while experience is associated with moral patiency (i.e., moral circle expansion); thus, humans may react to more agentic minds with more concern and resistance. Similarly, studies in human-robot interaction suggest that autonomy, humanness, and outperformance make humans feel more threatened. One particularly important dimension may be the extent to which advanced AI systems appear to be aligned, such as through RLHF, even if the underlying model is not viewed as safe and aligned by well-informed researchers. At the macro scale, the history of how social movements emerge and succeed can inform how collective action may occur following the recognition of digital minds; in particular, the presence of mental faculties in a technological artifact may create a trajectory that recombines features of social movements and other emerging technologies. Research can also address the extent of value lock-in with various reflection and alignment mechanisms, and thus how and to what extent we should ensure moral progress before takeoff.

What is the philosophical nature of digital minds?

Most digital minds research to date has been in a cluster of philosophy of mind, ethics, and law, arguably ever since the 1920 science fiction play R.U.R. raised questions of robot rights. The first usage of the term “digital minds” in this context that I know of was Ben Goertzel’s The Hidden Pattern (2006), which presents a patternist theory of how biological and digital minds work. Much of this literature centers mental faculties, such as consciousness, following the ethics literature on human and nonhuman animals, as well as relational theories that emphasize the social connections we have to AIs, most notably in Robot Rights (Gunkel 2018). We survey the ethics literature, broadly construed, in Harris and Anthis (2021). Our recent paper, Ladak (2023) in AI and Ethics, catalogs nine criteria that have been proposed for moral standing, emphasizing non-sentient AIs who arguably still have preferences and goals and thus may warrant moral consideration. Digital minds may have very different values and preferences from humans, such as less emphasis on self-preservation and mortal urgency, and we have cataloged some relevant features for assessing sentience in artificial entities.

Properly including AIs in the moral circle could improve human-AI relations, reduce human-AI conflict, and reduce the likelihood of human extinction from rogue AI. Moral circle expansion to include the interests of digital minds could facilitate better relations between a nascent AGI and its creators, such that the AGI is more likely to follow instructions and the various optimizers involved in AGI-building are more likely to be aligned with each other. Empirically and theoretically, it seems very challenging to robustly align systems that have an exclusionary relationship such as oppression, abuse, cruelty, or slavery.

A number of philosophers and scientists have written about whether artificial consciousness is possible. In a recent review, we find an affirmative consensus, though division remains on its plausibility, especially due to differences between computational, physical, and biological approaches. Progress on this question may be largely tied up in broader challenges in a philosophical understanding of consciousness, such as resolving debates between realism and eliminativism, and the development of scientific theories such as global workspace and attention schema. David Chalmers has recently taken up the question of consciousness in language models, covering features such as memory and global workspaces. Philosophy of mind seems to be the main topic for the digital minds group at FHI, such as Shulman and Bostrom (2020) on superbeneficiaries (though Bostrom and Shulman’s “Propositions Concerning Digital Minds and Society” (2022) covers a wider range of topics), as well as for MEP, CFM, GCRI, and most other academic groups in this nascent field.

How will digital minds interact, and what kind of society will that create?

Given individual differences between digital and biological minds, there may be radically different social dynamics, such as the wide range of predictions made in The Age of Em (Hanson 2016) for a world of whole brain emulations and the conceptual analysis of simulations in Reality+ (Chalmers 2022). Further work could focus on economic, organizational, or political dynamics, such as trade-offs between AI augmenting or automating human labor, “super democratized” access to multiple AGIs, or multipolar takeoff scenarios in which transformative AIs cooperate or conflict. This includes a number of ways in which the long-term future could go very well or very poorly.

The consideration of mental faculties other than intelligence may complicate a variety of social theories, such as: game theory with fewer bounds on rationality and cooperation; decision theory with access to other agents’ source code (a la functional decision theory); social choice theory if individuals can quickly copy themselves; organizational and economic theory if the locus of agency is no longer individuals but groups of (near-)copies; and legal and rights theories with digital environments that are easier to surveil, manipulate, and abuse than analog environments. The social dynamics of digital minds, in groups of exclusively digital minds and in hybrid biological-digital groups (e.g., the first digital minds and their human creators), could precipitate very rapid increases in AI capabilities and lead to radically different takeoff scenarios and long-term futures.

What strategies are most promising for improving futures with digital minds?

All of the foundational research outlined above ultimately needs to be cashed out in better strategies for building the best future for all sentient beings. One tentative strategic claim is that research should be prioritized before other projects such as public policy or outreach. First impressions may be very important for digital minds as with other technosocial issues (e.g., lock-in of GMO and nuclear energy narratives), and there has been so little research on this topic that the most promising outreach strategies could easily change after only a few research projects. Before promoting a narrative or policy goal, such as a moratorium on digital consciousness, we should consider its direct viability and the indirect effects of its promotion.

Delay should not be too long, however, because suboptimal narratives may take over in the meantime, especially with short timelines—making digital minds research a highly time-sensitive AI safety project. Discussion to date has arguably been largely confused and quite possibly detrimental. The most promising work informed by digital minds research may be preparation to push forcefully for certain technical and governance strategies during major advances in AI capabilities.


What makes this different from other AI existential safety research?

That’s a tough question. As with most research clusters, distinctions are more in focus, methods, and family resemblance than crisp conceptual boundaries. In practice, this agenda is more oriented towards social science and towards complex, high-level topics than paradigms such as mechanistic interpretability and agent foundations. For example, rather than approaching “agency” with bottom-up infra-Bayesianism, digital minds (DM) research would tend towards social models of agency (how agents work together and attribute agency to each other) and utilize social science rather than mathematical tools.

Also, many AI safety research agendas have a safe AI architecture in mind (e.g., IDA), but DM research does not. It is better viewed as a question-centric agenda (e.g., “How can we elicit latent knowledge?”) or a specific stream of forecasting (i.e., marginal and conditional probabilities of particular AI futures). And because it is a high-level approach not specific to an architecture or technical implementation, it may be more robustly useful across possible AI takeoff scenarios, at least those in which AGI has a richer mental life. By the same token, it may lack a sharp contribution to any specific AI takeoff scenario.

How fixed is this agenda?

Not at all. There are some exciting individual projects already happening in this nascent field, and now the idea is to zoom out and write this research agenda to figure out which orientations, questions, and clusters of projects seem most promising, if any. We will continue to iterate between big-picture planning and concrete progress on the most promising individual projects, but overall, this new field seems very promising at the moment.

How can I work on digital minds (DM) research?

This work is now our primary focus at the Sentience Institute. We have substantial room for more funding; we are eager to establish a network of DM research collaborators and advisors (get in touch); and we are accepting researcher applications on a rolling basis! Some other research groups of particular relevance that you could work with are the FHI digital minds group; the Mila-FHI-UM digital minds project; the NYU Mind, Ethics, and Policy program; the FAU Center for the Future Mind; the Global Catastrophic Risk Institute; the Legal Priorities Project; the Center for Reducing Suffering; and the Center on Long-Term Risk.

