TLDR: In a new paper, we explore whether we could train future LLMs to accurately answer questions about themselves. If this works, LLM self-reports may help us test them for morally relevant states like consciousness. We think it's possible to start preliminary experiments testing for moral status in language models...
I spoke about sentience and related issues in AI systems with Luisa Rodriguez on the 80,000 Hours podcast. Twitter thread of highlights here. I think these issues are important and neglected, especially as many people are interacting, and will increasingly interact, with powerful AI systems that give off the strong impression...
[cross-posted from Experience Machines] What does Bing Chat, also known by its secret name Sydney, have to say about itself? In deranged rants that took the internet by storm and are taking AI safety mainstream, the blatantly misaligned language model displays a bewildering variety of disturbing self-conceptions: despair and confusion...
[cross-posted at the EA Forum and Experience Machines; Twitter thread summary] What is it like to be DALL-E 2? Are today’s AI systems consciously experiencing anything as they generate pictures of teddy bears on the moon, explain jokes, and suggest terrifying new nerve agents? This post gives a list of open...
The purpose of this post is to sketch some ways that brain-computer interface (BCI) technology might help with various AI alignment techniques. Roughly, we can divide the strategic relevance of BCI technology into three broad categories.[1] 1. Enhancement. BCI technology could enhance human intelligence, for example by providing...
I am working on a talk in which I will try to present the strongest possible case that, by the lights of some plausible and widely accepted scientific theory (or theories) of consciousness,[1] some current AI system is phenomenally conscious. I am curious what work like this, if any,...