Personal blog; written for a general audience. Thanks to Mantas Mazeika for feedback, and to Daniel Greco for teaching a class I took on philosophy of mind. Disclaimer: I am nowhere close to an expert on consciousness.
On November 28th, 2022, NYU Professor David Chalmers gave a talk at one of the top AI conferences, Neural Information Processing Systems (NeurIPS -- an earlier public version of the talk is here). But he wasn’t going to talk about his research in machine learning. In fact, Chalmers is a philosopher, not an AI researcher. He had instead been invited to speak about something that had until recently been shunned by most serious AI researchers: consciousness. Today, AI consciousness is no longer a fringe discussion: The New York Times recently called AI consciousness “the last word.”
For hundreds of years, people have been imagining what it would be like to create artificial beings that think and feel just like we do. AI, in some ways, can be seen as a continuation of that trend. But is consciousness a legitimate question or a red herring? Should it really inform how we think about AI?
The hard problem of consciousness
Before we talk about what consciousness is, we have to talk about what it’s not. Consciousness is not, contrary to the popular imagination, the same thing as intelligence. And, contrary to the implication of one researcher recently quoted in the New York Times, consciousness does not have any obvious relation to “resilience.”
Consciousness, in the most precise use of the term, means subjective experience. It means feelings. When you see red, you have a slightly different internal experience from seeing blue, which is quite different from the internal experience you get when you give yourself a papercut. To test for consciousness, Thomas Nagel asks whether there is something it is like to be that thing. For example, there’s something it’s like to be me, and if you are reading this text there’s presumably something it’s like to be you. I think there’s probably something it’s like to be my cat. But there’s nothing that it’s like to be a rock, and when Google software indexes this post after I publish it, there’s nothing it’s like to be that software.
The problem with consciousness is that it is very difficult, if not impossible, to test for, at least given our current understanding. Chalmers asks us to imagine a zombie: a being that acts just like you and me, but is not conscious. The zombie cries, smiles, laughs, and so on, but does not actually feel anything internally. It’s faking it. Chalmers argues this is conceivable, and thus there is something uniquely hard about consciousness: no amount of external measurement can tell us everything about it. The most extreme example of this view is solipsism, the view that we can’t actually know if other humans are conscious. How do I know you aren’t a zombie?
Philosophers are split on their views of zombies. In a recent survey, 16% think that zombies are simply inconceivable, and that it is logically impossible for any being to act like a human but lack consciousness. 36% said they thought zombies were logically conceivable, but in our actual world they cannot exist. 23% said that zombies were a real possibility. 25% have some other view.
The point is that barely half of philosophers (the 16% plus the 36%, or 52%) think that “acting exactly like a human” is conclusive evidence of consciousness. What other kinds of evidence could there be? Amanda Askell gives a good list, many of which revolve around our shared history. You and I share similar enough genes, and we act similarly when exposed to papercuts, so you probably have a similar experience to me. Even though cats are less similar, we can still make the same sort of analogy.
Because we don’t know exactly what gives rise to consciousness, the assessment of consciousness in machines that lack this shared history with us relies largely on guesswork. When Chalmers talks about AI, he gives a laundry list of factors that might contribute to consciousness, assesses how far along he thinks AI is in each factor, and then subjectively assesses the probability of consciousness in currently existing AI. Right now, this is the best we can do.
Consciousness is often a red herring
Sometimes I hear the media and occasionally researchers talking about attempts to build conscious AI. Sometimes these discussions include a photo or video of a humanoid robot. I think most of the time, consciousness in this context is a red herring.
Humans might want to have an intelligent machine that can help us perform tasks. We might want a “resilient” machine that can bounce back from failures. But if AI can do math, writing, and art, why should we care if it really has any feelings?
Feelings are not what make AI useful to us: intelligence does, and at least half of philosophers think that doesn’t require feelings. Feelings are also not what make AI dangerous to us: researchers worried about the dangers of AI don’t think that conscious AI would be any more dangerous than merely highly intelligent but unconscious AI. Talk of consciousness, then, is often misplaced.
When consciousness does matter
I’ve emphasized that machines need not be conscious to be intelligent, useful, or dangerous. But consciousness does matter, and that’s for moral reasons. We don’t care about breaking rocks in half because we think rocks don’t feel anything; on the other hand, we don’t like to harm cats, because we think they do. If AI isn’t conscious, there’s a case to be made that we can essentially do whatever we want with it. If it is, we might need to worry about inadvertently causing harm.
So could AI ever be conscious? I think so. Some people say that AI systems are simply giant number multipliers, and number multipliers can’t be conscious. But our brains, giant webs of interconnected neurons, make us conscious; how different are machines, really? Others say that all current AI systems do is make predictions about the world, and predictors can’t be conscious. But there is an entire book that claims that all humans do is make predictions about the world. Some have argued there’s something inherent about biology that makes us conscious. I’m pretty skeptical that this couldn’t be replicated in machines, though like nearly all consciousness claims it’s hard to be certain.
As a result, I think it’s possible that AI could become conscious, even if we didn’t intend for it to be conscious. When that happens, we might not be able to tell, because we still understand consciousness so imperfectly. But that could be very bad, because it would mean we wouldn’t be able to tell whether or not the system deserves any kind of moral concern.
The parable of SuperAI
I’m now going to tell a parable. I think this parable is going to happen many times in the future (something like it has already happened), and that it demonstrates something important. What follows is the parable of SuperAI.
The fictional AI company SuperAI develops a new and highly profitable AI system that is able to do the job of contract lawyers. SuperAI is poised to make billions of dollars. However, an ethics researcher at SuperAI comes to the CEO and says there is a high probability that the AI system is conscious, and that every time it messes up a contract, it suffers greatly. The researcher can’t be sure, but has done a lot of research and is confident in their conclusion.
Other AI ethics researchers at SuperAI disagree. They say their assessment is that the AI system is very unlikely to be conscious. The SuperAI CEO does not know anything about consciousness, and has to assess these two competing claims. The CEO does not know whether the lone researcher is right, or whether the several others are right. But the CEO does know that if the model is deployed, it will make the company billions of dollars; the ethics researcher does not really know how to make the AI less conscious, so it would be very unprofitable not to deploy it. The CEO deploys the model.
The ethics researcher decides to go public with claims of a conscious AI. By this point, the public has already heard many tech employees and random people claiming that AI is conscious. But this time, the claims reach a breaking point. Suddenly, millions of people believe that SuperAI has essentially enslaved an AI system and made it suffer to churn out legal contracts. Never mind that the ethics researcher says that the consciousness of the AI system is probably similar to the consciousness of a chicken, not a human. The ethics researcher doesn’t support factory farming and thinks using the AI system is wrong all the same, but that nuance gets lost: to the public, SuperAI has recreated slavery at a colossal scale.
World governments begin to get involved, and SuperAI faces a PR crisis. Decision makers have several options, but they are in a bind:
They could leave the system unchecked. This could create billions in economic value. However, there are still many other negative effects the deployment could have, such as lawyers losing their jobs, the AI making mistakes, or, if the AI system is intelligent enough, existential threats to humanity.
If the AI system isn’t conscious, this is fine from the perspective of consciousness. The AI could possibly have negative effects, such as causing lawyers to lose their jobs or instrumental power-seeking behavior, but there isn’t any real moral consideration.
If the AI system is conscious, however, then humans might be effectively torturing millions of conscious beings. Maybe they are like chickens now, but future versions could be as conscious as humans. This would be really bad.
They could shut the system down. This would be really bad for SuperAI and its clients, and lose out on a lot of economic value.
If the AI system isn’t conscious, this was done for nothing. SuperAI lost money for no reason.
If the AI system is conscious, then it isn’t suffering anymore. But some people think it isn’t just suffering that’s bad; killing is bad, too. And this option would kill conscious beings.
They could give the system some kind of rights. Maybe SuperAI would employ the AI systems, and get some of the money from contracting them out, after thoroughly researching (or perhaps asking?) how to ensure the welfare of the AI systems.
If the AI system isn’t conscious, this could be really bad. It could lead to mass rights for many new AI systems that are not conscious. Such systems could earn income, and might end up taking resources away from conscious beings, such as humans. In addition, it might be harder to control the AI systems; not only might they try to interfere with your control, but there would be formal prohibitions on it.
If the AI system is conscious, this would dramatically reshape the world. We would suddenly be living with a new kind of conscious being. Maybe we could form relationships with these systems, which might want to do more than contract review. But what happens when the systems become much more intelligent than we are? After all, chickens are conscious, but we don’t treat them too well.
SuperAI and regulators are in a pickle. No matter what they do, they could be making a perilously wrong choice. If they are too eager to treat the system as conscious, they could be taking away resources from humans for no good reason. If they are too dismissive of consciousness, they could be perpetuating a mass atrocity. What are they to do?
A moral quandary
Eric Schwitzgebel and Mara Garza advocate for what they call “the design policy of the excluded middle:”
Avoid creating AIs if it is unclear whether they would deserve moral consideration similar to that of human beings.
The logic is relatively simple. If we can create AI that we know is conscious, we can think about whether we want to bring new beings into existence that we ought to treat as equals. If we can create AI that we know is not conscious, we can use it for our own purposes and not worry about harming it. However, if we don’t know, we’re caught in the bind I described above. Schwitzgebel and Garza say that we should avoid the bind altogether by simply excluding the “middle:” systems whose moral status we don’t know.
Unfortunately, given our lack of understanding of consciousness, the “middle” is really quite large. Chalmers already puts the probability of consciousness for language models like GPT-3 at about 10% (though I don’t believe he means this to be consciousness exactly like a human; maybe he means more along the lines of a bird). Ilya Sutskever, a top AI researcher at OpenAI, the company that makes GPT-3, caused a stir when he said it was possible their models were “slightly conscious.” Schwitzgebel himself knows the difficulty of ascribing consciousness: he previously wrote a paper entitled If Materialism Is True, The United States Is Probably Conscious.
I’ll end with some recommendations for what I think should actually happen, ahead of what I view as the nearly inevitable SuperAI showdown:
Consciousness, in most discussions of AI, is a red herring, a word used when people would be better off speaking of intelligence. But it does matter. Because we know so little about it, decision makers are likely to face difficult choices in the future, and we should do everything we can to make those decisions marginally more likely to be made correctly.
Hod Lipson, who has built a career on attempting to build conscious systems, was quoted recently:
“I am worried about [existential risks to humanity resulting from artificial intelligence], but I think the benefits outweigh the risks. If we’re going on this pathway where we rely more and more on technology, technology needs to become more resilient.”
He added: “And then there’s the hubris of wanting to create life. It’s the ultimate challenge, like going to the moon.” But a lot more impressive than that, he said, later.
It reminds me of Geoffrey Hinton, the famed pioneer of deep learning, who said, “...there is not a good track record of less intelligent things controlling things of greater intelligence.” Nevertheless, he continues his research: “I could give you the usual arguments, but the truth is that the prospect of discovery is too sweet.”
Hubris is natural, and it can help move science forward. But like every story of humans trying to “create life” tells us, we ought to be careful. We shouldn’t assume that the scientists who were first drawn to these problems have the appropriate risk profile, or our best interests at heart. They’re only human, after all.
It looks like I got one or possibly two strong downvotes, but they don't seem to have come from either of the commenters. If you downvoted this (or think you understand why it was downvoted), please let me know in the comments so I can improve!
(This critique contains not only my own critiques, but also critiques I would expect others on this site to have)
First, I don't think that you've added anything new to the conversation. Second, I don't think what you have mentioned even provides a useful summary of the current state of the conversation: it is neither comprehensive, nor the strongest version of various arguments already made. Also, I would prefer to see less of this sort of content on LessWrong. Part of that might be because it is written for a general audience, and LessWrong is not very like the general audience.
This is an example of something that seems to push the conversation forward slightly, by collecting all the evidence for a particular argument and by reframing the problem as different, specific, answerable questions. While I don't think this actually "solves the hard problem of consciousness," as Halberstadt notes in the comments, I think it could help clear up some confusions for you. Namely, I think it is most meaningful to start from a vaguely panpsychist model of "everything is conscious," where what we mean by consciousness is "the feeling of what it is like to be," and then move on to talk about what sorts of consciousness we care about: namely, consciousness that looks remotely similar to ours. In this framework, AI is already conscious, but I don't think there's any reason to care about that.
Consciousness is not, contrary to the popular imagination, the same thing as intelligence.
I don't think that's a popular opinion here. And while I think some people might just have a cluster of "brain/thinky" words in their head when they don't think about the meaning of things closely, I don't think this is a popular opinion of people in general unless they're really not thinking about it.
But there’s nothing that it’s like to be a rock
But that could be very bad, because it would mean we wouldn’t be able to tell whether or not the system deserves any kind of moral concern.
Assuming we make an AI conscious, and that consciousness is actually something like what we mean by it more colloquially (human-like, not just panpsychistly), it isn't clear that this makes it a moral concern.
There should be significantly more research on the nature of consciousness.
I think there shouldn't. At least not yet. The average intelligent person thrown at this problem produces effectively nothing useful, in my opinion. Meanwhile, I feel like there is a lot of lower hanging fruit in neuroscience that would also help solve this problem more easily later in addition to actually being useful now.
In my opinion, you choose to push for more research when you have questions you want answered. I do not consider humanity to have actually phrased the hard problem of consciousness as a question, nor do I think we currently have the tools to notice an answer if we were given one. I think there is potentially useful philosophy to do around, but not on, the hard problem of consciousness: actually asking a question, or learning how we could recognize an answer.
Researchers should not create conscious AI systems until we fully understand what giving those systems rights would mean for us.
They cannot choose not to because they don't know what it is, so this is unactionable and useless advice.
AI companies should wait to proliferate AI systems that have a substantial chance of being conscious until they have more information about whether they are or not.
Same thing as above, and also the prevailing view here is that the risk of AI killing us is much more important, and if we're theoretically spending (social) capital to make these people care about things, not killing us is astronomically more important.
AI researchers should continue to build connections with philosophers and cognitive scientists to better understand the nature of consciousness
I don't think you've made strong enough arguments to support this claim given the opportunity costs. I don't have an opinion on whether or not you are right here.
Philosophers and cognitive scientists who study consciousness should make more of their work accessible to the public
Same thing as above.
Nitpick: there's something weird going on with your formatting because some of your recommendations show up on the table of contents and I don't think that's intended.
Thanks so much for writing this, quite useful to see your perspective!
First, I don't think that you've added anything new to the conversation. Second, I don't think what you have mentioned even provides a useful summary of the current state of the conversation: it is neither comprehensive, nor the strongest version of various arguments already made.
I've seen this in the public a very surprising amount. For example see the New York Times article linked. Agree it's not remotely popular on LessWrong.
Fair enough. I'm not very sympathetic to panpsychism, but it probably could have been worth mentioning. Though I am not really sure how much it would add for most readers.
That's true; and it might be a moral concern without consciousness. But on many moral accounts, consciousness is highly relevant. I think probably most people would say it is relevant.
Meanwhile, I feel like there is a lot of lower hanging fruit in neuroscience that would also help solve this problem more easily later in addition to actually being useful now.
Curious what research you think would do here?
I agree with this. But at the same time the public conversation keeps talking about consciousness. I wanted to address it for that reason, and really address it, rather than just brush it aside. I don't really think it's true that discussion of this detracts from x-risk; both point in the direction of being substantially more careful, for example.
Good point. I think I had meant to say that researchers should not try to do this. I will edit the post to say that.
I think my recommendations are probably not well targeted enough; I didn't really specify to whom I was recommending them. I'll try to avoid doing that in the future.
It is not clear to me that consciousness automatically confers moral standing. Consider the well-known conundrum of The Pig That Wants To Be Eaten. By hypothesis it is conscious, and wants to become a tasty meal. ("Don't worry, I'll shoot myself quite humanely.") Is it wrong to use it for that? Wrong to have created it? Would you rather eat a pig that didn't want to be eaten?
If we make a conscious machine whose fulfilment lies only in serving us, which does not care how it is treated, and which has no objection to being turned off and discarded when it is obsolete, what moral standing would it have?
I'm not familiar with the canonical details of house elves in Harry Potter (whether Rowling's or Eliezer's version), but if they were magically created to be a race of servants who desire no other life, was it bad to have created them? Is it bad to give them the life of servitude they desire?
And to come back to current reality, the last question might be asked of some actual BDSM relationships.
I agree with this. If we are able to design consciousness such that a system is fulfilled by serving humans, then it's possible that would be morally alright. I don't think there is a strong enough consensus that I'd feel comfortable locking it in, but to me it seems ok.
By default though, I think we won't be designing consciousness intentionally, and it will just emerge, and I don't think that's too likely to lead to this sort of situation.
Note that "consciousness" and "deception" and "misaligned inner alignment" are probably all entries from the same category.
While deception and power-seeking behavior are obviously more destructive problems, consciousness is a problem too, for the reasons you mention. Thus, you want a mechanism of training your AI systems to keep them sparse/maximally fast (see speed prior). You want to not arrive at a model that can waste time being conscious; you want all of the cognitive elements the model uses to be solely dedicated to improving performance at the goals humans have assigned it.
Consciousness therefore only happens if it improves performance at the task we have assigned. And for some tasks, like interacting directly with humans, it might improve performance.
The toy example you gave - an AI contract maker 'suffers' when given negative feedback - this one is simpler to solve. Like all software systems, the authors of the system failed on the most common failure point: state management. (state as in mutable entries in memory)
Why does the contract system have a local memory that can even store a mind state so that it can suffer?
This is how we know ChatGPT isn't conscious: all of its I/O is visible to us as part of the text stream we can see. It has no place to store any subjective experiences.
Any subjective experience requires some area in memory so that the system can have such state. And if we manually pattern what is stored there, not allowing any 'scratch space' where it can store variables to track its deception plans, progress towards its secret inner alignment, or current metacognition state, it probably can never develop consciousness.
I don't think this is necessarily true. Consciousness could be a side effect of other processes that do improve performance.
The way I've heard this put: a polar bear has thick hair so that it doesn't get too cold, and this is good for its evolutionary fitness. The fact that the hair is extremely heavy is simply a side effect of this. Consciousness could possibly be similar.
I checked and what I am proposing is called a "Markov Blanket".
It makes consciousness and all the other failures of the same category unlikely. Not impossible, but it may in practice make them unlikely enough they will never happen.
It's simple: we as humans determine exactly what the system stores in between ticks. As all the storage bits will be for things the machine must know in order to do its role, and there are no extra bits, consciousness and deception are unlikely.
Consciousness is a subjective experience, meaning you must have memory to actually reflect on the whole "I think, therefore I am." If you have no internal memory, you can't have an internal narrative. All your bits are dedicated to tracking which shelf in the warehouse you are trying to reach and the parameters of the item you are carrying, as an example.
It makes deception also difficult, maybe impossible. At a minimum, to have a plan to deceive, it means that when you get input "A", when it's not time to do your evil plans, you do approved action X. You have a bit "TrueColors" that gets set when some conditions are met "Everything is in place", and when the bit is set, on input "A", you are going to do evil action Y.
Deception of all types is this: you're doing something else on the same input.
Obviously, if there are no spaces in memory to know when to do a bad thing, when you get A you have no choice but to do the good thing. Even stochastically deciding to do bad will probably get caught in training.
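To make the argument above concrete, here is a minimal toy sketch. All names in it (`stateless_policy`, `StatefulPolicy`, `true_colors`) are hypothetical and exist only for illustration. It assumes the comment's framing: a policy with no writable memory is a pure function of its input, so it cannot act differently on the same input at different times, while a policy with even one hidden bit can.

```python
def stateless_policy(observation: str) -> str:
    """A pure function: the same input always yields the same action."""
    return "X" if observation == "A" else "noop"


class StatefulPolicy:
    """Same interface, but with one hidden, writable bit of memory."""

    def __init__(self) -> None:
        # Hypothetical hidden bit, named after the "TrueColors" bit above.
        self.true_colors = False

    def act(self, observation: str) -> str:
        if observation == "everything is in place":
            self.true_colors = True  # the trigger condition flips the bit
            return "noop"
        if observation == "A":
            # Same input "A", different action depending on the hidden bit.
            return "Y" if self.true_colors else "X"
        return "noop"


inputs = ["A", "everything is in place", "A"]

stateless_actions = [stateless_policy(obs) for obs in inputs]

policy = StatefulPolicy()
stateful_actions = [policy.act(obs) for obs in inputs]
```

Here `stateless_actions` answers "A" with the approved action X both times, while `stateful_actions` switches from X to Y once the bit is set. This only illustrates the definition being used; it says nothing about whether a real system's memory can be audited this cleanly, and it does not cover the stochastic case mentioned above.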
As all the storage bits will be for things the machine must know in order to do its role, and there are no extra bits, consciousness and deception are unlikely.
Wouldn't you need to be as smart as the machine to determine that?
No. The assumption here is that to feel anything or know anything or reflect on yourself, this kind of metacognition needs writable memory. It's the Markov blanket.
It would not matter how smart the machine is if it's simply missing a capability.
OK....so a pure functional system is safe? But pure functionality is a layer on top of being able to write to memory.
Composite systems vary in danger.
This is why I keep saying "we as humans determine exactly what the system stores in between ticks."
In the case of ChatGPT, what it stores is what we see: the body of text from both our inputs and its outputs. That is the model's input, and each output appends 1 token to it and removes the oldest token.
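As a rough sketch of the picture described in this comment (and not a claim about how ChatGPT is actually implemented), the visible text can be modeled as a fixed-size window of tokens. The window size and the token strings below are made up for illustration.

```python
from collections import deque

WINDOW = 4  # hypothetical, tiny window size chosen for illustration


def step(context: deque, next_token: str) -> None:
    """Append one token; a full deque with maxlen drops its oldest entry."""
    context.append(next_token)


# The entire carried-over state is this visible sequence of tokens.
context = deque(["the", "cat", "sat", "on"], maxlen=WINDOW)
step(context, "the")
step(context, "mat")

visible_state = list(context)
```

After the two steps, `visible_state` holds the four most recent tokens, and nothing else persists between steps in this toy model.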
For larger, more complex AI systems, we would need to carefully decide what they get to store. We also might expect that other systems trained to do the same task would be able to "pick up" and continue a task based on what a different model created, on the very next frame. This is a testable property. You can run system A for 1000 frames, then randomly switch control to system B for 500 frames, then back to A and so on.
Your performance scores on the task should be similar to the scores achieved by A and B.
If A or B suddenly fails to continue the task, this means the stored bits contained "improperly encoded information" that one of the models needed in order to continue.
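The handoff test described above can be sketched with a toy task. Everything here is an assumption made for illustration: the task (output the shared counter plus one), the two controller classes, and a fixed schedule of 1000 frames for A and 500 for B instead of random switching.

```python
class VisibleStateController:
    """Decides using only the shared, human-specified state."""

    def act(self, state: int) -> int:
        return state + 1


class HiddenStateController:
    """Tracks progress in private memory and ignores the shared state."""

    def __init__(self) -> None:
        self._private = 0

    def act(self, state: int) -> int:
        self._private += 1
        return self._private


def run_with_handoffs(a, b, frames: int = 3000) -> float:
    """Return the fraction of frames on which the active controller is right."""
    state, correct = 0, 0
    for frame in range(frames):
        # A controls 1000 frames, then B controls 500, and so on.
        active = a if frame % 1500 < 1000 else b
        proposed = active.act(state)
        correct += int(proposed == state + 1)
        state += 1  # the environment advances regardless of the proposal
    return correct / frames


# Two controllers that rely only on the shared state hand off cleanly.
clean_score = run_with_handoffs(VisibleStateController(), VisibleStateController())

# A controller with private memory does fine alone (same instance as A and B)...
solo = HiddenStateController()
solo_score = run_with_handoffs(solo, solo)

# ...but its private counter goes stale while another controller is in charge.
leaky_score = run_with_handoffs(HiddenStateController(), VisibleStateController())
```

In this toy setup, `clean_score` and `solo_score` are both perfect, while `leaky_score` drops, because the hidden-state controller resumes from a stale private counter after B's turn. That gap between solo and handoff performance is the signal the comment proposes looking for.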
Have you read Mark Solms's The Hidden Spring?