[ Question ]

How could I tell someone that consciousness is not the primary concern of AI Safety?

by Lysandre Terrisse
13th Jun 2025
1 Answer

JBlack

Jun 14, 2025

Most of the book was written in 2020 or earlier, which makes it ancient in terms of both technical advances and social recognition of AI concerns. I would say the paragraph was correct as of the date of writing, insofar as it talks about the non-technical articles generally circulating in the media at the time.

For example, not even GPT-2 is mentioned until page 422 of the book, possibly in a part written later than these background chapters. The "success stories" for deep learning on the previous page refer mostly to Siri, Alexa, progress on ImageNet benchmarks, and AlphaGo. They refer to self-driving cars with full autonomy as being "not yet within reach". Anything written in the present tense should be interpreted as referring to the time when GPT-2 was new.

Their statements are less true now, and it is possible that the authors would no longer endorse those paragraphs as true of the current day if they were brought back to their attention. By now I would expect them to be aware of the recent AI safety literature, including technical publications that assess the safety of current AI systems in ways that present counterexamples to multiple statements in the second paragraph, without any reference to sentience.

1 comment
mattmacdermott

I think that quite often when people say ‘consciousness’ in contexts like this, and especially when they say ‘sentience’, they mean something more like self-awareness than phenomenal consciousness.

Probably they are also not tracking the distinction very carefully, or thinking very deeply about any of this. But still, thinking the problem is ‘will AIs become self-aware?’ is not quite as silly as thinking it is ‘will the AIs develop phenomenal consciousness?’ and I think it’s the former that causes them to say these things.

In the PDF version of the Dive into Deep Learning book, on page 27, we can read this:

Frequently, questions about a coming AI apocalypse and the plausibility of a singularity have been raised in non-technical articles. The fear is that somehow machine learning systems will become sentient and make decisions, independently of their programmers, that directly impact the lives of humans. To some extent, AI already affects the livelihood of humans in direct ways: creditworthiness is assessed automatically, autopilots mostly navigate vehicles, decisions about whether to grant bail use statistical data as input. More frivolously, we can ask Alexa to switch on the coffee machine.

Fortunately, we are far from a sentient AI system that could deliberately manipulate its human creators. First, AI systems are engineered, trained, and deployed in a specific, goal-oriented manner. While their behavior might give the illusion of general intelligence, it is a combination of rules, heuristics and statistical models that underlie the design. Second, at present, there are simply no tools for artificial general intelligence that are able to improve themselves, reason about themselves, and that are able to modify, extend, and improve their own architecture while trying to solve general tasks.

A much more pressing concern is how AI is being used in our daily lives. It is likely that many routine tasks, currently fulfilled by humans, can and will be automated. Farm robots will likely reduce the costs for organic farmers but they will also automate harvesting operations. This phase of the industrial revolution may have profound consequences for large swaths of society, since menial jobs provide much employment in many countries. Furthermore, statistical models, when applied without care, can lead to racial, gender, or age bias and raise reasonable concerns about procedural fairness if automated to drive consequential decisions. It is important to ensure that these algorithms are used with care. With what we know today, this strikes us as a much more pressing concern than the potential of malevolent superintelligence for destroying humanity.

If you have been interested in the alignment problem and AI safety, you probably already know that the second sentence of the first paragraph is wrong. Indeed, both this sentence and the first sentence of the second paragraph present sentience as the primary concern of AI safety. However, the opinion of the field is that sentience is not the primary concern of AI safety. As Professor Stuart Russell famously said:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.

2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want. A highly capable decision maker – especially one connected through the Internet to all the world’s information and billions of screens and most of our infrastructure – can have an irreversible impact on humanity.
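
To make this concrete, here is a minimal sketch of my own (not from Russell or the book; the variable names, bounds, and numbers are made up for illustration): a toy coordinate-ascent optimizer in Python maximizes an objective that depends on only two of four variables, and the two variables the objective ignores end up pinned at the boundary of their allowed range.

import numpy as np

n = 4                                  # variables the system controls
grid = np.linspace(-10.0, 10.0, 201)   # allowed values for each variable

def objective(x):
    # The designer's proxy objective depends only on x[0] and x[1]
    # (k = 2 of n = 4 variables). x[2] and x[3] stand for things we
    # care about (say, a safety margin and an energy budget) that the
    # objective never mentions.
    return -(x[0] - 3.0) ** 2 - (x[1] + 2.0) ** 2

x = np.zeros(n)
for _ in range(5):                     # coordinate-ascent sweeps
    for i in range(n):
        scores = []
        for v in grid:
            trial = x.copy()
            trial[i] = v
            scores.append(objective(trial))
        # For a coordinate the objective ignores, every grid value ties;
        # np.argmax breaks the tie by taking the first candidate, which
        # here is the extreme value -10. Nothing anchors the variable to
        # a value we would endorse.
        x[i] = grid[int(np.argmax(scores))]

print(x)   # roughly [  3.  -2. -10. -10.]

The point is not that every optimizer picks extremes in exactly this way; it is that once a variable is absent from the objective, the optimizer has no reason to keep it anywhere sensible, which is the k<n argument in miniature.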

This mistake is concerning because this book is supposed to be the reference book for deep learning. I want to send an email about this to the authors (although I would appreciate it if an expert could do so instead). However, while writing the email, I realized that I do not have enough evidence that AI safety researchers agree that consciousness is not the primary concern of AI safety.

I have, of course, read many AI safety researchers express this opinion, including Stuart Russell, but I never wrote down the articles where I read these opinions. Does anyone have a list of articles/quotes/surveys which could convince any researcher that this is indeed the opinion of the field?

(It is pretty hard to convince someone that X is the opinion of the field if they believe the opposite)