[crossposted at EA Forum and Experience Machines; twitter thread summary]

What is it like to be DALL-E 2? Are today’s AI systems consciously experiencing anything as they generate pictures of teddy bears on the moon, explain jokes, and suggest terrifying new nerve agents?

This post gives a list of open scientific and philosophical questions about AI sentience. First, I frame the issue of AI sentience, proposing what I think is the Big Question we should be trying to answer: what is the precise computational theory of sentience that applies to both biological organisms and artificial systems? Then, I discuss the research questions that are relevant to making progress on this question. Even if the ultimate question cannot be answered to our satisfaction, trying to answer it will yield valuable insights that can help us navigate possible AI sentience.

This post represents my current best guess framework for thinking about these issues. I'd love to hear from commenters: suggested alternative frameworks for the Big Question, as well as your thoughts on the sub-questions.

Introduction

“Maybe if a reinforcement learning agent is getting negative rewards, it’s feeling pain to some very limited degree. And if you’re running millions or billions of copies of that, creating quite a lot, that’s a real moral hazard.” -Sam Altman (OpenAI), interviewed by Ezra Klein (2021)

Are today's ML systems already sentient? Most experts seem to think “probably not”, and it doesn’t seem like there’s currently a strong argument that today’s large ML systems are conscious.[1]

But AI systems are getting more complex and more capable with every passing week. And we understand sufficiently little about consciousness that we face huge uncertainty about whether, when, and why AI systems will have the capacity to have conscious experiences, including especially significant experiences like suffering or pleasure. We have a poor understanding of what possible AI experiences could be like, and how they would compare to human experiences.

One potential catastrophe we want to avoid is unleashing powerful AI systems that are misaligned with human values: that's why the AI alignment community is hard at work trying to ensure we don't build power-seeking optimizers that take over the world in order to pursue some goal that we regard as alien and worthless.

It’s encouraging that more work is going into minimizing risks from misaligned AI systems. At the same time, we should also take care to avoid engineering a catastrophe for AI systems themselves: a world in which we have created AIs that are capable of intense suffering, suffering which we do not mitigate, whether through ignorance, malice, or indifference.

There could be very, very many sentient artificial beings. Jamie Harris (2021) argues that “the number of [artificially sentient] beings could be vast, perhaps many trillions of human-equivalent lives on Earth and presumably even more lives if we colonize space or less complex and energy-intensive artificial minds are created.” There’s lots of uncertainty here: but given large numbers of future beings, and the possibility for intense suffering, the scale of AI suffering could dwarf the already mind-bogglingly large scale of animal suffering from factory farming.[2]

The San Junipero servers from season 3, episode 4 of Black Mirror

It would be nice if we had a clear outline for how to avoid catastrophic scenarios from AI suffering, something like: here are our best computational theories of what it takes for a system, whether biological or artificial, to experience conscious pleasure or suffering, and here are the steps we can take to avoid engineering large-scale artificial suffering. Such a roadmap would help us prepare to wisely share the world with digital minds.

For example, you could imagine a consciousness researcher, standing up in front of a group of engineers at DeepMind or some other top AI lab, and giving a talk that aims to prevent them creating suffering AI systems. This talk might give the following recommendations:

  1. Do not build an AI system that (a) is sufficiently agent-like and (b) has a global workspace and reinforcement learning signals that (c) are broadcast to that workspace and (d) play a certain computational role in shaping learning and goals and (e) are associated with avoidant and self-protective behaviors.

  2. And here is, precisely, in architectural and computational terms, what it means for a system to satisfy conditions a-e—not just these vague English terms.

  3. Here are the kinds of architectures, training environments, and learning processes that might give rise to such components.

  4. Here are the behavioral 'red flags' of such components, and here are the interpretability methods that would help identify such components—all of which take into account the fact that AIs might have incentives to deceive us about such matters.

So, why can't I go give that talk to DeepMind right now?

First, I’m not sure that components a-e are the right sufficient conditions for artificial suffering. I’m not sure if they fit with our best scientific understanding of suffering as it occurs in humans and animals. Moreover, even if I were sure that components a-e are on the right track, I don’t know how to specify them in a precise enough way that they could guide actual engineering, interpretability, or auditing efforts.

Furthermore, I would argue that no one, including AI and consciousness experts who are far smarter and more knowledgeable than I am, is currently in a position to give this talk—or something equivalently useful—at DeepMind.

What would we need to know in order for such a talk to be possible?

The Big Question

In an ideal world, I think the question that we would want an answer to is:

What is the precise computational theory that specifies what it takes for a biological or artificial system to have various kinds of conscious, valenced experiences—that is, conscious experiences that are pleasant or unpleasant, such as pain, fear, and anguish or pleasure, satisfaction, and bliss?

Why not answer a different question?

The importance and coherence of framing this question in this way depends on five assumptions.

  1. Sentientism about moral patienthood: if a system (human, non-human animal, AI) has the capacity to have conscious valenced experiences—if it is sentient[3]—then it is a moral patient. That is, it deserves moral concern for its own sake, and its pain/suffering and pleasure matter. This assumption is why the Big Question is morally important.[4]

  2. Computational functionalism about sentience: for a system to have a given conscious valenced experience is for that system to be in a (possibly very complex) computational state. This assumption is why the Big Question is asked in computational (as opposed to neural or biological) terms.[5]

  3. Realism about phenomenal consciousness: phenomenal consciousness exists. It may be identical to, or grounded in, physical processes, and as we learn more about it, it may not have all of the features that it intuitively seems to have. But phenomenal consciousness is not entirely illusory, and we can define it “innocently” enough that it points to a real phenomenon without baking in any dubious metaphysical assumptions. In philosophers’ terms, we are rejecting strong illusionism. This assumption is why the Big Question is asked in terms of conscious valenced experiences.

  4. Plausibility: it’s not merely logically possible, but non-negligibly likely, that some future (or existing) AI systems will be (or are) in these computational states, and thereby (per assumption 2) sentient. This assumption is why the Big Question is action-relevant.

  5. Tractability: we can make scientific progress in understanding what these computational states are. This assumption is why the Big Question is worth working on.[6]

All of these assumptions are up for debate. But I actually won't be defending them in this post. I've listed them in order to make clear one particular way of orienting to these topics.

And in order to elicit disagreement. If you do reject one or more of these assumptions, I would be curious to hear which ones, and why—and, in light of your different assumptions, how you think we should formulate the major question(s) about AI sentience, and about the relationship between sentience and moral patienthood.

(I'll note that the problem of how to re-formulate these questions in a coherent way is especially salient, and non-trivial, for strong illusionists about consciousness who hold that phenomenal consciousness does not exist at all. See this paper by Kammerer for an attempt to think about welfare and sentience from a strong illusionist framework.)

Why not answer a smaller question?

In an ideal world, we could answer the Big Question soon, before we do much more work building ever-more complex AI systems that are more and more likely to be conscious. In the actual world, I do not think that we will answer the Big Question any time in the next decade. Instead, we will need to act cautiously, taking into consideration what we know, short of a full answer.

That said, I think it is useful to have the Big Question in mind as an orienting question, and indeed to try to just take a swing at the full problem. As Holden Karnofsky writes, “there is…something to be said for directly tackling the question you most want the all-things-considered answer to (or at least a significant update on).” Taking an ambitious approach can yield a lot of progress, even while the approach is unlikely to yield a complete answer.

Subquestions for the Big Question

In the rest of this post, I’ll list what I think are the important questions about consciousness in general, and about valenced states in particular, that bear on the question of AI sentience.

A note on terminology

First, a note on terminology. By “consciousness” I mean “phenomenal consciousness”, which philosophers use to pick out subjective experience, or there being something that it is like to be a given system. In ordinary language, “consciousness” is used to refer to intelligence, higher cognition, having a self-concept, and many other traits. These traits may end up being related to phenomenal consciousness, but are conceptually distinct from it. We can refer to certain states as conscious (e.g., feeling back pain, seeing a bright red square on a monitor) or not conscious (e.g., perceptual processing of a subliminal stimulus, hormone regulation by the hypothalamus). We can also refer to a creature or system as conscious (e.g. you right now, an octopus) or not conscious (e.g., a brick, a human in a coma).

By “sentient”, I mean capable of having a certain subset of phenomenally conscious experiences—valenced ones. Experiences that are phenomenally conscious but non-valenced would include visual experiences, like seeing a blue square. (If you enjoy or appreciate looking at a blue square, this might be associated with a valenced experience, but visual perception itself is typically taken to be a non-valenced experience). At times, I use “suffering” and “pleasure” as shorthands for the variety of negatively and positively valenced experiences.

'International Klein Blue (IKB Godet) 1959' by Yves Klein (1928-62). Which people claim to appreciate looking at.

Questions about scientific theories of consciousness

The scientific study of consciousness, as undertaken by neuroscientists and other cognitive scientists, tries to answer what Scott Aaronson calls the “pretty hard problem” of consciousness: which physical states are associated with which conscious experiences? This is a meaningful open question regardless of your views on the metaphysical relationship between physical states and conscious experiences (i.e., your views on the “hard problem” of consciousness).[7]

Scientific theories of consciousness necessarily start with the human case, since it is the case which we are most familiar with and have the most data about. The purpose of this section is to give a brief overview of the methods and theories in the scientific study of consciousness before raising the main open questions and limitations.

A key explanandum of a scientific theory of consciousness is why some, but not all, information processing done by the human brain seems to give rise to conscious experience. As Graziano (2017) puts it:

A great deal of visual information enters the eyes, is processed by the brain and even influences our behavior through priming effects, without ever arriving in awareness. Flash something green in the corner of vision and ask people to name the first color that comes to mind, and they may be more likely to say “green” without even knowing why. But some proportion of the time we also claim, “I have a subjective visual experience. I see that thing with my conscious mind. Seeing feels like something.”

Neuroscientific theories of human consciousness seek to identify the brain regions and processes that explain the presence or absence of consciousness. They seek to capture a range of phenomena:

  • the patterns of verbal report and behavior present in ordinary attentive consciousness
  • the often surprising patterns of report and behavior that we see when we manipulate conscious perception in various ways: phenomena like change blindness, backwards masking, various patterns of perceptual confidence and decision making
  • various pathologies caused by brain lesions, surgeries, and injuries, such as amnesia, blindsight, and split-brain phenomena[8]
  • loss of consciousness in dreamless sleep, anesthesia, coma, and vegetative states

Computational glosses on neuroscientific theories of consciousness seek to explain these patterns in terms of the computations that are being performed by various regions of the brain.

Theories of consciousness differ in how they interpret this evidence and what brain processes and/or regions they take to explain it.

-The most popular scientific theory of consciousness is probably the global workspace theory of consciousness, which holds that conscious states are those that are ‘broadcast’ to a ‘global workspace’, a network of neurons that makes information available to a variety of subsystems.

Illustration from the canonical paper on global workspace theory

-Higher-order theories of consciousness hold that what it is to be consciously seeing a red apple is for you to a) perceive the red apple and b) have a higher-order mental state (introspection, metacognition) that represents that first-order perception.

-First-order theories of consciousness hold that neither a global workspace nor higher-order representations is necessary for consciousness - some kind of perceptual representation is, by itself, sufficient for consciousness (e.g. Tye's PANIC theory, discussed by Muehlhauser here).

-The attention schema theory of consciousness holds that conscious states are a mid-level, lossy ‘sketch’ of our attention, analogously to how the body schema is a ‘lossy’ sketch of the state of our body.

I like how content this fellow from Graziano’s paper is

The big open question in the science of consciousness is which, if any, of these (and other[9]) theories are correct. But as Luke Muehlhauser has noted, even the leading theories of consciousness are woefully underspecified. What exactly does it mean for a system to have a ‘global workspace’? What exactly does it take for a representation to be ‘broadcast’ to it? What processes, exactly, count as higher-order representation? How are attention schemas realized? To what extent are these theories even inconsistent with each other - what different predictions do they make, and how can we experimentally test these predictions?[10]
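
To make the underspecification worry concrete, here is a deliberately toy sketch, entirely my own construction and not something proposed by global workspace theorists, of one way that "a workspace which broadcasts the winning representation to consuming subsystems" could be cashed out in code. The point is not that this is right; it is that until a theory says which properties of the workspace, the competition for access, and the consumers actually matter, we cannot say whether a given AI system has "a global workspace" in the relevant sense.

```python
# A deliberately toy, hypothetical sketch of "broadcast to a global workspace".
# Nothing here is endorsed by global workspace theorists; it only illustrates
# how much is left open by phrases like "workspace" and "broadcast" until we
# specify the competition mechanism, the consumers, and what they do with it.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Representation:
    source: str      # which input module produced this content
    content: str     # stand-in for the represented content (e.g., a feature vector)
    salience: float  # strength with which it competes for workspace access


class ToyGlobalWorkspace:
    def __init__(self, consumers: List[Callable[[Representation], None]]):
        self.consumers = consumers  # subsystems that receive broadcasts

    def step(self, candidates: List[Representation]) -> Representation:
        # Winner-take-all competition for access to the workspace.
        winner = max(candidates, key=lambda r: r.salience)
        # "Broadcast": the winning content is made available to every consumer.
        for consume in self.consumers:
            consume(winner)
        return winner


# Usage: two candidate percepts compete; only the winner becomes "globally available".
received = []
workspace = ToyGlobalWorkspace(consumers=[received.append])
workspace.step([
    Representation("vision", "red square", salience=0.9),
    Representation("audition", "faint hum", salience=0.2),
])
print(received[0].content)  # -> "red square"
```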

Fortunately, consciousness scientists are making efforts to identify testable predictions of rival theories, e.g. Melloni et al. (2021). My impression, from talking to Matthias Michel about the methodology of consciousness science, is that we have learned quite a lot about consciousness in the past few decades. It’s not the case that we are completely in the dark: as noted above, we’ve uncovered many surprising and non-obvious phenomena, which serve as data that can constrain our theory-building. Relatedly, methodology in consciousness science has gotten more sophisticated: we are able to think in much more detailed ways about metacognition, perceptual decision-making, introspection, and other cognitive processes that are closely related to consciousness. Moreover, we’ve learned to take seriously the need to explain our intuitions and judgments about consciousness: the so-called meta-problem of consciousness.

Actually trying to solve the problem by constructing computational theories which try to explain the full range of phenomena could pay significant dividends for thinking about AI consciousness. We can also make progress on questions about valence—as I discuss in the next section.

Further reading

Appendix B of Muehlhauser's animal sentience report, on making theories of consciousness more precise; Doerig et al. (2020) outline “stringent criteria specifying how empirical data constrains theories of consciousness”.

Questions about valence

Is DALL-E 2 having conscious visual experiences? It would be extraordinarily interesting if it is. But I would be alarmed to learn that DALL-E 2 has conscious visual experiences only inasmuch as these experiences would be a warning sign that DALL-E 2 might also be capable of conscious suffering; I wouldn’t be concerned about the visual experiences per se.[11] We assign special ethical significance to a certain subset of conscious experiences, namely the valenced ones: a range of conscious states picked out by concepts like pain, suffering, nausea, contentment, bliss, et alia.

In addition to wanting a theory of consciousness in general, we want a theory of (conscious) valenced experiences: when and why is a system capable of experiencing conscious pain or pleasure? Even if we remain uncertain about phenomenal consciousness in general, being able to pick out systems that are especially likely to have valenced experiences could be very important, given the close relationship between valence and welfare and value. For example, it would be useful to be able to say confidently that, even if it consciously experiences something, DALL-E 2 is unlikely to be suffering.

Advertisement for Wolcott’s Instant Pain Annihilator (c. 1860)

How do valenced states relate to each other?

Pain, nausea, and regretting a decision all seem negatively valenced. Orgasms, massages, and enjoying a movie all seem positively valenced.

Does valence mark a unified category - is there a natural underlying connection between these different states? How do unpleasant bodily sensations like pain and nausea relate to negative emotions like fear and anguish, and to more ‘intellectual’ displeasures like finding a shoddy argument frustrating? How do pleasant bodily sensations like orgasm and satiety relate to positive emotions like contentment and amusement, and to more ‘intellectual’ pleasures like appreciating an elegant math proof?

To develop a computational theory of valence, we need clarity on exactly what it is that we are building a theory of. This is not to say that we need to chart the complicated ways in which English and common sense individuate and relate these disparate notions. Nor do we need to argue about the many different ways scientists and philosophers might choose to use the words “pain” vs “suffering”, or “desire” vs “wanting”. But there are substantive questions about whether the natural grouping of experiences into ‘positive’ and ‘negative’ points at a phenomenon that has a unified functional or computational explanation, and about how valenced experiences relate to motivation, desire, goals, and agency. For my part, I suspect that there is in fact a deeper unity among valenced states, one that will have a common computational or functional signature.

Further reading

Timothy Schroeder’s Three Faces of Desire (2004) has a chapter on pleasure and displeasure that is a great introduction to these issues; the Stanford Encyclopedia of Philosophy articles on pain and pleasure; Carruthers (2018), "Valence and Value"; Henry Shevlin (forthcoming) and my colleague Patrick Butlin (2020) on valence and animal welfare.

What's the connection between reward and valence?

There are striking similarities between reinforcement learning in AI and reinforcement learning in the brain. According to the “reward prediction error hypothesis” of dopamine neuron activity, dopaminergic neurons in the VTA/SNpc[12] compute reward prediction errors and broadcast them to other areas of the brain for learning. These computations closely resemble temporal difference learning in AI.
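
To make the parallel concrete, here is a minimal sketch of tabular TD(0) value learning on a toy chain of states (a textbook-style example of my own, in the spirit of Sutton and Barto, not a model of dopamine neurons). The quantity `delta` below is the reward prediction error; the hypothesis mentioned above is that phasic dopamine activity plays an analogous computational role.

```python
# Minimal TD(0) value learning on a toy chain of states (illustrative only).
# `delta` is the reward prediction error: the gap between what was just
# experienced (r + gamma * V[s_next]) and what was predicted (V[s]).

states = [0, 1, 2, 3]               # state 3 is terminal
rewards = {0: 0.0, 1: 0.0, 2: 1.0}  # reward received on leaving each state
V = {s: 0.0 for s in states}        # value estimates (terminal state stays at 0)
alpha, gamma = 0.1, 0.9

for episode in range(1000):
    s = 0
    while s != 3:
        s_next = s + 1
        r = rewards[s]
        delta = r + gamma * V[s_next] - V[s]  # reward prediction error
        V[s] += alpha * delta                 # learn from the error
        s = s_next

print({s: round(v, 2) for s, v in V.items()})
# Values propagate backwards from the reward: roughly V[2]=1.0, V[1]=0.9, V[0]=0.81
```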

That said, the broadcast of the reward prediction error seems to be distinct from the experience of conscious pleasure and pain in various ways (cf. Schroeder (2004), Berridge and Kringelbach on liking versus wanting). How exactly does reward relate to valenced states in humans? In general, what gives rise to pleasure and pain, in addition to (or instead of) the processing of reward signals? A worked-out computational theory of valence would shed light on the relationship between reinforcement learning and valenced experiences.

Further reading

Schroeder (2004) on reinforcement versus pleasure; Tomasik (2014) pp 8-11 discusses the complex relationship between reward, pleasure, motivation, and learning; Sutton and Barto's RL textbook has a chapter on neuroscience and RL.

The scale and structure of valence

It matters hugely not just whether AI systems have valenced states, but also a) whether those states are positively or negatively valenced and b) how intense the valence is. What explains the varying intensity of positively and negatively valenced states? And what explains the fact that positive and negative valence seem to trade off against each other and have a natural ‘zero’ point?

Here’s a puzzle about these questions that arises in the reinforcement learning setting: it’s possible to shift the training signal of an RL agent from negative to positive while leaving all of its learning and behavior intact. For example, in order to train an agent to balance a pole (the classic CartPole task), you could either a) give it 0 reward for balancing the pole and a negative reward for failing, or b) give it a positive reward for balancing the pole and 0 reward for failing.

The training and behavior of these two systems would be identical, in spite of the shift in the value of the rewards. Does simply shifting the numerical value of the reward to “positive” correspond to a deeper shift towards positive valence? It seems strange that simply switching the sign of a scalar value could be affecting valence in this way. Imagine shifting the reward signal for agents with more complex avoidance behavior and verbal reports. Lenhart Schubert (quoted in Tomasik (2014), from whom I take this point) remarks: “If the shift…causes no behavioural change, then the robot (analogously, a person) would still behave as if suffering, yelling for help, etc., when injured or otherwise in trouble, so it seems that the pain would not have been banished after all!”
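
The shift-invariance is easy to verify in a toy setting. The sketch below uses a toy continuing MDP of my own construction, solved by Q-value iteration (so not CartPole itself, and assuming a discounted, non-terminating task): adding a constant c to every reward shifts every Q-value by c/(1-gamma) and leaves the greedy policy, and hence the agent's behavior, exactly the same.

```python
import numpy as np

# Toy verification of the shift-invariance point (my own toy MDP, not CartPole,
# and assuming a continuing, discounted task): adding a constant c to every
# reward shifts all Q-values by c / (1 - gamma) and leaves the greedy policy,
# and hence behavior, unchanged.

gamma = 0.9
P = np.array([[0, 1],       # P[s, a] = deterministic next state
              [1, 0]])
R = np.array([[0.0, -1.0],  # "punishment" framing: rewards are 0 or -1
              [-1.0, 0.0]])

def q_values(R):
    Q = np.zeros_like(R)
    for _ in range(2000):       # Q-value iteration to convergence
        V = Q.max(axis=1)
        Q = R + gamma * V[P]
    return Q

c = 1.0
Q_neg = q_values(R)       # original rewards: 0 for the "good" action, -1 for the "bad" one
Q_pos = q_values(R + c)   # shifted rewards: +1 for the "good" action, 0 for the "bad" one

print(np.array_equal(Q_neg.argmax(axis=1), Q_pos.argmax(axis=1)))  # True: same policy
print(np.allclose(Q_pos - Q_neg, c / (1 - gamma)))                 # True: uniform shift
```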

So valence seems to depend on something more complex than the mere numerical value of the reward signal. For example, perhaps it depends on prediction error in certain ways. Or perhaps the balance of pain and pleasure depends on efficient coding schemes which minimize the cost of reward signals / pain and pleasure themselves: this is the thought behind Yew‑Kwang Ng’s work on wild animal welfare, and Shlegeris's brief remarks inspired by this work.

More generally, in order to build a satisfactory theory of valence and RL, I think we will need to:

  1. Clarify what parts of a system correspond to the basic RL ontology of reward signal, agent, and environment (see the sketch after this list)
  2. Take into account the complicated motivational and functional role of pain and pleasure, including:
  • dissociations between ‘liking’ and ‘wanting’
  • ways in which pain and unpleasantness can come apart (e.g., pain asymbolia)
  • the role of emotion and expectations
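
On point 1, even writing out the textbook ontology shows how much interpretive work the mapping requires. Here is the generic agent-environment-reward loop as a schematic sketch (purely illustrative, not any particular system); the hard question is which parts of a brain, or of a large ML training pipeline, play each of these roles, if any.

```python
import random

# The textbook RL ontology, as a schematic loop: an "environment" emits
# observations and a scalar "reward signal", and an "agent" maps observations
# to actions. The loop itself is trivial; the open question is which parts of
# a real organism or a real ML system (if any) correspond to each of these
# boxes and to the scalar passed between them.

class ToyEnvironment:
    """A trivial guessing environment (purely illustrative)."""
    def __init__(self):
        self.state = 0
    def step(self, action: int):
        reward = 1.0 if action == self.state else 0.0  # the "reward signal"
        self.state = random.randint(0, 1)              # next observation
        return self.state, reward

class ToyAgent:
    """A trivial agent that just repeats its last observation."""
    def act(self, observation: int) -> int:
        return observation
    def learn(self, reward: float) -> None:
        pass  # where reward would shape future behavior

env, agent = ToyEnvironment(), ToyAgent()
obs = 0
for _ in range(10):
    action = agent.act(obs)
    obs, reward = env.step(action)
    agent.learn(reward)
```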

In my opinion[13], progress on a theory of valence might be somewhat more tractable than progress on a theory of consciousness, given that ‘pain’ and ‘pleasure’ have clearer functional roles than phenomenal consciousness does. But I think we are still far from a satisfying theory of valence.

Further reading

Dickinson and Balleine (2010) argue that valenced states are how information about value is passed between two different RL systems in the brain--one unconscious system that does model-free reinforcement learning about homeostasis, and a conscious cognitive system that does model-based reinforcement learning; literature in the predictive processing framework (e.g. Van De Cruys (2017)); the Qualia Research Institute has a theory of valence, but I have not yet been able to understand what this theory claims and predicts.

Applying our theories to specific AI systems

The quality of discourse about AI sentience is very low—low enough that this tongue-in-cheek tweet was discussed by mainstream news outlets:

conscious compute

As I see it, the dialectic in discussions about AI sentience is usually not much more advanced than:

Position A: “AI systems are very complex. Maybe they are a little bit sentient.”[14]

Position B: “that is stupid”

I think that position A is not unreasonable. Given the complexity of today’s ML systems, and our uncertainty about what computations give rise to consciousness, higher levels of complexity should increase our credence somewhat that consciousness-related computations are being performed. But we can do better. Each side of this debate can give more detailed arguments about the presence or absence of sentience.

People in position A can go beyond mere appeals to complexity, and say which theories of consciousness and valence predict that current AI systems are sentient—in virtue of what architectural or computational properties AI systems might be conscious: for example, reinforcement learning, higher-order representations, global workspaces.

People in position B can say what pre-conditions for sentience they think are lacking in current systems—for example, a certain kind of embodiment, or a certain kind of agency—and why they think these components are necessary for consciousness. Then, they can specify more precisely what exactly they would need to see in AI systems that would increase their credence in AI sentience.

One complication is that our theories of human (and animal) consciousness usually don’t make reference to “background conditions” that we might think are important. They compare different human brain states, and seek to find neural structures or computations that might be the difference-makers between conscious and unconscious--for example, broadcast to a global workspace. But these neural structures or computations are embedded in a background context that is usually not formulated explicitly: for example, in the biological world, creatures with global workspaces are also embodied agents with goals. How important are these background conditions? Are they necessary pre-conditions for consciousness? If so, how do we formulate these pre-conditions more precisely, so that we can say what it takes for an AI system to satisfy them?[15]

Detailed thinking about AI sentience usually falls between the cracks of different fields. Neuroscientists will say their favored theory applies to AI without making detailed reference to actual AI systems. AI researchers will refer to criteria for sentience without much reference to the scientific study of sentience.

In my opinion, most existing work on AI sentience simply does not go far enough to make concrete predictions about possible AI sentience. Simply attempting to apply scientific theories of consciousness and valence to existing AI systems, in a more precise and thoughtful way, could advance our understanding. Here’s a recipe for progress:

  • Gather leading experts on scientific theories of consciousness and leading AI researchers
  • Make the consciousness scientists say what precisely they think their theories imply about AI systems
  • Ask the AI researchers which existing, or likely-to-be-created, AI systems might be conscious according to these theories

Indeed, the digital minds research group at FHI is putting together a workshop to do precisely this. We hope to create a space for more detailed and rigorous cross-talk between these disciplines, focusing these discussions on actual or likely AI systems and architectures.

Further reading

Schwitzgebel and Garza (2020) on "Designing AI with rights, consciousness, self-respect, and freedom"; Lau, Dehaene, and Kouider (2017) apply their global workspace and higher-order theories to possible AI systems; Graziano (2017) claims his attention schema theory is “a foundation for engineering artificial consciousness”; Ladak (2021) proposes a list of features indicative of sentience in artificial entities; Shevlin (2021) on moral patienthood; Amanda Askell's reflections.

Conclusion

Taking a swing at the Big Question does not mean we can’t, or shouldn’t, also pursue more ‘theory neutral’ ways of updating our credences about AI sentience. For example, by finding commonalities between extant theories of consciousness and using them to make lists of potentially consciousness-indicating features. Or by devising ‘red flags’ for suffering that a variety of theories would agree on. Or by trying to find actions that are robustly good across a variety of views about the connection between sentience and value.

This topic is sufficiently complex that how to even ask or understand the relevant questions is up for grabs. I’m not certain of the framing of the questions, and will very likely change my mind about some basic conceptual questions about consciousness and valence as I continue to think about this.

Still, I think there is promise in working on the Big Question, or some related variations on it. To be sure, our neuroscience tools are way less powerful than we would like, and we know far less about the brain than we would like. To be sure, our conceptual frameworks for thinking about sentience seem shaky and open to revision. Even so, trying to actually solve the problem by constructing computational theories which try to explain the full range of phenomena could pay significant dividends. My attitude towards the science of consciousness is similar to Derek Parfit’s attitude towards ethics: since we have only just begun the attempt, we can be optimistic.[16]


  1. There’s limited info on what “expert” consensus on this issue is. The Association for the Scientific Study of Consciousness surveyed its members. For the question, "At present or in the future, could machines (e.g., robots) have consciousness?" 20.43% said 'definitely yes', 46.09% said ‘probably yes’. Of 227 philosophers of mind surveyed in the 2020 PhilPapers survey, 0.88% "accept or lean towards" some current AI systems being conscious. 50.22% "accept or lean towards" some future AI systems being conscious. ↩︎

  2. As discussed in "Questions about valence" below, the scale of suffering would depend not just on the number of systems, but also on the amount and intensity of suffering vs. pleasure in these systems. ↩︎

  3. see note on terminology below ↩︎

  4. Sometimes sentientism refers to the view that sentience is not just sufficient for moral patienthood, but necessary as well. For these purposes, we only need the sufficiency claim. ↩︎

  5. The way I've phrased this implies that a given experience just is the computational state. But this can be weakened. In fact, computational functionalism is compatible with a variety of metaphysical views about consciousness—e.g., a non-physicalist could hold that the computational state is a correlate of consciousness. For example, David Chalmers (2010) is a computational functionalist and a non-physicalist: "the question of whether the physical correlates of consciousness are biological or functional is largely orthogonal to the question of whether consciousness is identical to or distinct from its physical correlates." ↩︎

  6. At least, there’s pro tanto reason to work on it. It could be that other problems like AI alignment are more pressing or more tractable, and/or that work on the Big Question is best left for later. This question has been discussed elsewhere. ↩︎

  7. Unless your view is that phenomenal consciousness does not exist. If that’s your view, then the pretty hard problem, as phrased, is answered with “none of them”. See assumption #3, above. See Chalmers (2018) pp 8-9, and footnote 3, for a list of illusionist theories. ↩︎

  8. LeDoux, Michel, and Lau (2020) reviews how puzzles about amnesia, split brain, and blindsight were crucial in launching consciousness science as we know it today. ↩︎

  9. What about predictive processing? Predictive processing is (in my opinion) not a theory of consciousness per se. Rather, it’s a general framework for explaining prediction and cognition whose adherents often claim that it will shed light on the problem of consciousness. But such a solution is still forthcoming. ↩︎

  10. See Appendix B of Muehlhauser’s report on consciousness and moral patienthood, where he argues that our theories are woefully imprecise. ↩︎

  11. Some people think that conscious experiences in general, not just valenced states of consciousness or sentience, are valuable. I disagree. See Lee (2018) for an argument against the intrinsic value of consciousness in general. ↩︎

  12. the ventral tegmental area and the pars compacta of the substantia nigra ↩︎

  13. Paraphrased from discussion with colleague Patrick Butlin, some other possible connections between consciousness and valence: (a) Valence just is consciousness plus evaluative content. On this view, figuring out the evaluative content component will be easier than the consciousness component, but won’t get us very far towards the Big Question. (b) Compatibly, perhaps the functional role of some specific type of characteristically valenced state, e.g. conscious sensory pleasure, is easier to discern than the role of consciousness itself, and can be worked out first. (c) Against this kind of view, some people will object that you can't know that you're getting at conscious pleasure (or whatever) until you understand consciousness. (d) If valence isn't just consciousness plus evaluative content, then I think we can make quite substantive progress by working out what it is instead. But presumably consciousness would still be a component, so a full theory couldn't be more tractable than a theory of consciousness. ↩︎

  14. A question which I have left for another day: does it make sense to claim that a system is "a little bit" sentient or conscious? Can there be borderline cases of consciousness? Does consciousness come in degrees? See Lee (forthcoming) for a nice disambiguation of these questions. ↩︎

  15. Peter Godfrey-Smith is a good example of someone who has been explicit about background conditions (in his biological theory of consciousness, metabolism is a background condition for consciousness). DeepMind's Murray Shanahan talks about embodiment and agency but, in my opinion, not precisely enough. ↩︎

  16. For discussion and feedback, thanks to Fin Moorhouse, Patrick Butlin, Arden Koehler, Luisa Rodriguez, Bridget Williams, Adam Bales, Justis Mills, and the LW feedback team. ↩︎

Comments

 So, why can't I go give that talk to DeepMind right now?

A third (disconcerting) possibility is that the list of demands amounts to saying “don’t ever build AGIs”, because the global workspace / self-awareness / whatever is really the only practical way to build AGI. (I happen to put a lot of weight on that possibility, but it’s controversial and non-obvious.) If that possibility is true, then, well, I guess in principle DeepMind could still follow that list of demands, but it amounts to them giving up on their corporate mission, and even if they did, it would be very difficult to get every other actor to do the same thing forever.

If you do reject one or more of these assumptions, I would be curious to hear which ones, and why—and, in light of your different assumptions, how you think we should formulate the major question(s) about AI sentience, and about the relationship between sentience and moral patienthood.

(Warning: haven’t read or thought very much about this.) I guess I’m currently (weakly) leaning towards strong illusionism. But I think I can still care about things-computationally-similar-to-humans. I don’t know, at the end of the day, I care about what I care about. See last section here, and more in this comment.

More precisely, I’m hopeful (and hoping!) that one can soften the “we are rejecting strong illusionism” claim in #3 without everything else falling apart.

A third (disconcerting) possibility is that the list of demands amounts to saying “don’t ever build AGIs”

That would indeed be disconcerting. I would hope that, in this world, it's possible and profitable to have AGIs that are sentient, but which don't suffer in quite the same way / as badly as humans and animals do. It would be nice - but is by no means guaranteed - if the really bad mental states we can get are in a kinda arbitrary and non-natural point in mind-space. This is all very hard to think about though, and I'm not sure what I think.

I’m hopeful (and hoping!) that one can soften the “we are rejecting strong illusionism” claim in #3 without everything else falling apart.

I hope so too. I was more optimistic about that until I read Kammerer's paper, then I found myself getting worried. I need to understand that paper more deeply and figure out what I think. Fortunately, I think one thing that Kammerer worries about is that, on illusionism (or even just good old fashioned materialism), "moral patienthood" will have vague boundaries. I'm not as worried about that, and I'm guessing you aren't either. So maybe if we're fine with fuzzy boundaries around moral patienthood, things aren't so bad.

But I think there's other more worrying stuff in that paper - I should write up a summary some time soon!

The training and behavior of these two systems would be identical, in spite of the shift in the value of the rewards. Does simply shifting the numerical value of the reward to “positive” correspond to a deeper shift towards positive valence? It seems strange that simply switching the sign of a scalar value could be affecting valence in this way. Imagine shifting the reward signal for agents with more complex avoidance behavior and verbal reports. Lenhart Schubert (quoted in Tomasik (2014), from whom I take this point) remarks: “If the shift…causes no behavioural change, then the robot (analogously, a person) would still behave as if suffering, yelling for help, etc., when injured or otherwise in trouble, so it seems that the pain would not have been banished after all!”

So valence seems to depend on something more complex than the mere numerical value of the reward signal. For example, perhaps it depends on prediction error in certain ways. Or perhaps the balance of pain and pleasure depends on efficient coding schemes which minimize the cost of reward signals / pain and pleasure themselves: this is the thought behind Yew‑Kwang Ng’s work on wild animal welfare, and Shlegeris's brief remarks inspired by this work.

I think attention probably plays an important role in valence. States with high intensity valence (of either sign, or at least negative; I'm less sure how pleasure works) tend to take immediate priority, in terms of attention and, (at least partly) consequently, behaviour. The Welfare Footprint Project (funded by Open Phil for animal welfare) defines pain intensity based on attention and priority, with more intense pains harder to ignore and pain intensity clusters for annoying, hurtful, disabling and excruciating. If you were to shift rewards and change nothing else, how much attention and priority is given to various things would have to change, and so the behaviour would, too. One example I like to give is that if we shifted valence into the positive range without adjusting attention or anything else, animals would continue to eat while being attacked, because the high positive valence from eating would get greater priority than the easily ignorable low positive or neutral valence from being attacked.

The costs of reward signals or pain and pleasure themselves help explain why it's evolutionarily adaptive to have positive and negative values with valence roughly balanced around low neural activity neutral states, but I don't see why it would follow that it's a necessary feature of valence that it should be balanced around neutral (which would surely be context-specific).

I'm less sure either way about prediction error. I guess it would have to be somewhat low-level or non-reflective (my understanding is that it usually is, although I'm barely familiar with these approaches), since surely we can accurately predict something and its valence in our reportable awareness, and still find it unpleasant or pleasant.

Also, there's a natural neutral/zero point for rewards separating pleasure and suffering in animals: subnetwork inactivity. Not just unconsciousness, but some experiences feel like they have no or close to no valence, and this is probably reflected in lower activity in valence-generating subnetworks. If this doesn't hold for some entity, we should be skeptical that they experience pleasure or suffering at all. (They could experience only one, with the neutral point strictly to one side, the low valence intensity side.)

Plus, my guess is that pleasure and suffering are generated in not totally overlapping structures of animal brains, so shifting rewards more negative, say, would mean more activity in suffering structures and less in the pleasure structures.

Still, one thing I wonder is whether preferences without natural neutral points can still matter. Someone can have preferences between two precise states of affairs (or features of them), but not believe either is good or bad in absolute terms. They could even have something like moods that are ranked, but no neutral mood. Such values could still potentially be aggregated for comparisons, but you'd probably need to use some kind of preference-affecting view if you don't want to make arbitrary assumptions about where the neutral points should be.

Thanks for this great comment! Will reply to the substantive stuff later, but first - I hadn't heard of the The Welfare Footprint Project! Super interesting and relevant, thanks for bringing to my attention

As usual with intersection of consciousness and science, I think this needs more clarifications about assumptions. In particular, does "Realism about phenomenal consciousness" imply that consciousness is somehow fundamentally different from other forms of organization of matter? If not, I would prefer for it to be explicitly said that we are talking about merely persuasive arguments about reasons to value computational processes interesting in some way. And for every "theory" to be replaced with "ideology", and every "is" question with "do we want to define consciousness in such a way, that". And without justification for intermediate steps assumptions 1-3 can be simplified into "if a system satisfies my arbitrary criteria then it is a moral patient".

does “Realism about phenomenal consciousness” imply that consciousness is somehow fundamentally different from other forms of organization of matter?

I don't see why it would, since realism about shoes, and ships, and sealing wax doesn't.

I'm trying to get a better idea of your position. Suppose that, as TAG also replied, "realism about phenomenal consciousness" does not imply that consciousness is somehow fundamentally different from other forms of organization of matter. Suppose I'm a physicalist and a functionalist, so I think phenomenal consciousness just is a certain organization of matter. Do we still then need to replace "theory" with "ideology" etc?

It's basically what is in that paper by Kammerer: a "theory of the difference between reportable and unreportable perceptions" is ok, but calling it "consciousness" and then concluding, from the reasonable-sounding assumption "conscious agents are moral patients", that generalizing a theory about the presence of some computational process in humans to universal ethics is an arbitrariness-free inference - that I don't like. Because the reasonableness of "conscious agents are moral patients" decreases when you substitute the theory's content into it. It's like a theory of beauty, except that "precise computational theory that specifies what it takes for a biological or artificial system to have various kinds of conscious, valenced experiences" feels like it has more implied objectivity.

Great, thanks for the explanation. Just curious to hear your framework, no need to reply:

-If you do have some notion of moral patienthood, what properties do you think are important for moral patienthood? Do you think we face uncertainty about whether animals or AIs have these properties?

-If you don't, are there questions in the vicinity of "which systems are moral patients" that you do recognize as meaningful?

-If you do have some notion of moral patienthood, what properties do you think are important for moral patienthood?

I don't know. If I need to decide, I would probably use some "similarity to human mind" metrics. Maybe I would think about complexity of thoughts in the language of human concepts or something. And I probably could be persuaded in the importance of many other things. Also I can't really stop on just determining who is moral patient - I start thinking what exactly to value about them and that is complicated by me being (currently interested in counterarguments against being) indifferent to suffering and only counting good things.

Do you think we face uncertainty about whether animals or AIs have these properties?

Yes for "similarity to human mind" - we don't have precise enough knowledge about AI's or animals mind. But now it sounds like I've only chosen these properties to not be certain. In the end I think moral uncertainty plays more important role than factual uncertainty here - we already can be certain that very high-level low-resolution models of human consciousness generalize to anything from animals to couple lines of python.

I highly doubt this on an intuitive level. If I draw a picture of a man being shot, is it suffering? Naturally not, since those are just ink pigments on a sheet of cellulose. Suffering seems to need a lot of complexity and also seems deeply connected to biological systems. AI/computers are just a "picture" of these biological systems. A pocket calculator appears to do something similar to the brain but in reality it's much less complex and much different, and it's doing something completely different. In reality it's just an electric circuit. Are lightbulbs moral patients?

Now, we could someday crack consciousness in electronic systems, but I think it would be winning the lottery to get there not on purpose.

A few questions:

  1. Can you elaborate on this?

Suffering seems to need a lot of complexity

and also seems deeply connected to biological systems.

I think I agree. Of course, all of the suffering that we know about so far is instantiated in biological systems. Depends on what you mean by "deeply connected." Do you mean that you think that the biological substrate is necessary? i.e. you have a biological theory of consciousness?

AI/computers are just a "picture" of these biological systems.

What does this mean?

Now, we could someday crack consciousness in electronic systems, but I think it would be winning the lottery to get there not on purpose.

Can you elaborate? Are you saying that, unless we deliberately try to build in some complex stuff that is necessary for suffering, AI systems won't 'naturally' have the capacity for suffering? (i.e. you've ruled out the possibility that Steven Byrnes raised in his comment)

1.) Suffering seems to need a lot of complexity, because it demands consciousness, which is the most complex thing that we know of.

2.) I personally suspect that the biological substrate is necessary (of course that I can't be sure.) For reasons, like I mentioned, like sleep and death. I can't imagine a computer that doesn't sleep and can operate for trillions of years as being conscious, at least in any way that resembles an animal. It may be superintelligent but not conscious. Again, just my suspicion.

3.) I think it's obvious - it means that we are trying to recreate something that biological systems do (arithmetic, image recognition, playing games, etc.) on these electronic systems called computers or AI. Just like we try to recreate a murder scene with pencils and paper. But the murder drawing isn't remotely a murder, it's only a basic representation of a person's idea of a murder.

4.) Correct. I'm not completely excluding that possibility, but like I said, it would be a great luck to get there not on purpose. Maybe not "winning the lottery" luck as I've mentioned, but maybe 1 to 5% probability.

We must understand that suffering takes consciousness, and consciousness takes a nervous system. Animals without one aren't conscious. The nature of computers is so drastically different from that of a biological nervous system (and, at least until now, much less complex) that I think it would be quite unlikely that we eventually unwittingly generate this very complex and unique and unknown property of biological systems that we call consciousness. I think it would be a great coincidence.

Either consciousness is a mechanism that has been recruited by evolution for one of its abilities to efficiently integrate information, or consciousness is a type of epiphenomenon that serves no purpose.

Personally I think that consciousness, whatever it is, serves a purpose, and has an importance for the systems that try to sort out the anecdotal information from the information that deserves more extensive consideration. It is possible that this is the only way to effectively process information, and therefore that in trying to program an AGI, one naturally comes across it.

Consciousness definitely serves a purpose, from an evolutionary perspective. It's definitely an adaptation to the environment, by offering a great advantage, a great leap, in information processing.

But from there to say that it is the only way to process information goes a long way. I mean, once again, just think of the pocket calculator. Is it conscious? I'm quite sure that it isn't.

I think that consciousness is a very biological thing. The thing that makes me doubt the most about consciousness in non-biological systems (let alone in the current ones which are still very simple) is that they don't need to sleep and they can function indefinitely. Consciousness seems to have these limits. Can you imagine not ever sleeping? Not ever dying? I don't think such would be possible for any conscious being, at least one remotely similar to us.

to say that [consciousness] is the only way to process information

I don't think anyone was claiming that. My post certainly doesn't. If one thought consciousness were the only way to process information, wouldn't there not even be an open question about which (if any) information-processing systems can be conscious?

I never said you claimed such either, but Charbel did.

"It is possible that this [consciousness] is the only way to effectively process information"

I was replying to his reply to my comment, hence I mentioned it.

How exactly does reward relate to valenced states in humans? In general, what gives rise to pleasure and pain, in addition to (or instead of) the processing of reward signals?

These problems seem important and tractable even if working out the full computational theory of valence might not be. We can distinguish three questions:

  1. What is the high-level functional role of valence? (coarse-grained functionalism)
  2. What evolutionary pressures incentivized valenced experience?
  3. What computational processes constitute valence? (fine-grained functionalism)

Answering #3 would be best, but it seems to me that answering #1 and #2 is far more feasible. A promising and realistic scenario might be discovering a distinction between positive and negative valence from perspectives #1 and 2, and then giving the DeepMind presentation encouraging them to avoid the coarse-grained functional structures and incentives for negative valence. From my incomplete understanding of the consciousness and valence literature, it seems to me that almost all work is contributing to answering question #1 not question #3.

One avenue in this direction might be looking into the interaction between valence and attention. It seems to me that there is an asymmetry there (or at least a canonical way of fixing a zero point). Positive valence involves attention concentration whereas negative valence involves diffusion of attention / searching for ways to end this experience. A couple reasons why I'm optimistic about this direction: First, attention likely bears some intrinsic connection with consciousness (other coarse-grained functional correlates such as commensurability, addiction etc. need not); second, attention manipulation seems like it might be formalizable in a way relevant for machine learning practitioners. (I'm using attention here in the philosophy/neuro sense not the transformer sense)

Very interesting! Thanks for your reply, and I like your distinction between questions:

Positive valence involves attention concentration whereas negative valence involves diffusion of attention / searching for ways to end this experience.

Can you elaborate on this? What do attention concentration vs. diffusion mean? Pain seems to draw attention to itself (and to motivate action to alleviate it). On my normal understanding of "concentration", pain involves concentration. But I think I'm just unfamiliar with how you / 'the literature' use these terms.

The relationship between valence and attention is not clear to me, and I don't know of a literature which tackles this (though imperativist analyses of valence are related). Here are some scattered thoughts and questions which make me think there's something important here to be clarified:

  • There's a difference between a conscious stimulus having high saliency/intensity and being intrinsically attention focusing. A bright light suddenly strobing in front of you is high saliency, but you can imagine choosing to attend or not to attend to it. It seems to me plausible that negative valence is like this bright light.
  • High valence states in meditation are achieved via concentration of attention
  • Positive valence doesn't seem to entail wanting more of that experience (c.f. there existing non-addictive highs etc.), whereas negative valence does seem to always entail wanting less.

That is all speculative, but I'm more confident that positive and negative valence don't play the same role on the high-level functional level. It seems to me that this is strong (but not conclusive) evidence that they are also not symmetric at the fine-grained level.

I'd guess a first step towards clarifying all this would be to talk to some researchers on attention.

I think I have a pretty good theory of conscious experience, focused on the meta-problem -- explaining why it is that we think consciousness is mysterious. Basically I think the sense of mysteriousness results from our brain considering 'redness'(/etc) to be a primitive data type, which cannot be defined in terms of any other data. I'm not totally sure yet how to extend the theory to cover valence, but I think a promising way forward might be trying to reverse-engineer how our brain detects empathy/other-mind-detection at an algorithmic level, then extend that to cover a wider class of systems.

Could you respond to this comment?

Thanks, I'll check it out! I agree that the meta-problem is a super promising way forward

Thanks for this thorough post. What you have described is known as the “properties-based” approach to moral status. In addition to sentience, others have argued that it’s intelligence, rationality, consciousness, and other traits that need to be present in order for an entity to be worthy of moral concern. But as I have argued in my 2020 book, Rights for Robots: Artificial Intelligence, Animal and Environmental Law (Routledge), this is a Sisyphean task. Philosophers don’t (and may never) agree about which of these properties is necessary. We need a different approach altogether in order to figure out what obligations we might have towards non-humans like AI. Scholars like David Gunkel, Mark Coeckelbergh, and myself have advocated for a relations-based approach, which we argue is more informed by how humans and others interact with and relate to each other. We maintain that this is a more realistic, accurate, and less controversial way of assessing moral status.

higher levels of complexity should increase our credence somewhat that consciousness-related computations are being performed

To nuance a bit: while some increasing amount of complexity might be necessary for more consciousness, high complexity in a system does not necessarily imply more consciousness. So it's not clear how we should update our credence that C is being computed just because some system has higher complexity. This likely depends on the details of the cognitive architecture of the system.

It might also be that systems that are very high in complexity have a harder time being conscious (because consciousness requires some integration/coordination within the system) so there might be a sweet spot for complexity to instantiate consciousness. 

For example, consciousness could be roughly modeled as some virtual reality that our brain computes. This "virtual world" is a sparse model of the complex world we live in. While the model is generated by a complex neural network, it is a sparse representation of a complex signal. For example, the information flow of a macroscopic system in the network that is relevant to detecting consciousness might actually not be that complex, although it "emerges" on top of a complex architecture.

The point is that consciousness is perhaps closer to some combination of complexity and sparsity rather than complexity alone.

The whole field seems like an extreme case of anthropomorphizing to me. The "valence" thing in humans  is an artifact of evolution, where most of the brain is not available to introspection because we used to be lizards and amoebas. That's not at all how the AI systems work, as far as I know.

Valence is of course a result of evolution. If we can identify precisely what evolutionary pressures incentivize valence, we can take an outside (non-anthropomorphizing, non-xenomorphizing) view: applying Laplace's rule gives us a 2/3 chance that AI developed with similar incentives will also experience valence?

The whole field seems like an extreme case of anthropomorphizing to me.

Which field? Some of these fields and findings are explicitly about humans; I take it you mean the field of AI sentience, such as it is?

Of course, we can't assume that what holds for us holds for animals and AIs, and have to be wary of anthropomorphizing. That issue also comes up in studying, e.g., animal sentience and animal behavior. But what were you thinking is anthropomorphizing exactly? To be clear, I think we have to think carefully about what will and will not carry over from what we know about humans and animals.

The "valence" thing in humans is an artifact of evolution

I agree. Are you thinking that this means that valenced experiences couldn't happen in AI systems? Are unlikely to? Would be curious to hear why.

where most of the brain is not available to introspection because we used to be lizards and amoebas

I also agree with that. What was the upshot of this supposed to be?

That's not at all how the AI systems work

What's not how the AI systems work? (I'm guessing this will be covered by my other questions)

The “valence” thing in humans is an artifact of evolution, where most of the brain is not available to introspection

Is it? That's a very, err, compressed argument.