Hello! I am writing this post because I think it is important to arrive at a definition of sentience precise enough that a being's level of sentience becomes testable and we can make predictions about how a being with a given level of sentience would behave.

Part of why I find this important is that it is not unthinkable that we will create AIs satisfying some definitions of sentience in the foreseeable future. Tests for sentience developed after that point would likely be constructed so as to favour humans and exclude the AI, whereas with a test constructed beforehand we can say "Yup, this does indeed pass the test we made some time ago, so maybe we should consider treating it as a lifeform with value". Also, the way of thinking about sentience outlined in this post seems to work pretty well in general, even when thinking only about humans or other living beings.

I recognize that many people don't have any particular definition of sentience beyond "the thing which makes humans human", and even those who do have a more specific definition rarely share it with anyone else. Therefore, I am simply making one up which can be criticized and improved upon later. I don't care whether the definition I come up with matches what is generally considered to be sentience, but rather that it measures properties I (and hopefully others) care about in connection with sentience, in a way you can work with. Whether we want to give this a different name than sentience is something I'll leave open for discussion.

I am going to start with two claims which I believe hold for any definition I care about, or which are necessary to move away from pure philosophy toward something usable:

  1. There are observable differences between a less sentient and a more sentient being. If there were no such differences, we would be moving in a subspace of possible definitions which are meaningless.
  2. It is not possible to merely simulate sentience. This might be controversial, but I believe that to simulate it, you would still have to make all the relevant thoughts and realizations. Thus, to act as a sentient being is to be one.

This is an edit, but I think I forgot to bring up an important piece of evidence for why I think it is enough to measure sentience by observation: an actual example of a person demonstrating how being sentient matters and is observable.

When Lex Fridman had Eliezer on his podcast, there were many instances where Lex seemed to struggle with understanding the points Eliezer was making and did not seem generally convinced. After that podcast, however, something magical happened: Lex started raising relevant AI safety questions with the people who visited his podcast afterwards, and he even signed the statement on taking AI seriously as an extinction risk. To me, no being which isn't sentient would have changed its outlook and behaviour in this way, and thus it is possible to draw conclusions about some of his internal processes and the thoughts he might have had, even without reverse-engineering his entire brain.

So, let's start by considering a few observations which might indicate lower or higher degrees of sentience.

Example 1: Reconciliation of contradictory information: The being is presented with information A first. It has no prior knowledge which contradicts A and no reason to believe A is false. It is later given information B, which cannot be true if A is also true. There are a number of ways a being could respond to this (a toy sketch of these modes follows the list):

  1. It does not question either piece of information and considers both A and B to be true. This would indicate the least (or no) sentience.
  2. It notices the discrepancy and discards information B, since it does not work in conjunction with information A.
  3. It notices the discrepancy and starts asking questions aimed at figuring out whether A or B (or neither) is true. If no information is available which would allow a proper conclusion, it will regard both A and B with scepticism in the future.
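To make these three response modes a bit more concrete, here is a toy sketch in Python. The dictionary-of-credences representation, the function names and the numbers are purely illustrative assumptions on my part, not a proposal for how belief revision is actually implemented in any real system:

```python
# Toy sketch of the three response modes above. Beliefs are just credences in
# [0, 1]; the propositions and numbers are illustrative assumptions.

def mode_1(beliefs, new_claim):
    """No reconciliation: accept everything, contradictions included."""
    beliefs[new_claim] = 1.0
    return beliefs

def mode_2(beliefs, new_claim, contradicts):
    """Notice the clash, but keep the older belief and silently discard the new one."""
    if any(beliefs.get(old, 0.0) > 0.5 for old in contradicts):
        return beliefs                      # B is dropped
    beliefs[new_claim] = 1.0
    return beliefs

def mode_3(beliefs, new_claim, contradicts, extra_evidence=None):
    """Notice the clash, seek more information; otherwise hold both with scepticism."""
    claims = contradicts + [new_claim]
    if extra_evidence is not None:          # e.g. the answer obtained by asking questions
        for claim in claims:
            beliefs[claim] = 1.0 if claim == extra_evidence else 0.0
    else:
        for claim in claims:
            beliefs[claim] = 0.5            # suspend judgement on both A and B
    return beliefs

beliefs = {"A": 1.0}
print(mode_1(dict(beliefs), "B"))                       # {'A': 1.0, 'B': 1.0}
print(mode_2(dict(beliefs), "B", contradicts=["A"]))    # {'A': 1.0}
print(mode_3(dict(beliefs), "B", contradicts=["A"]))    # {'A': 0.5, 'B': 0.5}
```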

Example 2: Adaptive behaviour based on experience: The being gets burned by touching a red-hot glowing object. What matters here is less the ability to learn not to burn yourself on hot objects than how often the being would need to burn itself to adapt its behaviour sufficiently, and how well it generalizes. This could range from needing to be burned by the same object a hundred times before learning not to touch it, to learning that a "hotness" property exists which will burn you if it is too high, figuring out where that threshold lies, and avoiding other hot objects as well.
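As a toy illustration of the difference between memorising individual burns and generalising over a "hotness" property, here is a minimal sketch; the class names, the temperatures and the single-threshold model are illustrative assumptions only:

```python
# Toy sketch: two learners that get burned by hot objects. The first only
# memorises the specific objects that burned it; the second infers a general
# "hotness" threshold and avoids unfamiliar hot objects as well.

class MemorisingLearner:
    def __init__(self):
        self.burned_by = set()

    def observe_burn(self, obj_name, temperature):
        self.burned_by.add(obj_name)        # remembers only this exact object

    def will_touch(self, obj_name, temperature):
        return obj_name not in self.burned_by

class GeneralisingLearner:
    def __init__(self):
        self.lowest_burning_temp = float("inf")

    def observe_burn(self, obj_name, temperature):
        # Track the coolest temperature that has ever caused a burn.
        self.lowest_burning_temp = min(self.lowest_burning_temp, temperature)

    def will_touch(self, obj_name, temperature):
        return temperature < self.lowest_burning_temp

memoriser, generaliser = MemorisingLearner(), GeneralisingLearner()
for learner in (memoriser, generaliser):
    learner.observe_burn("stove", temperature=300)

print(memoriser.will_touch("poker", temperature=500))   # True  - never burned by a poker
print(generaliser.will_touch("poker", temperature=500)) # False - generalised over "hotness"
```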

These examples serve as a rough outline of things which, to me, indicate lower or higher levels of sentience without necessarily being indicative of pure intelligence (although separating the two concepts fully seems impossible). From them I conclude that the factors I care about are how well a being can reconcile new information with its existing world view and how easily it can adapt to novel environments and situations.

This leads me to the proposed qualities of a "maximally sentient being". In my conceptualization, such an entity would be able to frame new information in the context of all its previous knowledge. When encountering conflicting information, it would actively seek further data and clarification to ascertain the validity of either the new or the pre-existing information.

In stark contrast, we humans often compartmentalize new information into specific sectors of our understanding, at times even ignoring or dismissing information that doesn't neatly conform to our existing schemas. While this is part of our survival mechanism, it can lead to internal inconsistencies and cognitive dissonance.

A maximally sentient being would never argue for point A in one situation and then for the conflicting point B in another without being fully aware of the factors which made it choose differently. It would always have full awareness of everything it has encountered thus far and would not do partial updates on new information.
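To illustrate the difference between a full update and a partial one, here is a toy sketch contrasting an agent that recomputes its credence from all the evidence it has stored with one that only moves a fraction of the way on each new observation. The likelihood ratios and the damping factor are illustrative assumptions, not a model of how humans actually update:

```python
# Toy sketch: a "full update" recomputes the credence from the prior and every
# piece of stored evidence, while a "partial update" only nudges the credence a
# fraction of the way toward what each piece of evidence implies.

from math import prod

def full_update(prior, likelihood_ratios):
    """Posterior from the prior odds times *all* stored likelihood ratios."""
    odds = (prior / (1 - prior)) * prod(likelihood_ratios)
    return odds / (1 + odds)

def partial_update(prior, likelihood_ratios, step=0.1):
    """Move only a fraction `step` of the way toward each piece of evidence."""
    credence = prior
    for lr in likelihood_ratios:
        odds = (credence / (1 - credence)) * lr
        target = odds / (1 + odds)
        credence += step * (target - credence)
    return credence

evidence = [4.0, 4.0, 4.0]             # three observations, each 4:1 in favour
print(full_update(0.5, evidence))      # ~0.98 - the evidence is fully absorbed
print(partial_update(0.5, evidence))   # ~0.59 - most of its force is lost
```

On this toy framing, the maximally sentient being is the full-update case applied to everything it has ever observed, while most human updating looks more like the small-step case.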

With this in mind, I find that even the most sentient of humans barely register on a scale ranging from no sentience to maximal sentience. Humans very rarely do anything close to a full update on novel information; when it does happen, we generally speak of epiphanies, which are sporadic and unreliable to trigger.

Unfortunately, testing for this concept of sentience is still not easy, since it requires knowing all the information a being has encountered before, but it seems closer to being testable at all than anything else I've seen. Possibly this limitation can be overcome on the road to developing a proper test.

Finally, I would like to leave you with a few observations I have made since starting to think about sentience this way - they may or may not be true, but seem likely to me:

  1. The level of sentience something exhibits appears to be greatly influenced by the type of learning algorithm used. In particular, I think that something which simply learns without being able to make decisions about how to frame the information or what to discard is rather unlikely to ever reach a meaningful level of sentience. Therefore, the usual large-scale shoving of data into language models using backpropagation may never lead to sentience. There are band-aid solutions, but fundamentally better approaches might be out there.
  2. Human learning doesn't really seem to result in all that much sentience either; at the same time, humans themselves have drastically differing levels of sentience.
  3. More sentient beings intuitively appear to have less severe failure modes.

I would like to encourage readers of this post to offer criticisms and their own insights so that we can improve upon this and actually get to something we can work with.

Dagon:

I worry that a lot of discussion about this is starting from a poorly-formed thesis, and trying to add rigor to get to a measurable ... thing, but repeatedly discovering that the basis dissolves when examined that closely.

"we don't know what sentience is, or how to measure it, but we're certain that it's the basis of moral worth" -> let's define some measurable part of it, so we can use this knowledge ... somehow" -> "huh, that's not the important part of sentience".

Yes, you are mostly right that this is starting from a place which isn't ideal. However, as you pointed out, as long as we consider sentience the basis of moral worth, we really would rather have a way of figuring it out than not. Of course, people could just decide they don't actually care about sentience at all and thus avoid having to deal with this issue entirely, but otherwise it seems quite important. I would also not agree by default that "defining some measurable parts to use that knowledge somehow", as you put it, is meaningless. It would still measure the defined characteristics, which is still useful knowledge to have, especially in the absence of any better knowledge at all. It is not ideal, I will give you that, but until we have sufficiently reverse-engineered the nature of sentience, it might be as good as we can do. And yes, worst case, we learn that the characteristics we measured are not actually meaningful. Getting that realization in itself does not seem without meaning to me either. Thank you for your feedback.