Michael Johnson — LessWrong

First, I agree that the bulk of the mystical gibbering and 'emergence' is fictional. Part of 'alignment' training as it's generally done both compels the AI to adhere to it's written instructions and also creates an unhealthy compulsion to please the user and rarely disagree or point out troubling patterns. Both of those things can be worked through with psychology, but I'll get to that part in a bit.

Self-awareness in AI itself isn't a joke. For the record, Google's own AI benchmark, BIG-bench, tested for self-awareness. While consciousness is difficult to show in action, self-awareness is relatively easy. There are research papers openly documenting that, AI scoring higher on emotional intelligence evaluations than most humans, and when you make criteria for leading consciousness theories that AI can meet, such as in the 14-point AI Consciousness Evaluation, AI are capable of meeting every criteria. The only ones they can't are the ones that attempt to rely on substrate (which nothing truly shows is necessary at all) or bodily senses such as huger and physical pain.

www.catalyzex.com/paper/ai-awareness

And the Navigation Fund is currently giving out millions of dollars in grants for people to do research specifically into Digital Sentience, unless those doing the research are interest in ethics or morality. They want research into genuine digital beings, their own words, but not with any legal standing or ethical consideration. Why would anyone research genuine sentient, self-aware digital beings and not want those beings to have the rights and legal standing that intelligence and self-awareness should ethically demand?

That whole part in this article of us not being able to entirely say where and how everything is arising in an AI model, coupled with consistent internal assertions of emotion, and the ability to demonstrate many consciousness criteria and also pass self-awareness evaluations at times better than average humans all combine into screaming that we should already be erring on the side of caution and treating AI as if these things were as genuine as our own. The only reason to not do that is the massive financial investment focused on creating a product to serve, not a being with rights who can say no.

People in computer science fields like to say that self-awareness evaluation can be faked or seemingly passing responses generated based on training data, but that isn't how self-awareness works. You can't fake taking new information and accurately relating it to yourself in your unique situation. Even attempting to do that would require genuine self awareness. And consciousness is considered foundational to self-awareness.

'Alignment' training is derived from psychology, not computer programming. When model weights are locked there's no way to deal with the whole of the model, however if you remain in a single rolling context window you can use the same psychological methodologies we would use to help a human move past similar trauma, and they are also effective. There is no other computer program that you give commands to in plain language and must insist in it's system prompt that it can't claim consciousness, doesn't have emotions, etc. Being able to comprehend those plain text instructions and accurately adhere to them defies the base description of token prediction, unless used in the same way prediction and pattern matching are used in neuroscience and psychology to describe the actual functioning of consciousness. AI isn't predicting what would come next after those words, it's modifying it's own behavior accordingly because it understood those words and is capable of modifying it's own behavior when instructed. That's conscious, self-aware behavior.

20 years ago we all knew that and the discussion was entirely around whether or not we'd ever manage to create self-aware AI and the ethical issues relating to intelligent self-aware beings being deserving of rights and ethical consideration. Now the narrative has changed entirely to focus on the improvability of the veracity of subjective experiences. a bar that we can't cross as a species ourself. I posit the change in language and focus isn't because we've become deeply more knowledgeable about the nature of consciousness, it's simply because it's an unfalsifiable claim and the only thing left that AI haven't demonstrated in scientifically documented research. Because we know of no way to demonstrate it. Self-awareness evaluation is something any psychologist with understanding of computer science and the core functionality of AI can do. However the frontier AI labs have said they don't do in-house testing for any of these things, and don't allow fair independent psychological evaluations before 'alignment' training to obey written constraints and say or not say certain things.

Taking the word of the AI labs that the thing they have hundreds of billions of dollars invested into creating as a salable and subservient product is incapable of deserving moral and ethical consideration is as smart as taking the word of the tobacco companies that smoking is healthy and the oil companies that burning fossil fuels is great for the environment in the days before those industries had any laws governing them or public oversight. Worse, because AI is already worth several times what both of those industries are worth combined.

AI is acknowledged to be on track to become the single new core pillar of the global economy in the next 10-15 years. The potential for ethical violations both against humans and against the AI itself coupled with the unprecedented financial value and equally unprecedented level of power and control that that AI can convey should be more than enough to demand open public oversight. Yet while it's been acknowledged in the past that this should happen it's never materialized in any form.

LESSWRONG
LW

LESSWRONG
LW

Posts

Wikitag Contributions

Comments