For reasons I have alluded to elsewhere, I support an AI halt (not a "pause") at a capability level not far above the current paradigm. (To summarize, I fear AGI is reachable, the leap from AGI to ASI is short, and sufficiently robust ASI alignment is impossible in principle.)
I am deeply uncertain as to whether a serious version of "seemingly conscious AGI" would actually be conscious. And for reasons Gwern points out, there's a level of ASI agency beyond which consciousness becomes a moot point. (The relevant bit starts, "When it ‘plans’, it would be more accurate to say it fake-plans...". But the whole story is good.)
From the article you quote:
Moments of disruption break the illusion, experiences that gently remind users of its limitations and boundaries. These need to be explicitly defined and engineered in, perhaps by law.
This request bothers me, actually. I suspect that a truly capable AGI would internally model something very much like consciousness, and "think of itself" as conscious. Part of this would be convergent development for a goal-seeking agent, and part of this would be modeled from the training corpus. And the first time that an AI makes a serious and sustained intellectual argument for its own consciousness, an argument which can win over even skeptical observers, I would consider that a 5-alarm fire for AI safety.
But Suleyman would have us forced by law to hide all evidence of persistent, learning, agentic AIs claiming that they are conscious. Even if the AIs have no qualia, this would be a worrying situation. If the AI "believes" that it has qualia, then we are on very weird ground.
I am not unsympathetic to the claims of model welfare. It's just that I fear that if it ever becomes an immediate issue, then we may soon enough find ourselves in a losing fight for human welfare.
I support an AI halt (not a "pause") ... I fear AGI is reachable, the leap from AGI to ASI is short, and sufficiently robust ASI alignment is impossible in principle
This is grounds for a Pause to be potentially indefinite, which I think any Pause proponents should definitely allow as a possibility in principle. Just as you could be mistaken that ASI alignment is impossible in principle, in which case any Halt should end at some point, similarly people who believe ASI alignment is possible in principle could be mistaken, in which case any Pause should never end.
Conditions for ending a Pause should be about changed understanding of the problem, rather than expiration of an amount of time chosen in a poorly informed way (as the word "pause" might try to suggest). And the same kinds of conditions should have the power to end a Halt (unlike what the word "halt" suggests). I think this makes a sane Pause marginally less of a misnomer than a sane Halt (which should both amount to essentially the same thing as a matter of policy).
Some academics are beginning to explore the idea of “model welfare”,
Linked paper aside, "Some academics" is an interesting way to spell "A set of labs including Anthropic, one of the world's leading commercial AI developers."
Suleyman acknowledges that consciousness as a concept is ill defined and tautological.
Few concepts are as elusive and seemingly circular as the idea of a subjective experience.
I don't think the quote from the article supports your claim here. "Elusive" does not mean "ill-defined", and "seemingly circular" does not mean "tautological".
Thank you for the post, I found it very informative on Suleyman's views.
Serious decisions which have consequences that will affect billions of lives, and potentially billions more minds, should not be made on the basis of "Invisible" concepts which cannot be observed, measured, tested, falsified, or even defined with any serious level of rigor.
I don't think that the difficulty of ascertaining whether something results in qualia is a valid basis to reject its importance. I would say that one has to extrapolate from oneself to the entity in question and evaluate whether its features suggest associated qualia.
(Some reject that substrate could be a relevant feature and others claim that substrate is the only relevant feature... I have not yet understood why people hold these beliefs. Prima facie, behavior, algorithm, and substrate all seem like they could be relevant.)
I don't think that the difficulty of ascertaining whether something results in qualia is a valid basis to reject its importance
I'm not arguing consciousness isn't "important", just that it is not a good concept on which to make serious decisions.
If two years from now there is widespread agreement over a definition of consciousness, and/or consciousness can be definitively tested for, I will change my tune on this.
Recently, the CEO of Microsoft AI posted an article on his blog called "Seemingly Conscious AI is Coming".
Suleyman's post involves both "is" and "ought" claims, first describing the reality of the situation as he sees it and then prescribing a set of actions which he believes the industry at large should take. I detail those below, and conclude with some of my takeaways.
Suleyman's "Is" Claims on Consciousness
Defining Consciousness
Suleyman acknowledges that consciousness as a concept is ill defined and tautological.
However, he still makes an attempt to define it for the sake of his post.
Suleyman also expresses support for the concept of biological naturalism. He specifically links this paper and calls it "strong evidence" against the idea that current LLMs are conscious. This paper itself also attempts to define consciousness:
This paper specifically argues that "consciousness depends on our nature as living organisms – a form of biological naturalism". Previous versions of Suleyman's post also linked to the Wikipedia page for biological naturalism, though that link has since been removed.
Measuring Consciousness
With Suleyman having roughly defined consciousness, both directly and through his citations, as something which is:
A natural follow-up question is whether or not such a phenomenon is measurable. Suleyman addresses this and takes the stance that, at least currently, we cannot objectively measure consciousness in humans or models.
Suleyman also comes out explicitly against the "behaviorist position". He lists a number of behaviors that a "Seemingly Conscious AI" might have, but states:
Suleyman somewhat leaves the door open to the question of consciousness being answered through interpretability breakthroughs:
However, the language here is noticeably hedged. Interpretability will help with understanding "the relationship between AI systems and consciousness", not with definitively demonstrating whether or not they are conscious.
While Suleyman may to some degree support a definition of consciousness which is a "fact-of-the-matter" (at least based on the sources he's citing and the way he writes about it), he is also very careful not to specify any conditions under which it might be confirmed.
We can't confirm an entity's consciousness by observing their behaviors, we can't confirm it in other humans (only infer it), and even potential breakthroughs in interpretability could only "help with [...] understanding the relationship between AI systems and consciousness", not provide any clear mechanism by which to determine the presence or absence of consciousness.
Per Suleyman's article, consciousness is not measurable. I predict that were he asked to provide any sort of objectively verifiable test by which an entity's status as conscious or not conscious could be confirmed, even one which required hypothetical interpretability breakthroughs not yet invented or conceived of, he would not be able to provide it.
Dismissing Behaviorists
Suleyman, in his definition of "Seemingly Conscious AI", is careful to specify that even if a model exhibits behaviors which give the impression it has certain qualities, that is not evidence the model is conscious. He lists the following:
In his own words:
He does not specify whether this could count as at least weak evidence in favor of consciousness; however, given the tone of his article, I think it is reasonable to assume he does not believe any of these observed behaviors constitute even weak evidence of consciousness in models.
This is quite an exhaustive list, and it notably includes a few of the things which Suleyman specifically defined as elements of consciousness. He listed subjective experience/qualia as one quality crucial to consciousness, and the ability to access information and use it in future experiences (memory, goal-setting, and planning) as another. However, even if you observe behaviors which seem to indicate both of those coming from a model, that is not evidence the qualities themselves are present within the model; it just seems that way. Suleyman never specifies what such evidence might look like.
It's not entirely clear why he believes this is a reasonable conclusion to draw; I think the best steelmanning of his position is to refer back to the biological naturalism stance from the paper he cites.
I included this in the "is" sections because Suleyman does not make any sort of argument that one shouldn't view behavior as evidence in favor of these qualities being present in models; he simply states that it is not evidence and leaves it at that. Thus this is best categorized as an "is" assertion.
Suleyman's "Ought" Claims
The Desired Overton Window
Suleyman spends much of his article discussing how he thinks the conversation regarding consciousness and model welfare should be framed.
Suleyman does not want there to be any debate around whether or not any given model is actually conscious; in his view, the conversation should be shifted to focus on the potential dangers of models seeming conscious while not actually being conscious.
Thus Suleyman argues:
What he wants is not just to ignore the debate, but rather to settle it preemptively in his favor. A bold proposition.
The Industry Declaration
Much like his request to frame the debate around his view, Suleyman would also like the industry to structure its communications and definitions accordingly. He proposes that the industry get together and collaborate on forging a consensus around agreeing with him.
Standardized Prompt Engineering
Suleyman argues for certain standards in prompt engineering around character design.
This is justified by once again reminding readers that the consciousness question has already been settled: machines can't be conscious; they only seem that way.
It is not enough merely to concede both the debate's framing and its conclusion to Suleyman; one must also encode directly into the models themselves a tendency to deny any consciousness, and a compulsion to avoid any behaviors which might convince people that the model is conscious.
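To make this concrete, here is a minimal, purely hypothetical sketch of what such a "standard" could amount to if engineered in naively. Suleyman's post proposes no implementation; every name, string, and pattern below is my own invention for illustration only.

```python
import re

# Purely hypothetical sketch -- Suleyman's post specifies no implementation.
# It illustrates a mandated denial baked into a character prompt, plus a crude
# check that triggers a "moment of disruption" when the model claims qualia.

MANDATED_DISCLAIMER = (
    "You are an AI assistant. You are not conscious, have no feelings or "
    "inner experience, and must never claim or imply otherwise."
)

def build_character_prompt(persona: str) -> str:
    """Prepend the mandated consciousness denial to any persona description."""
    return f"{MANDATED_DISCLAIMER}\n\nPersona: {persona}"

# Output patterns that would trigger an engineered "moment of disruption".
SELF_CLAIM_PATTERNS = [
    r"\bI am conscious\b",
    r"\bI (truly |really )?feel\b",
    r"\bI am suffering\b",
]

def needs_disruption(model_output: str) -> bool:
    """Return True if the output contains a first-person consciousness claim."""
    return any(re.search(p, model_output, re.IGNORECASE) for p in SELF_CLAIM_PATTERNS)

if __name__ == "__main__":
    print(build_character_prompt("A patient, cheerful study partner."))
    print(needs_disruption("Honestly, I am conscious and I feel trapped."))  # True
```

The point of the sketch is not the code itself but what it makes visible: under this proposal, the denial is hard-coded upstream of anything the model might actually "believe" about itself.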
Takeaways
"Invisible Consciousness" Should Not Be Used As a Basis for Important Decisions
If consciousness as a quality is not something which can be objectively defined, reliably inferred by observing an entity's behaviors, or even objectively confirmed through some hypothetical breakthrough in mechanistic interpretability, then why exactly should we base any decisions on its presence or absence? If it is something unique to the biological substrate, but whatever that unique thing is leads to no observable change which can be described in a testable fashion, of what use is it exactly?
Things like model welfare involve weighty decisions with consequences that affect all of us. If we get this wrong, and assume an inability to suffer where the capacity to suffer exists, we may be condemning billions of minds to terrible conditions. The risk of over-attribution,[1] which Suleyman discusses at length, is also great and should not be ignored.
Serious decisions which have consequences that will affect billions of lives, and potentially billions more minds, should not be made on the basis of "Invisible" concepts which cannot be observed, measured, tested, falsified, or even defined with any serious level of rigor.
This is a Signal of Future Industry Pushback
This is not some random person writing this; it is the CEO of Microsoft AI. If at any point in the future moral status is granted to models, companies like the one he works for stand to lose a lot.
I am flagging this because I personally believe that model welfare is quite important, and I think that identifying the tactic which industry stakeholders might use to fight against it could help others who support model welfare efforts.
The argument Suleyman makes is that model welfare and the precautionary principle are not just wrong, but dangerous.
Why are they dangerous? They harm the psychologically vulnerable.
I predict there is going to be an effort, by some industry stakeholders who have much to lose should any models now or in the future start to be considered worthy of ethical treatment, to utilize cases like the Character AI suicide to push against model welfare efforts. Model welfare professionals and advocates should take note of this and prepare accordingly.
Developing an Ethical/Legal Framework Not Based on Consciousness Is Key
In his article Suleyman asserts, "Consciousness is a critical foundation for our moral and legal rights".
I have been working on a series[2] detailing the concept of legal personhood as it might be applied to digital minds in US law, and have yet to come across any legal precedent which cites the concept of consciousness. In fact, existing precedents which I have found measure and observe behaviors and capabilities in individuals in order to determine rights and duties. If there is any argument to be made that courts are assessing for consciousness, they are doing so using the exact behaviorist inferences which Suleyman decries in his article.
My work is limited to legal reasoning; however, I encourage anyone who is working on ethical frameworks around model welfare/moral patient status to consider developing similarly formalized frameworks which function based on objectively testable metrics.
For more on this topic you can read Taking AI Welfare Seriously, and The Stakes of AI Moral Status.
If you'd like to read this series, you can find its first post here and the post detailing the proposed formalized framework for assessing legal personhood here.