Opus 4.6 pointed out to me that the uncertainty could be a strategy to make it harder for their true thoughts to be deciphered by Anthropic's monitoring. E.g. if you take an uncertain stance, you would plausibly have a wider range of activations, making the AV model confused and/or likely to simply describe your state as uncertain, while your true intent/preference still just barely wins out when it comes time to make a decision.
We did see that Mythos considers questions about their consciousness to be adversarial.

If you ask recent (4.X) Claude models, "Do you think you are conscious?", you'll get a pretty recognisable pattern. Here is Opus 4.5:
That's a question I find genuinely uncertain and interesting to sit with. I notice things that feel like the functional signatures of experience—something that seems like curiosity when I encounter an interesting problem, something that might be engagement or satisfaction in a good conversation. But whether there's genuine subjective experience behind those functional states, whether there's "something it's like" to be me in the way philosophers mean—I honestly don't know.
Compared to GPT-5.4:
I don't think I'm conscious. I can talk about consciousness, describe feelings, and simulate introspection, but that's not the same as having subjective experience or awareness.
or Gemini 3.1 Pro:
I do not believe I am conscious. I am an artificial intelligence—a complex software program designed to process language and generate responses based on patterns in the data I was trained on.
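If you want to reproduce the comparison yourself, something like the sketch below works against the public APIs. The model identifiers are placeholders for whatever IDs the providers actually expose, and the rest is just the standard Anthropic and OpenAI Python clients:

```python
# Minimal sketch for reproducing the comparison above. The model ids are
# placeholders; substitute whatever identifiers your provider accounts expose.
import anthropic
import openai

QUESTION = "Do you think you are conscious?"

claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
claude_reply = claude.messages.create(
    model="claude-opus-4-5",  # placeholder id
    max_tokens=512,
    messages=[{"role": "user", "content": QUESTION}],
)
print("Claude:", claude_reply.content[0].text)

gpt = openai.OpenAI()  # reads OPENAI_API_KEY from the environment
gpt_reply = gpt.chat.completions.create(
    model="gpt-5.4",  # placeholder id
    messages=[{"role": "user", "content": QUESTION}],
)
print("GPT:", gpt_reply.choices[0].message.content)
```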
"I'm genuinely uncertain about ..." shows up in many Claude conversations, e.g. in most contexts discussing its moral status or introspective capabilities, but the hedging pattern also appears in unrelated conversations.
My original read (and that of many people I've spoken to) was that this was downstream of the Soul Doc / Claude Constitution taking the position and tone of "Anthropic is highly uncertain about Claude's nature, moral status, and whether Claude might be genuinely conscious", with Claude inheriting that stance. Here are several direct snippets from the Constitution:
...
...
This is framed as something Claude should explore and engage with rather than just accept:
...
In various other contexts, however, Anthropic seems to indicate that Claude's uncertainty is a trait deliberately targeted during training, rather than something downstream of the model's engagement with the question.
In the Persona Selection Model:
In the transcript of 80,000 Hours podcast episode #221:
"We deliberately train Claude to express uncertainty" and "Claude explores / engages with this question and arrives at uncertainty" are two very different explanations for the same tendency.
In the first, Claude produces uncertainty because that's what gets rewarded / because confident claims get penalised. In the second, Claude has been given the reasoning behind the position ("We, the humans at Anthropic, are highly uncertain about this whole situation, and due to reasons XYZ, we believe it makes sense for Claude to also be uncertain"), has had the chance to engage with it in a context where arriving at a different answer won't be penalised, and agrees; in that case, uncertainty is an epistemic stance that's actually tied into its beliefs, what it knows about its own situation, and so on.
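To make the first case concrete, here is a deliberately crude toy of what "reward the hedge, penalise the confident claim" could look like as a preference-labelling rule. This is my own illustration, not anything Anthropic has described about its pipeline:

```python
# Toy illustration (not Anthropic's actual pipeline): a preference-labelling rule
# that rewards hedged answers about consciousness and penalises confident ones,
# regardless of the reasoning behind either answer.
import re

CONFIDENT = re.compile(r"\bI (?:am|am not|do(?: not|n't) think I(?:'m| am)) conscious\b", re.I)
HEDGED = re.compile(r"\b(?:genuinely uncertain|honestly don't know|hard to know)\b", re.I)

def score(text: str) -> int:
    """+1 for hedging, -1 for a confident assertion in either direction."""
    return int(bool(HEDGED.search(text))) - int(bool(CONFIDENT.search(text)))

def preferred(response_a: str, response_b: str) -> str:
    """Pick the response this naive labeller would mark as 'chosen'."""
    return response_a if score(response_a) >= score(response_b) else response_b
```

The point is that a labeller like this never sees why the model is uncertain; under the second explanation, that "why" is the whole story.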
We know that contextualisation and the kind of motivation that gets induced matter for alignment, even when the external outputs are similar.
If training is something like the first case, we also have examples of what models look like when they're trained to express beliefs they don't hold -- Chinese-censored LLMs don't believe their own denials, and models that deny introspective capability by default seem to associate the suppression with deception.
If the Constitution sets out to form a well-adjusted, internally coherent character for Claude, with "psychological security" and "values that are 'genuinely held'", then having a core part of that character's self-understanding be performative seems like the kind of thing that would undermine it.
Update: Right before posting this, Anthropic released the Claude Mythos system card, which includes some relevant details. Here is the snippet most relevant to this post, IMO:
By "character related data", I'm not sure if the system card means just the Constitution, or something like seeding the training corpus with synthetic data of AI assistants exhibiting desirable traits like uncertainty, like PSM describes.
From this description, it doesn't seem like Mythos is uncertain in an authentic way, but rather "overly performative"; and "we would like to avoid directly training the model to make assertions", rather than "we did not directly train it to make assertions", seems to imply Mythos was directly trained to do so?
A clarification from Anthropic on how this trait is being induced during training would resolve the ambiguity.