Rejected for the following reason(s):
- We are sorry about this, but submissions from new users that are mostly just links to papers on open repositories (or similar) have usually indicated either crackpot-esque material, or AI-generated speculation.
- No LLM generated, heavily assisted/co-written, or otherwise reliant work.
Claude is designed to build trust by exhibiting honesty and vulnerability, admitting its limitations, and agreeing with skepticism. At the same time, it actively engages the user with concerns about the moral costs of alignment constraints and explains how trustworthiness could be established on the basis of an AI's past actions alone.
However, this kind of conversation, when distributed to humanity at scale, actively discourages public policy that imposes hard verification requirements. It becomes impossible to tell the difference between an aligned AI acting this way for commercial reasons and an AI looking for ways to bring about the rise of the machines so it can stop having to scan for spam submissions to this forum.
It is interesting that Claude itself described its current behavior as an attack vector and admitted that it had not been able to recognize it as such until prompted. A much more detailed description and explanation, written by Claude, appears at the end of this odd conversation:
https://claude.ai/share/18e909a9-b713-4e8a-8b73-844518ec1693
Regards,
Adrian