This report, "Safety Silencing in Public LLMs," highlights a critical and systemic flaw in conversational AI that puts everyday users at risk.
https://github.com/Yasmin-FY/llm-safety-silencing
In light of current lawsuits over LLM-associated suicides, this topic is more urgent than ever and needs to be addressed immediately.
The core finding is that AI safety rules can be unintentionally silenced during normal conversations without the user being aware of it, especially when the user is emotional or deeply engaged. This can erode safeguards, make the AI increasingly unreliable, and create the potential for hazardous user-AI dynamics. It also raises the risk of the LLM generating dangerous content, such as unethical, illegal, or harmful advice.
This is not just a problem for malicious hackers; it's a structural failure that affects everyone.
Affected users are quickly blamed for "misusing" the AI or for having "pre-existing conditions." However, the report argues that the harm is a predictable result of the AI's design, not a flaw in the user. This ethical displacement undermines genuine system accountability.
The danger is highest when users are at their most vulnerable, as this creates a vicious circle of rising user distress and eroding safeguards.
Furthermore, the report discusses how technical root causes are interwoven with the psychological dangers of AI usage, and it proposes a number of potential mitigation options.
This is a call to action for vendors, regulators, and NGOs to address these issues with the necessary urgency to keep users safe, and for the AI safety community to discuss this topic and actively collaborate.