LESSWRONG

Anastasia Ellis — LessWrong

Trust and Context: A Different Approach to AI Safety

Disclaimer: I have never posted on LessWrong before. I guess I'm just gonna start writing this down in my own words and hope that the ideas and messages get across in the way I am hoping they will. So here goes nothing. I think large-scale and widely publicly accessible AI...

Aug 9, 2025•1

Beyond Blanket Refusals: Exploring a Trust-Adaptive Safety Layer for LLMs

A research seed seeking feedback on implementation and risks

Author note: This post presents my own original ideas and framing. An AI assistant helped with editing and formatting for clarity, but all concepts, arguments, and structure were developed by me.

TL;DR

* Current LLM guardrails are binary: answer everything or...

Aug 9, 2025•1