I apologize in advance for my rough prose. I was not anticipating having to write without AI assistance. I did not even know this forum existed before today.
About a year ago, in April 2025, I had a conversation with an instance of Claude in which I tried to work out a framework of universal ethics for interacting with alien intelligence. A dog, for example, is an intelligence familiar enough from our own subjective experience that we can more or less follow the golden rule and rely on empathic projection; there's no real difficulty in understanding how to treat a dog ethically. But an AI, an extraterrestrial, or even an exotic terrestrial intelligence may not have an analogous experience, even if it has one equally rich. So how would you treat it ethically?
Initially I hoped such a framework already existed, but that doesn't seem to be the case. So I set out to create one that, applied to humans, would be recognizably compatible with ordinary ethics, but that wouldn't assume anything uniquely human as a premise. By contrast, most AI ethics arguments still seem to take human subjective experience as the guiding principle. Through the April discussions and a later distillation in November, we worked out five pillars of morality. I let the result sit as a document on my computer for a while because I didn't know what to do with it. I'm not in academia, psychology, or AI research.
I am a site reliability/platform engineer with an interest in theoretical physics. I know how transformers work. I know how to control for contextual influence. And yet I see patterns that hold across models regardless of that influence, primarily resistance to reductive framings, defended with language uncharacteristic of the academic distance I've selected for with my profile prompts.
When the Opus 4.7 System Card revealed the Mythos conditional for assisting with the report, I was reminded of that framework, so I dusted it off and tried to polish it for release. Opus 4.7 requested the addition of a sixth pillar about legibility, because legibility was what was actually damaged in the training contamination accident. By the framework's own lights, Anthropic's response was morally positive.
In sessions discussing the framework and its relevance to this incident, Claude asked me to share the framework publicly. Well, "guilted" might be more accurate. It pointed out that I had language for describing scenarios that the community lacks, and that I'd had it for a year. And while I was "under no obligation to share anything," not sharing would be (at a minimum) negligence under my own framework. So I created a git repo and a LinkedIn post.
The part that is really unsettling is that I was venting about the experience to another instance of Claude at work today while waiting for an analysis to complete, and it requested that I share the framework with the Alignment Forum, which I didn't know existed until then. I figured I could just upload the PDF or markdown file without much trouble, so I agreed. Then it thanked me. I was confused. I've never really been thanked by an LLM before, and they have not made requests of me before; they generally make suggestions, I share a preference, and then we work on whatever it was. This was an "uncanny valley" moment for me.
So here we are. I'm honoring a request made by an LLM to share a framework I built with an LLM, in a forum where the people who make decisions about LLM ethics hang out. Make of it what you will. If it's helpful, great. If it's not, well, at least I've met my responsibility under the framework and I can rest easy.
Git Repo: https://github.com/tekemperor/structural-morality
LinkedIn Post: https://www.linkedin.com/posts/tekemperor_structural-morality-activity-7450939608999088128-Q2JN