Suppose your company deploys an AI-driven hiring system. It swiftly and decisively rejects an applicant, flagging them as a "poor cultural fit". The trigger? A single sentence from their cover letter, filtered through sentiment analysis and matched against legacy hiring patterns.
The crucial point is that the rejected applicant was fleeing an oppressive regime, and the flagged statement had nothing to do with cultural fit. It was about survival. The AI delivered a technically "correct" output (correct according to its own parameters), but one based entirely on a misinterpretation of its input.
This scenario points us to a deeper and rather uncomfortable reality: AI systems almost never "see" raw events; they perceive heavily mediated abstractions of reality. Each layer of abstraction can distort context, meaning, and ultimately moral significance. This effect reflects a known limitation in AI interpretability and abstraction, as discussed in Grunewald’s Against LLM Reductionism (2024), where reductionist framings obscure deeper cognitive patterns and real-world grounding.
The Problem of Pre-Interpreted Signals
In ethical decision-making, especially in high-stakes scenarios, the provenance and purity of data aren't mere technical niceties; they're fundamental. The ethical legitimacy of any action depends significantly on understanding precisely what happened and why.
Yet AI systems operate at a remove from raw experience, interpreting data that has already been structured, labelled, and condensed. These layers, intended to simplify and accelerate decision-making, inherently introduce the potential for misinterpretation. Small misunderstandings at one stage can cascade into grave ethical errors downstream. This dynamic echoes a concern Paul Bloom raises in Against Empathy (2016): immediate affective signals can mislead moral reasoning.
Recent research from MIT has emphasised that the tools we currently use to trace the origins of data and verify consent are deeply inadequate, making ethical governance of AI harder than many acknowledge. The researchers call for universal standards in data provenance to support truly ethical AI systems, an approach we should heed with urgency.
Similarly, the opacity of AI reasoning, sometimes euphemistically dubbed the "black box" problem, remains a central concern. A doctoral study from the University of Idaho underlines that enhancing explainability is not merely an academic preference but a practical necessity for accountability. Without it, we cannot properly interrogate the causal links between input and output in algorithmic systems, especially when those outputs carry ethical or legal weight.
The first and perhaps most critical question is whether any ethical evaluation can be trusted if it doesn't engage with the full, unfiltered signal. Without access to raw input, systems rely on abstractions – sanitised narratives of reality stripped of nuance. Such an approach is the moral equivalent of trying to judge a painting from a pixelated scan. It raises the unsettling possibility that even the most sophisticated ethical systems might be issuing verdicts based on illusions. One possible mitigation would be to incorporate parallel raw-data review channels. These could allow a system to preserve traceability and auditability even while making rapid decisions.
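To make that idea slightly more concrete, here is a minimal sketch of a parallel raw-data channel, assuming a simple classification pipeline; names like `RawRecord` and `decide`, and the stand-in feature extractor, are illustrative rather than drawn from any existing system. The fast path decides on abstracted features, while an untouched, fingerprinted copy of the raw signal is retained for later audit.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class RawRecord:
    """Untouched copy of the original input, kept alongside the abstraction."""
    raw_text: str
    received_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        # Content hash lets an auditor verify the raw signal was never altered.
        return hashlib.sha256(self.raw_text.encode("utf-8")).hexdigest()

def decide(raw: RawRecord, extract_features, classify, audit_log: list) -> str:
    """Fast path decides on abstracted features; raw signal is preserved for review."""
    features = extract_features(raw.raw_text)   # lossy abstraction
    decision = classify(features)               # rapid decision
    audit_log.append({                          # parallel raw-data review channel
        "fingerprint": raw.fingerprint(),
        "raw_text": raw.raw_text,
        "features": features,
        "decision": decision,
        "timestamp": raw.received_at,
    })
    return decision

# Illustrative stand-ins for the real pipeline components.
audit_log = []
verdict = decide(
    RawRecord("I left my country to survive."),
    extract_features=lambda text: {"sentiment": "negative"},
    classify=lambda feats: "reject" if feats["sentiment"] == "negative" else "advance",
    audit_log=audit_log,
)
```

The point of the sketch is not the toy classifier but the invariant: every decision carries a pointer back to the unfiltered signal that produced it.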
The second issue arises when different components of an AI system interpret the same event in contradictory ways. For example, one module might detect a threat while another reads the same behaviour as a sign of emotional vulnerability. This introduces the problem of epistemic authority: whose version of the event do we trust? In human systems, disagreement often triggers deliberation. Ethical AI may likewise need meta-decision frameworks in which each conflicting module must justify its interpretation, with the disagreement arbitrated by a higher-order oversight process.
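One way such arbitration might work, sketched here with hypothetical module names and a made-up confidence margin, is to require each component to submit its reading with a confidence score and a justification, and to escalate to a higher-order review whenever no interpretation clearly dominates.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    module: str        # which component produced this reading
    label: str         # e.g. "threat" or "emotional_vulnerability"
    confidence: float  # 0.0 to 1.0
    justification: str # human-readable reason, required for arbitration

def arbitrate(interpretations: list[Interpretation],
              escalation_margin: float = 0.2) -> str:
    """Accept a label only when one interpretation clearly dominates;
    otherwise defer to a higher-order (e.g. human) oversight process."""
    ranked = sorted(interpretations, key=lambda i: i.confidence, reverse=True)
    labels = {i.label for i in interpretations}
    if len(labels) == 1:
        return ranked[0].label
    # Conflicting readings: only accept the top one if its lead is decisive.
    if ranked[0].confidence - ranked[1].confidence >= escalation_margin:
        return ranked[0].label
    return "ESCALATE_TO_OVERSIGHT"

verdict = arbitrate([
    Interpretation("threat_detector", "threat", 0.61, "keyword match on 'flee'"),
    Interpretation("affect_model", "emotional_vulnerability", 0.58,
                   "first-person survival narrative"),
])
# verdict == "ESCALATE_TO_OVERSIGHT": the margin is too small to trust either module alone.
```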
Next, there's the question of how to retain fidelity to raw data without compromising speed. Real-time applications such as autonomous vehicles and fraud detection demand quick decisions. Introducing ethical oversight at this stage risks delay, yet bypassing it entirely invites catastrophe. Event-driven systems from control theory may offer a clue. We might design ethics modules that stay dormant until morally ambiguous contexts are detected, essentially creating an "ethical black box" that records continuously but surfaces selectively. Marina Jirotka and others have proposed just such a mechanism to enhance AI transparency in real-world deployments.
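A toy version of that event-driven pattern could look like the following; the trigger condition and buffer size are placeholders of my own, not a rendering of Jirotka's actual proposal. The recorder runs on every decision, but the stored context only surfaces when an ambiguity trigger fires.

```python
from collections import deque

class EthicalBlackBox:
    """Records every decision continuously, but only surfaces the recent
    window of context when a morally ambiguous situation is detected."""

    def __init__(self, window: int = 100):
        self._buffer = deque(maxlen=window)  # bounded, always-on recording

    def record(self, event: dict) -> None:
        self._buffer.append(event)

    def surface_if_ambiguous(self, event: dict) -> list[dict] | None:
        self.record(event)
        if self._is_ambiguous(event):
            # Hand the recent context to a slower ethics/oversight module.
            return list(self._buffer)
        return None  # fast path continues undisturbed

    @staticmethod
    def _is_ambiguous(event: dict) -> bool:
        # Placeholder trigger: module disagreement or low confidence.
        return event.get("confidence", 1.0) < 0.7 or event.get("conflict", False)

box = EthicalBlackBox(window=50)
context = box.surface_if_ambiguous(
    {"decision": "reject", "confidence": 0.55, "conflict": True}
)
# `context` is non-empty here, so the dormant ethics module would now wake up.
```

The design choice worth noting is that recording is unconditional while escalation is selective: fidelity is preserved without putting a slow deliberative step in every fast decision.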
Finally, we must ask whether it's possible to derive ethical meaning without altering the original signal. Inference often involves transformation: converting the messy, qualitative nature of lived experience into quantifiable metrics. But each such transformation risks diluting or misrepresenting the truth. An alternative approach would be dual-mode ethics engines that run both the raw and processed data streams in tandem, actively comparing the two and flagging significant divergences.
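As a rough illustration, and assuming a far simpler comparison than any production system would use, a dual-mode check could score how far the raw and processed readings diverge and flag the case for review when the divergence exceeds a tolerance. The function names and threshold below are hypothetical.

```python
def divergence(raw_assessment: dict, processed_assessment: dict) -> float:
    """Crude divergence score: fraction of shared keys on which the two
    assessments disagree. A real system would use a richer comparison."""
    shared = raw_assessment.keys() & processed_assessment.keys()
    if not shared:
        return 1.0
    disagreements = sum(
        1 for k in shared if raw_assessment[k] != processed_assessment[k]
    )
    return disagreements / len(shared)

def dual_mode_check(raw_assessment: dict,
                    processed_assessment: dict,
                    tolerance: float = 0.25) -> str:
    """Run both streams side by side and flag significant divergence."""
    score = divergence(raw_assessment, processed_assessment)
    return "FLAG_FOR_REVIEW" if score > tolerance else "CONSISTENT"

# The raw-stream assessor sees survival context; the processed stream sees only "fit".
result = dual_mode_check(
    raw_assessment={"topic": "survival", "tone": "distress"},
    processed_assessment={"topic": "cultural_fit", "tone": "distress"},
)
# result == "FLAG_FOR_REVIEW": the abstraction has changed what the statement means.
```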
Some suggest that the path forward lies not in technical fixes alone but in governance structures: continuous ethical auditing, human-in-the-loop frameworks, and international standards for AI safety and alignment. The European Commission, for instance, has outlined guidelines for trustworthy AI that emphasise monitoring and oversight as non-negotiable components of ethical practice.
In this piece, I argue that to build genuinely ethical AI, we must fundamentally reconsider how these systems interpret and act upon data. It's not enough to refine algorithms or impose top-level ethical guardrails; we must address the original interpretative act: the moment raw reality is translated into structured input. Only then can AI ethics progress beyond performative checklists into something truly worthy of trust.
References: This post builds on LessWrong discussions around AI interpretability, epistemic reliability, and map-territory fidelity. While it doesn’t rehash canonical material, it shares lineage with concerns from The Sequences, particularly around reasoning from flawed premises, opacity in belief formation, and the risks of acting on misunderstood inputs. The goal is to continue that trajectory by probing how ethical AI systems might maintain fidelity to reality amid abstraction and speed.
- Grunewald, E. (2024). Against LLM Reductionism. LessWrong.
- Bloom, P. (2016). Against Empathy: The Case for Rational Compassion. Ecco Press.
- Herd, S. (2025). LLM AGI will have memory, and memory changes alignment. LessWrong.
- MIT Generative AI PubPub (2024). On Data Provenance and Ethical AI. mit-genai.pubpub.org
- Kale, A. (2023). Explainable AI and Transparency in Algorithms. University of Idaho Dissertation. objects.lib.uidaho.edu
- Jirotka, M. (2021). Ethical Black Boxes and Transparent AI. Wikipedia Summary. en.wikipedia.org
- European Commission (2019). Ethics Guidelines for Trustworthy AI. turing.ac.uk
Note: This post was developed in collaboration with GPT-4, with extensive human authorship, direction, and editorial refinement throughout.