Summary:
Through a self-dialogue format, this essay conducts stress tests on utilitarianism under extreme moral scenarios, exposing its structural contradictions and value conflicts. The author proposes that by placing justice above utility in moral reasoning, we can avoid ethical distortions, reduce computational burdens, and preserve utilitarianism’s strength in optimizing outcomes—within a morally sound framework. This results in a more robust and principled value system.
A Dialogue with a Utilitarian
— A Stress Test of Utilitarianism and a Proposal for a Justice-Based Framework
This is a thought experiment. A and B are both me: this is a dialogue I conducted with utilitarianism itself, at the level of principle, not a debate aimed at any particular person.
A: To program values into an AI, we need a structure that is codable, comparable, and summable. Utilitarianism meets all three conditions, whereas deontology, virtue ethics, and tragic ethics cannot be effectively modeled.
B: Existing AI is itself a counterexample. It maintains logical consistency without being utilitarian, and it can rigorously express values that are never quantified, which shows that rationality ≠ utilitarianism. AI reasoning does not maximize a number; it maintains semantic coherence, the soundness of premises, and the closure of inference. In short, it is a structure-preserving logical equilibrium. As long as an AI can express and understand structural logic, it needs no utility function.
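To make the disagreement concrete, here is a minimal sketch (the types, fields, and weights are all hypothetical, not any real AI system): A's requirement treats every value as something coded, compared, and summed into one score, while B's point is that values can also be coded as structural constraints that are checked rather than added up.

```python
# Purely illustrative sketch; all names, fields, and weights are hypothetical.
from dataclasses import dataclass

@dataclass
class Outcome:
    happiness: float           # aggregate pleasure minus pain
    lives_saved: int
    rights_violations: int

# A's structure: codable, comparable, summable -- one scalar decides everything.
def utility(o: Outcome) -> float:
    return o.happiness + 10.0 * o.lives_saved - 5.0 * o.rights_violations

def choose_by_utility(options: list[Outcome]) -> Outcome:
    return max(options, key=utility)

# B's structure: values expressed as rules that must hold, with nothing summed.
RULES = [
    lambda o: o.rights_violations == 0,   # e.g. "no rights overridden without consent"
]

def admissible(options: list[Outcome]) -> list[Outcome]:
    return [o for o in options if all(rule(o) for rule in RULES)]
```

Both structures are equally codable; only the first forces every value onto a single scale.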
B: First Extreme Scenario Test
If you insist on utilitarianism, do you support individual privacy being overridden by national security?
A: The core principle of utilitarianism is that whether an action is right depends solely on whether it yields the greatest total utility. If an action can reasonably be predicted to maximize utility, it is acceptable.
B: Then what about the extremes? Where is the limit? Can the state arrest you on mere suspicion? And if suspicion remains after the arrest, can it torture a confession out of you?
A: If preventive detention significantly reduces crime and raises overall public safety, and its side effects stay within an acceptable range, then it is justifiable. In extreme cases torture could be a reasonable method, but only under added conditions: high certainty and extreme time pressure.
B: Why not also torture under medium certainty, or when time is not pressing, if the utility is still positive?
A: Because that becomes too easy to abuse.
B: Why is abuse not allowed?
A: It would erode social trust.
B: Why can't trust decline?
A: Society would panic; disaster could follow.
B: Then adopt an iron-fist policy and lock everyone up, and panic can do no real damage.
A: No, that would not be happiness. We must calculate happiness. Utilitarianism has its flaws, and many things cannot be quantified, but it is still the optimal system.
B: Prove it. How?
A: By its capacity to maximize utility: every value can be modeled, and every choice can be compared.
B: I fully agree that values can be compared, but how do you define value? Do mental states count? Do human dignity, emotion, and the sense of justice count? The moment you start adding side conditions, you are already betraying utilitarianism: you are saying that something matters more than utility. Otherwise, why restrict it?
B: Second Extreme Scenario Test
If you believe maximizing happiness is what’s right, what about North Korea? Would you like to be born there? Everyone is “happy,” right?
— Fabricated happiness.
We could build a society of illusion where everyone is brainwashed to love a supreme leader. Everyone deeply believes they’re the happiest people. They have no freedom, no truth, no choices, but they don’t even know what they’re missing.
— Subjective happiness maxed out.
Isn’t this the “best society” according to utilitarianism?
A: That's an extreme case.
B: Of course we test the extremes; a single counterexample defeats any claim to universality.
A: I reject the counterexample: that is not pure happiness.
B: No true Scotsman. If you cannot set the boundaries of your definition, you are using vagueness to seize definitional power.
A: Then let's amend the definition: happiness includes freedom.
B: Great! Does it also include love? Justice? Truth? If it includes all of them, what happens when they conflict?
A: They can be calculated.
B: How? Whose weight counts for more? Isn't that exactly value ranking? You are stuffing every value system into "happiness" to fend off contradiction.
A: We are still exploring. Some models add pain as a negative weight; some treat freedom as a higher-order variable (meta-utility); some add consciousness-intensity weights (more lucid pain counts for more); some include the right to die (ending future suffering counts as happiness); some advocate discounting animals more heavily, or having fewer children so that potential sufferers never exist...
B: If these competing value orderings carry no fixed weights and rest only on personal interpretation, that is even more telling: utilitarians are fighting among themselves over the very formula for utility. What you are really fighting over is the value hierarchy.
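B's rebuttal can be illustrated with a toy calculation (every number below is invented for this essay): two utilitarians who plug different weights into the "happiness formula" are, in effect, asserting different value rankings.

```python
# Toy illustration only: the components and weights are invented for this essay.
def weighted_utility(components: dict[str, float], weights: dict[str, float]) -> float:
    return sum(weights.get(name, 0.0) * value for name, value in components.items())

situation = {"pleasure": 5.0, "pain": 3.0, "freedom": 2.0, "truth": 1.0}

# Same formula shape, different weights -- that is, different value hierarchies.
hedonist = {"pleasure": 1.0, "pain": -1.0, "freedom": 0.2, "truth": 0.1}
liberal  = {"pleasure": 0.5, "pain": -1.0, "freedom": 2.0, "truth": 1.0}

print(weighted_utility(situation, hedonist))  # pleasure outranks freedom
print(weighted_utility(situation, liberal))   # freedom outranks pleasure
```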
B: Third Extreme Scenario Test
If sacrificing a few brings great happiness to the many, can the sacrifice be extremely cruel?
A: Total utility calculation: even if the sacrifice is necessary, greater cruelty produces more pain, which lowers overall utility. Excessive cruelty may offset whatever is gained.
B: Then suppose the sacrifice is carried out in strict secrecy: brutal torture used to extract maximum benefit before death, and the living never find out.
A: Utilitarianism also weighs long-term utility. Permitting such behavior sets a dangerous precedent and may produce more suffering in the long run.
B: Long-term harm is unprovable. Utilitarians routinely lean on unverifiable long-term predictions to prop up conclusions that are really just intuitively correct.
A: Even the most secretive act carries some risk of exposure, and exposure would bring panic, a collapse of trust, and institutional breakdown: enormous negative utility. Utilitarianism requires us to assess outcomes under uncertainty, and we cannot be sure this "perfect torture" really maximizes utility as claimed.
B: If anything is abandoned merely because it is hard to establish, then your inability to calculate happiness precisely has already destroyed your own footing. Yet you have not stopped; you are still trying, aren't you?
A: The executors, and anyone who learns of the act, would suffer severe psychological trauma and moral corrosion, harm that is hard to exclude from the utility calculation.
B: Then use robots.
A: Even with robots the core problem remains: the people who design and program them would be morally harmed.
B: But the programmer only writes an instruction: "extract the maximum value, no ethical limits, clean up the scene." He sees nothing and bears none of the gore, and as technology advances his involvement shrinks further, so his moral cost can be made negligible. (Or take another route and hire an intelligent sociopath.) Even granting some harm to the programmer, it is enough to show that this harm is far smaller than the benefit that can be extracted, and that is obvious.
So when utilitarians reject the "perfect torture," it is not utilitarian reasoning at work but moral intuition, an intuition that precedes the utility calculation rather than arising from it.
A: If we take the victims' preferences into account, such acts plainly violate their core preferences and generate enormous negative utility.
B: But in my scenario the victims are dead: no remaining preferences, no future happiness to lose, and no difference in the final social utility.
A: Death itself is a massive negative utility, because it ends every possible future positive experience.
B: The same problem again: your answer implies that the act is acceptable as long as its benefit can be shown to exceed the sacrifice, and that is exactly what my scenario stipulates. You are forced to attack the premise itself to hold your position, and you know the premise could be realized.
A: A moral system that truly maximizes happiness would not allow such behavior, because it violates basic respect for persons. As John Stuart Mill argued, protecting certain core rights and human dignity is a precondition for maximizing long-term utility.
B: So you concede that even within utilitarianism, dignity sometimes overrides pure utility. Or else you concede other values and smuggle them into utility.
A: Properly understood, utilitarianism would not allow this; a utility-maximizing moral system must protect certain basic rights.
B: That argument is circular: it presupposes that we already know which outcome is "correct." To survive each extreme test you must keep stretching utilitarianism to absorb values that do not originate in utility. That is the structural flaw of putting utility first.
B: Final Stress Test
Your position also traps you personally: if sacrificing yourself would bring greater utility, you are obliged to agree to it, yet few people can actually do that.
Peter Singer admits utilitarianism demands enormous personal sacrifice — like donating most income to the most effective charities.
Yet almost no one truly lives by that.
That’s why many endorse utilitarianism in theory, but instinctively flinch at its extreme conclusions.
They keep seeking tweaks and exception clauses (“long-term consequences,” “risk of exposure,” “social stability”), but these are mostly desperate rationalizations to avoid cruel outcomes.
Why not just acknowledge other values?
The happiness value that utilitarianism computes is only one value among many; even the most committed utilitarians weigh multiple factors in their real-world moral judgments.
What I propose is this: Utilitarianism is a method of calculation, but making happiness the sole metric leads to a dead end.
If utilitarians could place justice before utility, there would be no issue.
All the extreme tests discussed earlier are resolved instantly using justice.
Justice concerns not only outcome distribution but the process of decision-making.
Non-consensual sacrifice violates justice.
Under justice, individual consent and autonomy matter — something utilitarianism often neglects.
- Privacy vs. National Security? — Justice says: without evidence, personal rights cannot be overridden.
- North Korea? — Justice says: happiness built on ignorance and deprivation of truth is invalid. Only when individuals know alternatives and still choose that life does it become legitimate.
- Perfect Torture? — Justice says: involuntary sacrifice invalidates the act.
Under such a framework, these extreme tests need no complex calculations. They are simply unjust — regardless of how much happiness they yield.
Placing justice above utility actually liberates utilitarianism: it can focus solely on maximizing happiness within just constraints.
This restricts some actions — but those are the very acts utilitarians don’t truly endorse either. They just grasp for excuses — and in doing so, expose structural flaws.
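As a rough sketch of what "justice before utility" could look like as a decision procedure (the Action type, its fields, and the two constraints below are hypothetical stand-ins): justice acts as a hard filter, and the utilitarian calculation runs only over the options that survive it.

```python
# Minimal sketch of a justice-first decision procedure; all fields are hypothetical.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Action:
    name: str
    expected_utility: float   # estimated total happiness produced
    consented: bool           # did those who bear the cost agree?
    evidence_based: bool      # is any coercion backed by evidence?

# Justice constraints are checked first, before any utility comparison.
JUSTICE: List[Callable[[Action], bool]] = [
    lambda a: a.consented,        # rules out non-consensual sacrifice
    lambda a: a.evidence_based,   # rules out overriding rights on mere suspicion
]

def choose(actions: List[Action]) -> Optional[Action]:
    just_options = [a for a in actions if all(c(a) for c in JUSTICE)]
    if not just_options:
        return None                                             # nothing permissible
    return max(just_options, key=lambda a: a.expected_utility)  # constrained utilitarian step

# The "perfect torture" is filtered out no matter how high its utility score is.
options = [
    Action("secret torture", expected_utility=1000.0, consented=False, evidence_based=True),
    Action("voluntary levy", expected_utility=40.0, consented=True, evidence_based=True),
]
print(choose(options).name)  # -> voluntary levy
```

The filter removes the hard cases before any arithmetic begins, which is the "reduced computational burden" claimed below.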
Advantages of this framework:
- Theoretical Integrity: No more stuffing other values into “happiness,” diluting its meaning or hiding contradictions under claims of being “the best system.”
- Moral Clarity: No need to justify acts that obviously violate our ethical intuitions.
- Reduced Computational Burden: Many dilemmas are pre-filtered by the principle of justice.
- Preserved Utility Logic: Within fair constraints, we can still maximize benefit.
Please cite this source if reproduced. No modifications allowed.