In Defense of Goodness

by abramdemski
19th Nov 2025
4 min read
Comments (4)

johnswentworth:

Brief responses:

  • I think the points about subagents within one human mind are largely factually correct.
  • "I think [John is] pointing at the yum/yuck we feel now when thinking of something, not the projected yum we expect we'd feel later" is very importantly wrong as a description of what I'm pointing at (see Values Are Real Like Harry Potter), but again not cruxy to the point at hand. (Heroin is a go-to motivating example in that linked post.)
  • Actual cruxy claim: the process which actually produces claims about Goodness, memetically, in our world, is mostly not a bargaining process which "just" aggregates values. It biases heavily towards memeticity, even when memetic fitness is anticorrelated with the values of all the agents involved. (Think e.g. Toxoplasma of Rage.)
  • It would indeed be useful to have some term for the values which would pop out of some unbiased process of bargaining and value aggregation at a societal scale. That process does not currently exist, and consequently there is no standard mainstream word which is actually-in-practice used to refer to such values. Conflating that concept with "good" brings in unearned baggage/connotations/associations.
Vladimir_Nesov:

If two competing companies migrate to the same cloud, they aren't merging their data, even as they are merging the substrate instantiating the data. Similarly, in principle coalitions don't need to merge their values to avoid material conflicts or to maintain high optimization power, they just need to optimize different things. Boundaries of different values don't need to be breached, both at the level of individuals, and at the level of coalitions (you wouldn't want individuals to all merge into one either). Like in the cloud analogy, this doesn't prevent coalitions from sharing physical space and resources, and using them efficiently (as a cloud company would) while pursuing discordant purposes (within their own allocations), or shared projects (within allocations for those projects). Thus goodness is about boundaries and coordination, an alignment target for physical infrastructure and governance, but not anyone's values.

Wei Dai:

"In order to maintain a realist position about goodness, we can see this negotiation as a process of discovery, rather than a change in goodness itself."

This doesn't seem to be a viable position. Suppose I face options A and B (e.g., in the stock market), and if I choose A I double my bargaining power (e.g., power and/or wealth) versus option B. Then by choosing A over B it seems incontrovertible that I'm changing the outcome of the negotiation process, and therefore changing goodness itself if goodness equals the negotiation outcome.

My perspective is that it's better to reserve "moral realist" to mean there are objective values/morality independent of bargaining/cooperation, otherwise you'd run into issues like the above, where you can change what's good by your actions, which probably contradicts a lot of people's intuitions about what "moral realism" or "objective values" entails.

Mateusz Bagiński:

To what extent do you think you're getting those sorts of power-grab-ish dynamics in philosophical investigations, Hegelian dialectic-like stuff, etc?

In Defense of Goodness

This is a reaction to John Wentworth's post Human Values ≠ Goodness. In the post, John argues that the human concept of goodness comes apart from human values, and (perhaps more to John's point) your values. I agree with this distinction. Nor am I disagreeing with John when he calls goodness a "memetic egregore".

Where I disagree with John is in his desire to cast aside goodness, to wriggle out from under its thumb. John does provide some words of caution in this regard, essentially saying that goodness is too useful a coordination tool to discard recklessly, but once we've accounted for that instrumental value, we should discard the rest.

My argument here will be somewhat similar in flavor to The Abolition of Man by C.S. Lewis. 

John identifies values as yumminess, and calls upon the reader to notice the difference between these yummy things and what society calls "good". John's yumminess sounds like what C.S. Lewis calls "appetite". C.S. Lewis refers to those who govern decisions by appetite alone as "men without chests" or "trousered apes". In contrast, C.S. Lewis recommends participating in what he calls the "Tao", which I understand as the 300-millennia-long dialogue amongst humans about norms. I identify this with John's concept of good.[1]

Before I get into my argument, I would like to note some nuance in identifying personal values with yumminess.

Even Values are Aggregate

I don't think the human brain really contains a single unified reinforcement learning (RL) system, with one signal that can be appropriately called "value" (or even "value estimate") for that person. I'm also skeptical that there's a singular "yumminess" in human experience. There are multiple different flavors of yumminess, and some of them are borderline (harder to pin down whether they are good or bad flavors).

First, I think it is plausible that there are multiple independent RL subsystems within the brain. For example, there could be separate model-free and model-based RL subsystems. (For example, model-free: basal ganglia. Model-based: cortex and hippocampus.)

Second, these systems are split into left-brain and right-brain, which may get different rewards which don't effectively aggregate into one reward function. (For example, maybe left-brain learns approach/exploit behavior, more rewarded for things going better than expected; right-brain learns avoid/explore behavior, more punished for things going worse than expected.)

Third, in the extreme, it might be that every brain region that learns should be modeled as an independent subagent, with the learning signal being its individual variety of reinforcement. 

All three ideas provide some reason to suspect that we should model the human brain as containing multiple agents with different reinforcement signals. These need not precisely unify into one agent with one yumminess. (Obviously, they do approximately unify.)
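
To make the "no single canonical yumminess" point concrete, here is a minimal toy sketch. It is my own illustration, not anything from John's post or from neuroscience: two hypothetical subsystems score the same options with different signals, and which option looks best overall depends entirely on how the two signals are weighted.

```python
# Toy sketch: two hypothetical reinforcement-style subsystems score the same
# options with different signals. All names and numbers are made up for
# illustration; this is not a model of any real brain architecture.

options = ["work on project", "scroll feed", "go for a run"]

# Hypothetical learned "approach" values and "avoidance" penalties.
approach_value = {"work on project": 0.6, "scroll feed": 0.9, "go for a run": 0.3}
avoid_penalty  = {"work on project": 0.1, "scroll feed": 0.8, "go for a run": 0.2}

def overall_yum(option, w):
    """Aggregate score, with weight w on the avoidance subsystem."""
    return approach_value[option] - w * avoid_penalty[option]

for w in (0.2, 1.0):
    best = max(options, key=lambda o: overall_yum(o, w))
    print(f"w={w}: {best}")
# w=0.2 -> "scroll feed"; w=1.0 -> "work on project".
# There is no weighting-independent answer to "which option is yummiest":
# the two subsystems only approximately unify into one preference ordering.
```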

We also don't pursue yumminess single-mindedly. Even if there is a unified yum, we don't do everything we can to get it. For example, a person might avoid heroin even though they're quite confident that they'll get yumminess from it. I doubt John disagrees with this point; I think he's pointing at the yum/yuck we feel now when thinking of something, not the projected yum we expect we'd feel later. What I want to point out is that this can be seen as an example of reward hacking: deliberately avoiding specific reinforcement events so that our values don't get shaped in a particular direction. In this way, we can deliberately shape our own feelings of yumminess.

It is also worth pointing out, as Steve Byrnes did in a comment on John's piece, that some feelings of yumminess are more intrinsically generated, while others are socially absorbed.

The overall point I'm making here is that when we look hard at "personal values", we see something like a complex negotiated aggregation of different forces, rather than a centralized source. My argument will be that "goodness" is like this as well, but aggregating over many individuals.

Goodness as Negotiated Aggregation

The word "good" is in some ways merely a word used by individuals to state their preferences: a central use of the word "good" is to say "X is good" when you like X. However, unlike a statement of preference, "X is good" invites disagreement from others who disprefer X.

This is often seen as a bug of language, I think: a confusion between 1-place and 2-place predicates. And sure, it can be a bug when people don't see the distinction between individual preferences and collective good. Distinguishing individual preferences (saying "Alice likes X" rather than simply "X is good") is an important 3rd-person-perspective tool, helping us construct shared maps of the world rather than needing to translate between all of our individual maps all the time. However, I argue that "good" is also an important 3rd-person-perspective tool.

A coalition of cooperating agents needs words to track individual preferences, but it also needs a word to track the aggregate preferences optimized by the collective policy. 

Since a claim "X is good" can spring forth from an individual preference, but invites disagreements in case of dissenting preferences, goodness becomes an objective-by-construction compromise between competing preferences.

This might sound like a relativist notion of "good", with a different "good" for each coalition, but not precisely so: if two different coalitions come into conflict, then they should (normatively) try to bargain their way out of the conflict, forming a broader coalition.[2] They can argue about what is good as part of this process, just as they do within-coalition.[3]
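
As a concrete (and heavily simplified) illustration of what "objective-by-construction compromise" could mean, here is a toy sketch using the Nash bargaining solution. Nothing here comes from John's post or mine in any official sense: the options, utilities, and disagreement point are made up, and Nash bargaining is just one possible formalization of the negotiation being gestured at.

```python
# Toy sketch: "goodness" as a negotiated compromise between two agents'
# preferences, formalized here (as one possible choice) by the Nash
# bargaining solution over a small, made-up set of joint options.

# Each option gives (utility for agent 1, utility for agent 2).
options = {
    "A": (9.0, 1.0),   # great for agent 1, poor for agent 2
    "B": (1.0, 9.0),   # the reverse
    "C": (6.0, 6.0),   # a compromise
}

def nash_product(utilities, disagreement):
    """Product of each agent's gain over what it gets if bargaining fails."""
    gains = [u - d for u, d in zip(utilities, disagreement)]
    if any(g < 0 for g in gains):
        return 0.0  # options worse than walking away are never agreed to
    product = 1.0
    for g in gains:
        product *= g
    return product

def collective_choice(disagreement):
    """The option the coalition settles on: maximize the Nash product."""
    return max(options, key=lambda name: nash_product(options[name], disagreement))

print(collective_choice((2.0, 2.0)))  # -> "C": the compromise wins
print(collective_choice((7.0, 0.0)))  # -> "A": agent 1 has better outside options
```

The second call also illustrates the point raised in Wei Dai's comment above: changing the disagreement point (roughly, bargaining power) changes which option comes out as "collectively good".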

In order to maintain a realist position about goodness, we can see this negotiation as a process of discovery, rather than a change in goodness itself. 

Goodness ≠ Human Values

Still, goodness does not equal human values -- not even aggregated human values. This is because goodness includes care for non-human others already included in our coalition (hence, e.g., animal welfare), as well as implicitly including care for others who may later be added to our coalition (such as extraterrestrials, or artificial intelligences).[4]

  1. ^

    I suppose that C.S. Lewis used "Tao" instead of "good" because the eastern concept of the Tao was less familiar to his western readership, and therefore acted as more of an empty variable that he could imbue with his own intended meaning. This avoids a direct argument over the meaning of "good".

  2. ^

    Of course, it isn't always possible for conflicting coalitions to settle disputes and merge into a larger coalition. From the realist standpoint, this is a persistent disagreement about what is good.

  3. ^

    An alternate linguistic convention would be to keep coining new words for each "level" -- Tribe A pursues "peachy-ness" and Tribe B pursues "keen-ness" and the coalition including tribe A and B together pursue "peachy-keen-ness" etc. However, "good" seems like a notion which more naturally "floats to the top" rather than getting stuck at one level.

  4. ^

    On the other hand, once a cooperative coalition has been formed and the coalition members are effectively pursuing the collective good, it can become difficult to distinguish individual preferences from the collective preferences; behaviorally, everyone is now pursuing the collective preferences. Since humans do show some care for animals, can we really say that caring for animals isn't part of human values?