I agree that the distinction is important. However, my view is that a lot of what you call "goodness" is part of society's mechanism to ensure cooperate/cooperate. It helps other people get yummy stuff, not just you.
You can of course free yourself from that mechanism, and explicitly strategize how to get the most "yumminess" for yourself without ending up broke/addicted/imprisoned/etc. If the rest of society still follows "goodness", that leads to defect/cooperate, and indeed you end up better off. But there's a flaw in this plan.
Part of the point I intended to convey with the post is that society pushing for cooperate/cooperate is one way that Goodness-claims can go memetic, but there are multiple other ways memeticity can be achieved which are not so well aligned with the Values of Humans (either one's own values or others'). Thus this part:
Albert has relatively low innate empathy, and throws out all the Goodness stuff about following the rules and spirit of high-trust communities. Albert just generally hits the “defect” button whenever it’s convenient. Then Albert goes all pikachu surprise face when he’s excluded from high trust communities.
The message is definitely not to go hammering the defect button all the time, that's stupid. Yet somehow every time someone suggests that Goodness is maybe not all it's cracked up to be, lots of onlookers immediately round this to "you should go around hammering the defect button all the time!" (some with positive affect, some with negative) and man I really wish people could stop rounding that off and absorb the actual point.
Hmm. In all your examples, Albert goes against "goodness" and ends up with less "yumminess" as a result. But my point was about a different kind of situation: some hypothetical Albert goes against "goodness" and actually ends up with more "yumminess", but someone else ends up with less. What do you think about such situations?
I would ask Albert: do you generally find it yummy when other people get more yumminess? Do you usually feel like shit when you screw over someone else? For most people, the answers to these are "yes". Most people do not actually like screwing over other people, most of the time (though there are of course exceptions).
Insofar as Albert is a sociopath, or is in one of those moods where he really does want to screw over someone else... I would usually say "Look man, I want you to pursue your best life and fulfill your values, so I wish you luck. But also I'm going to try to stop you, because I want the same for other people too, and I want higher-order nice things like high trust communities." One does not argue against the utility function, as the saying goes.
Indeed, you could make a very reasonable argument that the entire reason AI might be dangerous is that once it's able to, for example, automate away the entire economy, defection no longer has any cost and has massive benefits (at least conditional on no alignment in values).
The basic reason why you can't defect easily and gain massive amounts of utility from social systems is a combination of three things: humans can't reliably evade enforcement, due to logistics issues; people can reliably detect defection in small groups, due to reputation/honor systems; and humans are far, far less powerful as individuals, even selfishly, than as cooperators.
This of course breaks once AGI/ASI is invented, but John Wentworth's post doesn't need to apply to post-AGI/ASI worlds.
Main answer: this post is aimed at a lower level than you are at, and I intentionally did not unpack some of the more advanced questions, because that would have involved long sections which lower-level readers would find either hard to follow or unmotivated.
That said, the way I'd think about your points is in "Values Are Real Like Harry Potter" and "We Don't Know Our Own Values".
I think the confusion here is that "Goodness" means different things depending on whether you're a moral realist or anti-realist.
If you're a moral realist, Goodness is an objective quality that doesn't depend on your feelings/mental state. What is Good may or may not overlap with what you like/prefer/find yummy, but it doesn't have to.
If you're a moral anti-realist, either:
I think "Human Values" is a very poor phrase because:
Instead, people referring to "Human Values" obscure whether they are moral realists or anti-realists, which causes a lot of confusion when determining the implications and logical consistency of their views.
This post doesn't seem to provide reasons to have one's actions be determined by one's feelings of yumminess/yearning, or reasons to think that what one should do is in some sense ultimately specified/defined by one's feelings of yumminess/yearning, over e.g. what you call "Goodness"? I want to state an opposing position, admittedly also basically without argument: that it is right to have one's actions be determined by a whole mess of things together importantly including e.g. linguistic goodness-reasoning, object-level ethical principles stated in language or not really stated in language, meta-principles stated in language or not really stated in language, various feelings, laws, commitments to various (grand and small, shared and individual) projects, assigned duties, debate, democracy, moral advice, various other processes involving (and in particular "running on") other people, etc. These things in their present state are of course quite poor determiners of action compared to what is possible, and they will need to be critiqued and improved, but I think it is right to improve them from basically "the standpoint they themselves create".[1]
The distinction you're trying to make also strikes me as bizarre given that in almost all people, feelings of yumminess/yearning are determined largely by all these other (at least naively, but imo genuinely and duly) value-carrying things anyway. Are you advocating for a return to following some more primitively determined yumminess/yearning? (If I imagine doing this myself, I imagine ending up with some completely primitively retarded thing as "My Values", and then I feel like saying "no I'm not doing that lmao, fuck these "My Values"".) Are you saying one should not try to revert the yumminess/yearning-shaping done by all this other stuff in the past, but still advising one to avoid any shaping in the future? It'd surprise me if any philosophically serious person would really agree to abstain from e.g. using linguistic goodness-talk in this role going forward.
The distinction also strikes me as bizarre given that in ordinary action-determination, feelings of yumminess/yearning are often not directly applied to some low-level givens, but e.g. to principles stated in language, and so only become fully operational in conjunction with e.g. minimally something like internal partly-linguistic debate. So if one were to get rid of the role of goodness-talk in one's action-determination, even one's existing feelings of yumminess/yearning could no longer remotely be "fully themselves".
If you ask me "but how does the meaning of "I should X" ultimately get specified/defined", then: I don't particularly feel a need to ultimately reduce shoulds to some other thing at all, kinda along the lines of https://en.wikipedia.org/wiki/Tarski's_undefinability_theorem and https://en.wikipedia.org/wiki/G._E._Moore#Open-question_argument .
I update my moral values based on my ontology. I try to factor in epistemic uncertainty. I do not attribute goodness to human values, because I do not center my world view around humans only. What an odd thing to do.
Ethics to me is an epistemic project. I read literature, poetry, the Upanishads, the Gita, the Gospels, Meditations, the sequences... More obscure things. I think and I update.
There is a temptation to simply define Goodness as Human Values, or vice versa.
Alas, we do not get to choose the definitions of commonly used words; our attempted definitions will simply be wrong. Unless we stick to mathematics, we will end up sneaking in intuitions which do not follow from our so-called definitions, and thereby mislead ourselves. People who claim that they use some standard word or phrase according to their own definition are, in nearly all cases outside of mathematics, wrong about their own usage patterns.[1]
If we want to know what words mean, we need to look at e.g. how they’re used and where the concepts come from and what mental pictures they summon. And when we look at those things for Goodness and Human Values… they don’t match. And I don’t mean that we shouldn’t pursue Human Values; I mean that the stuff people usually refer to as Goodness is a coherent thing which does not match the actual values of actual humans all that well.
There’s this mental picture where a mind has some sort of goals inside it, stuff it wants, stuff it values, stuff which from-the-inside feels worth doing things for. In old-school AI we’d usually represent that stuff as a utility function, but we wanted some terminology for a more general kind of “values” which doesn’t commit so hard to the mathematical framework (and often-confused conceptual baggage outside the math) of utility functions. The phrase “human values” caught on.
We don’t really know what human values are, or what shape they are, or even whether they’re A Thing at all. We don’t have trivial introspective access to our own values; sometimes we think we value a thing a lot, but realize in hindsight that we value it only a little. But insofar as the mental picture is pointing to a real thing at all, it does tell us how to go look for our values within our own minds.
How do we go look for our own values?
Well, we’re looking for some sort of goals, stuff which our minds want or value, stuff which drives us, etc. What does that feel like from the inside? Think of the stuff that, when you imagine it, feels really yummy. It induces yearning and longing. It feels like you’d be more complete with it. That’s the feeling of stuff that you value a lot. Lesser versions of the same feeling come when imagining things you value less (but still positively).
Personally… I get that feeling of yumminess and yearning when I imagine having a principled mathematical framework for understanding the internal structures of minds, which actually works on e.g. image generators.[2] I also get that feeling of yumminess and yearning when I imagine a really great night of dancing, or particularly great sex, or physically fighting with friends, or my favorite immersive theater shows, or some of my favorite foods at specific restaurants. Sometimes I get a weaker version of the yumminess and yearning feeling when I imagine hanging out around a fire with friends, or just sitting out on my balcony alone at night and watching the city, or dealing with the sort of emergency which is important enough that I drop everything else from my mind and just focus.
Those are my values. That’s what human values look like, and how to probe for yours.
I did not first learn about Goodness by imagining things and checking how yummy they felt. I first learned about Goodness by my parents and teachers and religious figures and books and movies and so forth telling me that it’s Good to not steal things, Good to do unto others what I’d have them do unto me, Good to follow rules and authority figures, Good to clean up after myself, Good to share things with other kids, Good to not pick my nose, etc, etc.
In other words, I learned about Goodness mostly memetically, absorbing messages from others about what’s Good.
Some of those messages systematically follow from some general principles. Things like “don’t steal” are social rules which help build a high-trust society, making it easier for everyone to get what they want insofar as everyone else follows the rules. We want other people to follow those rules, so we teach other people the rules. Other aspects of Goodness, especially about cleanliness, seem to mostly follow humans’ purity instincts, and are memetically spread mainly by people with relatively-strong purity instincts in an attempt to get people with relatively-weaker purity instincts to be less gross (think nose picking). Still other aspects of Goodness seem rather suspiciously optimized for getting kids to be easier for their parents and teachers to manage - think following rules or respecting one’s elders. Then there are aspects of Goodness which seem to be largely political, driven by the usual political memetic forces.
The main unifying theme here is that Goodness is a memetic egregore; in practice, our shared concept of Goodness is comprised of whatever messages people spread about what other people should value.
… which sure is a different thing from what people do value, when they introspect on what feels yummy.
One thing to flag at this point: you know the feeling of deep loving connection, like a parent-child bond or spousal bond or the feeling you get (to some degree) when deeply empathizing with someone or the feeling of loving connection to God or the universe which people sometimes get from religious experiences? I.e. oxytocin?
For many (most?) people, that feeling is a REALLY big chunk of their Values. It is the thing which feels yummiest, often by such a large margin that it overwhelms everything else. If that’s you, then it’s probably worth stopping to notice that there are other things you value. It is quite possible to hyperoptimize for that one particular yumminess, then burn out and later realize that one values other things too - as many a parent learns when the midlife crisis hits.
That feeling of deep loving connection is also a major component of the memetic egregore Goodness, to such an extent that people often say that Goodness just is that kind of love. Think of the songs or hippies or whoever saying that all the world’s problems would be solved if only we had more love. As with values, it is worth stopping to notice that loving connection is not the entirety of Goodness, as the term is typically used. The people saying that Goodness just is loving connection (or something along those lines) are making the same move as someone trying to define a word; in most cases their usage probably doesn’t even match their own definition on closer inspection.
It is true that deep loving connection is both an especially large chunk of Human Values and an especially large chunk of Goodness, and within that overlap Human Values and Goodness do match. But that’s not the entirety of either Human Values or Goodness, and losing track of the rest is a good way to shoot oneself in the foot eventually.
To summarize so far:
Looking at that first one, the second might seem kind of silly. After all, we mostly don’t get to choose what triggers yumminess or yearning. There are some loopholes - e.g. sometimes we can learn to like things, or intentionally build new associations - but mostly the yumminess is not within conscious control. So it’s kind of silly for the memetic egregore to tell us what we should find yummy.
A central example: gay men mostly don’t seem to have much control over their attraction to men; that yumminess is not under their control. In many times and places the memetic egregore Goodness said that men shouldn’t be sexually attracted to men (those darn purity instincts!), which… usually isn’t all that effective at changing the underlying yumminess or yearning.
What does often happen, when the memetic egregore Goodness dictates something in conflict with actual Humans’ actual Values, is that the humans “tie themselves in knots” internally. The gay man’s attraction to men is still there, but maybe that attraction also triggers a feeling of shame or social anxiety or something. Or maybe the guy just hides his feelings, and then feels alone and stressed because he doesn’t feel safe being open with other people.
Sex and especially BDSM is a ripe area for this sort of thing. An awful lot of people, probably a majority of the population, sure do feel deep yearning to either inflict or receive pain, to take total control over another or give total control to another, to take or be taken by force, to abandon propriety and just be a total slut, to give or receive humiliation, etc. And man, the memetic egregore Goodness sure does not generally approve of those things. And then people tie themselves in knots, with the things that turn them on most also triggering anxiety or insecurity.
I’d like to say here “screw memetic egregores, follow the actual values of actual humans”, but then many people will be complete fucking idiots about it. So first let’s go over what not to do.
There’s a certain type of person… let’s call him Albert. Albert realizes that Goodness is a memetic egregore, and that the memetic egregore is not particularly well aligned with Albert’s own values. And so Albert throws out all that Goodness crap, and just queries his own feelings of yumminess in-the-moment when making decisions.
This goes badly in a few different ways.
Sometimes Albert has relatively low innate empathy, and throws out all the Goodness stuff about following the rules and spirit of high-trust communities. Albert just generally hits the “defect” button whenever it’s convenient. Then Albert goes all pikachu surprise face when he’s excluded from high trust communities.
Other times Albert is just bad at thinking far into the future, and jumps on whatever feels yummy in-the-moment without really thinking ahead. A few years down the line Albert is broke.
Or maybe Albert rejects memetic Goodness, ignores authority a little too much, and winds up unemployed or in prison. Or ignores purity instincts a little too much and winds up very sick.
Point is: there’s a Chesterton’s fence here. Don’t be an idiot. Goodness is not very well aligned with actual Humans’ actual Values, but it has been memetically selected for a long time and you probably shouldn’t just jettison the whole thing without checking the pieces for usefulness. In particular, a nontrivial chunk of the memetic egregore Goodness needs to be complied with in order to satisfy your actual Values long term (which usually involves other people), even when it conflicts with your Values short term. Think about the consequences, what will actually happen down the line and how well your Values will actually be satisfied long-term, not just about what feels yummy in the moment.
… and then jettison the memetic egregore and pay attention to your and others' actual Values. Don’t make the opposite mistake of motivatedly looking for clever reasons to not jettison the egregore just because it’s scary.
You can quick-check this in individual cases by replacing the defined word with some made-up word wherever the person uses it - e.g. replace “Goodness” with “Bixness”.
… actually when I first try to imagine that I get a mild “ugh” because I’ve tried and failed to make such a thing before. But when I set that aside and actually imagine the end product, then I get the yummy feeling.