(I say this all the time, but I think that [the thing you call “values”] is a closer match to the everyday usage of the word “desires” than the word “values”.)
I think we should distinguish three things: (A) societal norms that you have internalized, (B) societal norms that you have not internalized, (C) desires that you hold independent of [or even despite] societal norms.
For example:
Anyway, the OP says: “our shared concept of Goodness is comprised of whatever messages people spread about what other people should value. … which sure is a different thing from what people do value, when they introspect on what feels yummy.”
I think that’s kinda treating the dichotomy as (B) versus (C), while denying the existence of (A).
If that 12yo girl “introspects on what feels yummy”, her introspection will say “myself wearing a crop-top with giant sweatpants feels yummy”. This obviously has memetic origins but the girl is very deeply enthusiastic about it, and will be insulted if you tell her she only likes that because she’s copying memes.
By the way, this is unrelated to “feeling of deep loving connection”. The 12yo girl does not have a “feeling of deep loving connection” to the tiktok influencers, high schoolers, etc., who have planted the idea in her head that crop-tops and giant sweatpants look super chic and awesome. I think you’re wayyy overstating the importance of “feeling of deep loving connection” for the average person’s “values”, and correspondingly wayyy understating the importance of this kind of norm-following thing. I have a draft post with much more about the norm-following thing, should be out soon :)
Good points as usual! On a meta note, I thought when writing this "Steve will probably say something like he usually says, and I still haven't fully incorporated it into my models, hopefully I'll absorb some more this time".
Anyway, I don't think I want to deny the existence of (A). I want to say that "style X is cool" is a true part of the girl's values insofar as style X summons up yummy/yearning/completeness/etc feelings on its own, and is not a true part of her values insofar as the feelings involved are mostly social anxiety or a yearning to be liked. (The desire to be liked would then be a part of her values, insofar as the prospect of being liked is what actually triggers the yearning.)
I do want to say that stuff is a true part of one's values once it triggers those feelings, regardless of whether memes were involved in installing the values along the way. I want to distinguish that from the case where people "tie themselves in knots", trying to act like they value something or telling themselves that they value something when the feelings are not in fact there, because they've been told (or logically convinced themselves) they "should" value the thing.
Main answer: this post is aimed at a lower level than you are at, and I intentionally did not unpack some of the more advanced questions, because that would have involved long sections which lower-level readers would find either hard to follow or unmotivated.
That said, the way I'd think about your points is in Values Are Real Like Harry Potter and We Don't Know Our Own Values.
I've now read your linked posts, but can't derive from them how you would answer my questions. Do you want to take a direct shot at answering them? And also the following question/counter-argument?
Think about the consequences, what will actually happen down the line and how well your Values will actually be satisfied long-term, not just about what feels yummy in the moment.
Suppose I'm a sadist who derives a lot of pleasure/reward from torturing animals, but also my parents and everyone else in society taught me that torturing animals is wrong. According to your posts, this implies that my Values = "torturing animals has high value", and Goodness = "don't torture animals", and I shouldn't follow Goodness unless it actually lets me satisfy my values better long-term, in other words allows me to torture more animals in the long run. Am I understanding your ideas correctly?
(Edit: It looks like @Johannes C. Mayer made a similar point under one of your previous posts.)
Assuming I am understanding you correctly, this would be a controversial position to say the least, and counter to many people's intuitions or metaethical beliefs. I think metaethics is a hard problem, and I probably can't easily convince you that you're wrong. But maybe I can at least convince you that you shouldn't be as confident in these ideas as you appear to be, nor present them to "lower-level readers" without indicating how controversial / counterintuitive-to-many the implications of your ideas are.
From the top:
Are our Values the real-world things that trigger our feelings, or the feelings themselves? (If the latter, we'll be able to artificially trigger them at negligible cost and with no negative side effects, unlike today.)
Not quite either of those, but if we're speaking loosely then the real-world things that trigger our feelings. Definitely not the feelings themselves.
"We Don’t Get To Choose Our Own Values" will be false, so that part will be irrelevant. How does this affect your arguments/conclusions?
It's already false today for things like e.g. heroin; drugs already make it possible to overwrite our values if we so choose. I would reason about future opportunities to overwrite our values in much the same way I reason about heroin today (and in much the same way which I think most people reason about heroin today).
Even today, Goodness-as-memetic-egregore can (and does) heavily influence our Values, through the kind of mechanism described in Morality is Scary. (Think of the Communists who yearned for communism so much that they were willing to endure extreme hardship and even torture for it.) This seems like a crucial part of the picture that you didn't mention, and which complicates any effort to draw conclusions from it.
Yup, I totally buy that that happens, including in more ordinary day-to-day ways. At the point where a meme has integrated itself into the feeling-triggers directly, I'm willing to say "ok this meme has become a part of this person's actual values". As with heroin, this is a thing which one typically wants to avoid under one's current values, but once it's happened there's no particular reason to undo it (at least from the first-person perspective; obviously people try to overwrite others' values all the time).
My own perspective is that what you call Human Values and Goodness are both potential sources (along with others) of "My Real Values", which I'll only be able to really figure out after doing or learning a lot more philosophy (e.g., to figure out which ones I really want to, or should, keep or discard, or how to answer questions like the above). In the meantime, my main goals are to preserve/optimize my option values and ability to eventually do/learn such philosophy, and to avoid doing anything that might turn out to be really bad according to "My Real Values" (like denying some strong short-term desire, or committing a potential moral atrocity), using something like Bostrom and Ord's Moral Parliament model for handling moral uncertainty.
At some point, somewhere in this process, one needs to figure out what counts as evidence about value, i.e. what crosses the is-ought gap. And I would be real damn paranoid about giving a memetic egregore de-facto write access to the "ought" side of the is-ought gap.
Suppose I'm a sadist who derives a lot of pleasure/reward from torturing animals, but also my parents and everyone else in society taught me that torturing animals is wrong. According to your posts, this implies that my Values = "torturing animals has high value", and Goodness = "don't torture animals", and I shouldn't follow Goodness unless it actually lets me satisfy my values better long-term, in other words allows me to torture more animals in the long run. Am I understanding your ideas correctly?
[...]
Assuming I am understanding you correctly, this would be a controversial position to say the least, and counter to many people's intuitions or metaethical beliefs.
I'd flag that there are still instrumental considerations, i.e. other people assign (a lot of) negative value to animals being tortured and I probably want to still be friends with those people, so I might want to avoid the torture for practical reasons.
That said, steelmanning: in a world where basically all humans enjoyed torturing animals, yes, those alternate-humans should-according-to-their-own-values torture lots of animals. Obviously that is controversial, but also-obviously it's one of those things that's controversial mostly for stupid reasons (i.e. people really want to find some reason why their own values are the One True Universal Good), not for good reasons.
Main answer: this post is aimed at a lower level than you are at, and I intentionally did not unpack some of the more advanced questions
I wish there was some kind of disclaimer or hint near the beginning of the text that this is the case, so I would know to read it with this in mind (or skip it altogether as not written for me).
What would you want such a disclaimer or hint to look like?
(I am concerned that if a post says something like "this post is aimed at low-level people who don't yet have a coherent foundational understanding of goodness and values" then the set of people who actually continue reading will not be very well correlated with the set of people we'd like to have continue reading.)
Maybe something like "This post presents a simplified version of my ideas, intended as an introduction. For more details and advanced considerations, please see such and such posts."
I agree that the distinction is important. However, my view is that a lot of what you call "goodness" is part of society's mechanism to ensure cooperate/cooperate. It helps other people get yummy stuff, not just you.
You can of course free yourself from that mechanism, and explicitly strategize how to get the most "yumminess" for yourself without ending up broke/addicted/imprisoned/etc. If the rest of society still follows "goodness", that leads to defect/cooperate, and indeed you end up better off. But there's a flaw in this plan.
Part of the point I intended to convey with the post is that society pushing for cooperate/cooperate is one way that Goodness-claims can go memetic, but there are multiple other ways memeticity can be achieved which are not so well aligned with the Values of Humans (either one's own values or others'). Thus this part:
Albert has relatively low innate empathy, and throws out all the Goodness stuff about following the rules and spirit of high-trust communities. Albert just generally hits the “defect” button whenever it’s convenient. Then Albert goes all pikachu surprise face when he’s excluded from high trust communities.
The message is definitely not to go hammering the defect button all the time, that's stupid. Yet somehow every time someone suggests that Goodness is maybe not all it's cracked up to be, lots of onlookers immediately round this to "you should go around hammering the defect button all the time!" (some with positive affect, some with negative) and man I really wish people could stop rounding that off and absorb the actual point.
Hmm. In all your examples, Albert goes against "goodness" and ends up with less "yumminess" as a result. But my point was about a different kind of situation: some hypothetical Albert goes against "goodness" and actually ends up with more "yumminess", but someone else ends up with less. What do you think about such situations?
I would ask Albert: do you generally find it yummy when other people get more yumminess? Do you usually feel like shit when you screw over someone else? For most people, the answers to these are "yes". Most people do not actually like screwing over other people, most of the time (though there are of course exceptions).
Insofar as Albert is a sociopath, or is in one of those moods where he really does want to screw over someone else... I would usually say "Look man, I want you to pursue your best life and fulfill your values, so I wish you luck. But also I'm going to try to stop you, because I want the same for other people too, and I want higher-order nice things like high trust communities." One does not argue against the utility function, as the saying goes.
Most people do not actually like screwing over other people
I think this is very culturally dependent. For example, wars of conquest were considered glorious in most places and times, and that's pretty much the ultimate form of screwing over other people. Or for another example, the first orphanages were built by early Christians; before that, orphans were usually disposed of. Or recall how common slavery and serfdom have been throughout history.
Basically my view is that human nature without indoctrination into "goodness" is quite nasty by default. Empathy is indeed a feeling we have, and we can feel it deeply (...sometimes). But we ended up with this feeling mainly due to indoctrination into "goodness" over generations. We wouldn't have nearly as much empathy if that indoctrination hadn't happened, and it probably wouldn't stay long term if that indoctrination went away.
I do want to say that stuff is a true part of one's values once it triggers the feelings of yumminess/yearning/etc, regardless of whether memes were involved in installing the values along the way. I want to distinguish that from the case where people "tie themselves in knots", trying to act like they value something or telling themselves that they value something when the feelings are not in fact there, because they've been told they "should" value the thing.
So yeah, some of our actual values are installed culturally/memetically, and that doesn't automatically make them bad or fake values. I'm on board with that, so long as the underlying feelings of yumminess/yearning/etc actually show up.
We can throw out the other junk of memetic egregore Goodness, without abandoning the stuff people actually feel good about.
But why do you think that people's feelings of "yumminess" track the reality of whether an action is cooperate/cooperate? I've explained that it hasn't been true throughout most of history: people have been able to feel "yummy" about very defecting actions. Maybe today the two coincide unusually well, but then that demands an explanation.
I think it's just not true. There are too many ways to defect and end up better off, and people are too good at rationalizing why it's ok for them specifically to take one of those ways. That's why we need an evolving mechanism of social indoctrination, "goodness", to make people choose the cooperative action even when it doesn't feel "yummy" to them in the moment.
But why do you think that people's feelings of "yumminess" track the reality of whether an action is cooperate/cooperate?
I don't think that's the right question here?
Let me turn it around: you say "That's why we need an evolving mechanism of social indoctrination, "goodness", to make people choose the cooperative action even when it doesn't feel "yummy" to them in the moment.". But, like, the memetic egregore "Goodness" clearly does not track that in a robust generalizable way, any more than people's feelings of yumminess do. The egregore is under lots of different selection pressures besides just "get people to not defect", and the egregore has indoctrinated people in different things over time. So why are you attached to the whole egregore, rather than wanting to jettison the bulk of the egregore and focus directly on getting people to not defect? Why do you think that the memetic egregore Goodness tracks the reality of whether an action is cooperate/cooperate?
But, like, the memetic egregore “Goodness” clearly does not track that in a robust generalizable way, any more than people’s feelings of yumminess do.
I feel you're overstating the "any more" part, or at least it doesn't match my experience. My feelings of "goodness" often track what would be good for other people, while my feelings of "yumminess" mostly track what would be good for me. Though of course there are exceptions to both.
So why are you attached to the whole egregore, rather than wanting to jettison the bulk of the egregore and focus directly on getting people to not defect?
This can be understood two ways. 1) A moral argument: "We shouldn't have so much extra stuff in the morality we're blasting in everyone's ears, it should focus more on the golden rule / unselfishness". That's fine, everyone can propose changes to morality, go for it. 2) "Everyone should stop listening to morality radio and follow their feels instead". Ok, but if nobody listens to the radio, by what mechanism do you get other people to not defect? Plenty of people are happy to defect by feels, I feel I've proved that sufficiently. Do you use police? Money? The radio was pretty useful for that actually, so I'm not with you on this.
Insofar as Albert is a sociopath, or is in one of those moods where he really does want to screw over someone else... I would usually say "Look man, I want you to pursue your best life and fulfill your values, so I wish you luck. But also I'm going to try to stop you, because I want the same for other people too, and I want higher-order nice things like high trust communities." One does not argue against the utility function, as the saying goes.
This seems incoherent to me? I'd like it if all the sociopaths were duped by society into not pursuing their values; that's great for my values, and because they're evil I'd rather they not pursue their best life. However, I still support distinguishing between goodness and human values for the same general-purpose reasons why, even if it's possible in principle to use some piece of information for evil, it's still often better to spread & talk about that information than not.
More generally I think people are too quick to use the phrase "One does not argue against the utility function, as the saying goes." Yes, you can't argue against the utility function, but if someone has a bad utility function and is unaware of what that utility function is, I'm not going to dissuade them from that (unless I think they'll be happy to cooperate with me on bettering both our goals if I do, but sociopaths are not known for such behavior). That's part of stopping them.
I'm quite confident my preferences are coherent here, it's one of the parts of my values I'm most familiar with.
There's both an instrumentalish and a terminalish component. The terminalish component is roughly a really strong preference to not try to mislead people about their own values; that in particular is just incredibly deeply wrong for me to do according to my own values. The instrumentalish component is... very similar to the thing where people are like "well we need to be a little hyperbolic or misleading or conceal our true intent in order to spread our political message successfully" and then over and over again that type of reasoning leads people to metaphorically smack themselves in the face, it's a massive own goal, it just does not work.
Indeed, you could make a very reasonable argument that the entire reason AI might be dangerous is that once it's able to automate away the entire economy, for example, defection no longer has any cost and has massive benefits (at least conditional on no alignment in values).
The basic reason why you can't easily defect against social systems and gain massive amounts of utility is a combination of: humans not being able to evade enforcement reliably, due to logistics issues; people being able to reliably detect defection in small groups, due to reputation/honor systems; and the fact that humans are far, far less powerful, even selfishly, as individuals than as cooperators.
This of course breaks once AGI/ASI is invented, but John Wentworth's post doesn't need to apply to post-AGI/ASI worlds.
I think the confusion here is that "Goodness" means different things depending on whether you're a moral realist or anti-realist.
If you're a moral realist, Goodness is an objective quality that doesn't depend on your feelings/mental state. What is Good may or may not overlap with what you like/prefer/find yummy, but it doesn't have to.
If you're a moral anti-realist, either:
I think "Human Values" is a very poor phrase because:
Instead, people referring to "Human Values" obscure whether they are moral realists or anti-realists, which causes a lot of confusion when determining the implications and logical consistency of their views.
If you're a moral realist, you can just say "Goodness" instead of "Human Values".
I notice I am confused. If "Goodness is an objective quality that doesn't depend on your feelings/mental state", then why would the things humans actually value necessarily be the same as Goodness?
A common use of "Human Values" is in sentences like "we should align AI with Human Values" or "it would be good to maximize Human Values upon reflection", i.e. normative claims about how Human Values are good and should be achieved. However, if you're not a moral realist, there's no (or very little) reason to believe that humans, even if they reflect for a long time etc., will arrive at the same values. Most of the time if someone says "Human Values" they don't mean to include the values of Hitler or a serial killer. This makes the term confusing, because it can be used both descriptively and normatively, and the normative use is common enough to make it confusing when used as a purely descriptive term.
I agree that if you're a moral realist, it's useful to have a term for "preferences shared amongst most humans" as distinct from Goodness, but Human Values is a bad choice because:
I mostly agree with this; the part which feels off is
I’d like to say here “screw memetic egregores, follow the actual values of actual humans”
Humans already follow their actual Values[1], and will always do because their Values are the reason they do anything at all. They also construct narratives about themselves that involve Goodness, and sometimes deny the distinction between Goodness and Values altogether. This act of (self-)deception is in itself motivated by the Values, at least instrumentally.
I do have a version of the “screw memetic egregores” attitude, which is, stop self-deceiving. Because, deception distorts epistemics, and we cannot afford distorted epistemics right now. It's not necessarily correct advice for everyone, but I believe it's correct advice for everyone who is seriously trying to save the world, at least.
Another nuance is that, in addition to empathy and naive tit-for-tat, there is also acausal tit-for-tat. This further pushes the Value-recommended strategy in the direction of something Goodness-like (in certain respects), even though ofc it doesn't coincide with the Goodness of any particular culture in any particular historical period.
As Steven Byrnes wrote, "values" might not be the best term, but I will keep it here.
To some extent, "goodness" is an ever-moving, negotiated set of norms of how one should behave.
I notice that when I use the word "good" (or invoke this concept using other words such as "should"), I don't use it to point to the existing norms, but as a bid for what I think these norms should be. This sometimes overlaps with the existing norms and sometimes not.
E.g. I might say that it's good to allow lots of different subcultures to co-exist. This is a vote for a norm where people who don't like my subculture leave me and my friends alone, in exchange for us leaving them alone. This is not unrelated to me getting what is yummy to me, but it's at least one step removed.
"Good" is the set of norms we use to coordinate cooperation. If most people don't like when you pick your nose in public, then it's good to make an effort not to do so, and similarly for a lot of other values. Even if you don't care about the nose picking, you probably care about some other of the things "good" coordinates around. For most people it's probably worth supporting the package deal. But I also think you "should" use your voice to help improve the notion of what is "good".
- Our Values are (roughly) the yumminess or yearning we feel when imagining something.
- Goodness is (roughly) whatever stuff the memes say one should value.
I do not think this matches my usage of the words "Human Values" or (especially) "Goodness" (nor the usage of the rare intelligent people whose ethical judgement I trust). The concept of yumminess/yearning is relevant; the concept of popular assertions of what one ought to yearn for is relevant. But I object to both of these rough definitions on the grounds that they miss many central aspects.
Concretely: consider a heroin addict, in a memetic environment that strongly disapproves of heroin usage. Because of their addiction, by far the greatest yumminess they feel when imagining things is more heroin (and things which may have brought their past-self feelings of yumminess no longer have that feeling, because it cannot compete). In your framework, getting more heroin is part of their Values, but not part of their culture's Goodness.
So far so good — but now compare to your example of a gay man in a memetic environment that strongly disapproves of gay romance and sex. As far as I can tell, your analytic framework treats these cases exactly identically: it's a conflict between Values and Goodness, maybe with the man repeatedly tying himself up in knots to try and fail to crush his Values in the name of Goodness. But I claim this is wrong: an accurate account of Values and Goodness should be able to distinguish these two scenarios. (Lest you think I'm letting my own biases slip in: replace "gay romance and sex" with one of the sexual fetishes I personally disapprove of and think should be socially stigmatized. The distinction I'm getting at here is different.)
I challenge you to articulate the relevant difference between those two scenarios in your analytical framework. I claim any framework which can't is flinching away from a hard part of describing the type signatures and natures of Values and Goodness. This is the sense in which I meant that your rough definitions miss central concepts.
(Unless you assert that the two cases aren't different, in which case we might just have a more object-level disagreement, as opposed to you being wrong about your word usage.)
As for what central concepts your framework is missing — this deserves a longer response, but in lieu of that I will briefly gesture at one concept. There is the curious but well-known phenomenon whereby there is a difference between what a human wants (in the sense of revealed preference) and what he or she wants to want (in a particular complicated sense I'm only gesturing at). As you understand well, a man can have false beliefs about what he wants. For the same reason, he can have false beliefs about what he wants-to-want. (In particular, verbal descriptions of what one wants-to-want are not identical to what one actually wants-to-want.)
I claim the self-hating socially-stigmatized heroin-addict has correct beliefs about what he wants-to-want, whereas the self-hating socially-stigmatized sexual-deviant has false beliefs thereof. This distinction is not one of yumminess-upon-imagining (each feels yummy upon imagining using heroin and having deviant sex), and it is not one of memetic pressure (each's behavior is disapproved of by society, and by me personally). But the distinction is central to understanding Human Values and Goodness.
I would strongly agree with this critique: the characterization of goodness as memetics is severely under-theorized; memeticity is a superficial aspect, and there is a deeper structure worth considering.
Directionally correct advice for confused rationalists, but many of the specific claims are so imprecise or confused as to make many people more confused than enlightened.
Goodness is not an egregore. A more sensible pointer would be something like Memetic values. Different egregores actually push for different values, often contradictory ones.
What happens on a more mechanistic level:
- when memes want people to do stuff, they can do two somewhat different things: 1) try to manipulate some existing part of implicit reward function 2) manipulate the world model
- often the path via 2) is easier; sometimes the hijack/rewrite is so blunt it's almost funny: for example there is a certain set of memes claiming you will get to mate with large number of virgin females with beautiful eyes if you serve the memeplex (caveat is you get this impressive boost to reproductive fitness only in the afterlife)
-- notice in this case basically no concept of goodness is needed / invoked, the structure rests on innate genetic evolutionary values, and change in world model
- another thing which the memes can try to do is to replace some S1 model / feeling with a meme-based S2 version, such as the yumminess-predictor box with some explicit verbal model (you like helping people? give to GiveWell recommended charities)
-- this is often something done by rationalists and EAs
-- S2 Goodness is part of this, but non-central
Memetic values actually are an important part of human values - at least of my reflectively endorsed values. A large part of memetic values is human-aligned at the level of groups of humans (ie, makes groups of humans function better, cooperate, trust each other, ...) or at the level of weird deals across time (ie, your example of other aspects of Goodness seeming rather suspiciously optimized for getting kids to be easier for their parents and teachers to manage - think following rules or respecting one's elders - could be a bargain: if the kid were hard and expensive to manage and did not respect the parent, and all of that were known to the prospective parent, the parent could also decide not to bring the kid into existence).
Also, The Yumminess You Feel is often of cultural-evolutionary origin, ie, influenced by memetics. Humans are basically domesticated by cultural evolution; if you wonder whether selective evolutionary pressure can change something like values or a sense of yumminess, look at dogs. We are more domesticated than dogs. The selection pressures over many generations are different from current culture, but if after reading the text, someone starts listening to their yumminess feel and believes he is now driven by Actual, Non-memetic Human Values, they are deeply confused.
This post doesn't seem to provide reasons to have one's actions be determined by one's feelings of yumminess/yearning, or reasons to think that what one should do is in some sense ultimately specified/defined by one's feelings of yumminess/yearning, over e.g. what you call "Goodness"? I want to state an opposing position, admittedly also basically without argument: that it is right to have one's actions be determined by a whole mess of things together importantly including e.g. linguistic goodness-reasoning, object-level ethical principles stated in language or not really stated in language, meta-principles stated in language or not really stated in language, various feelings, laws, commitments to various (grand and small, shared and individual) projects, assigned duties, debate, democracy, moral advice, various other processes involving (and in particular "running on") other people, etc. These things in their present state are of course quite poor determiners of action compared to what is possible, and they will need to be critiqued and improved — but I think it is right to improve them from basically "the standpoint they themselves create".[1]
The distinction you're trying to make also strikes me as bizarre given that in almost all people, feelings of yumminess/yearning are determined largely by all these other (at least naively, but imo genuinely and duly) value-carrying things anyway. Are you advocating for a return to following some more primitively determined yumminess/yearning? (If I imagine doing this myself, I imagine ending up with some completely primitively retarded thing as "My Values", and then I feel like saying "no I'm not going to be guided by this lmao — fuck these "My Values"".) Or maybe you aren't saying one should undo the yumminess/yearning-shaping done by all this other stuff in the past, but are still advising one to avoid any further shaping in the future? It'd surprise me if any philosophically serious person would really agree to abstain from e.g. using goodness-talk in this role going forward.
The distinction also strikes me as bizarre given that in ordinary action-determination, feelings of yumminess/yearning are often not directly applied to some low-level givens, but e.g. to principles stated in language, and so only become fully operational in conjunction with, minimally, something like internal partly-linguistic debate. So if one were to get rid of the role of goodness-talk in one's action-determination, even one's existing feelings of yumminess/yearning could no longer remotely be "fully themselves".
If you ask me "but how does the meaning of "I should X" ultimately get specified/defined", then: I don't particularly feel a need to ultimately reduce shoulds to some other thing at all, kinda along the lines of https://en.wikipedia.org/wiki/Tarski's_undefinability_theorem and https://en.wikipedia.org/wiki/G._E._Moore#Open-question_argument . ↩︎
I like the sharp distinction you draw between
“Our Values are (roughly) the yumminess or yearning…”
and
“Goodness is (roughly) whatever stuff the memes say one should value.”
but the post treats these as more separable than they actually are from the standpoint of how the brain acquires preferences.
You emphasize that
“we mostly don’t get to choose what triggers yumminess/yearning”
and that Goodness trying to overwrite that is “silly.” Yet a few paragraphs later you note that
“a nontrivial chunk of the memetic egregore Goodness needs to be complied with…”
before recommending to “jettison the memetic egregore” once the safety-function parts are removed.
But the brain’s value-learning machinery doesn’t respect this separation. “Yumminess/yearning” is not fixed hardware; it’s a constantly updated reward model trained by social feedback, imitation, and narrative framing. The very things you group under “Goodness” supply the majority of training data for what later becomes “actual Values.” The egregore is not only a coordination layer or a memetically selected structure on top; it is also the training signal.
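The "reward model trained by social feedback" framing can be made concrete with a toy sketch (all names and numbers here are hypothetical, chosen only to illustrate the comment's point, not to model any actual brain mechanism):

```python
# Toy sketch: "yumminess" as a learned reward model whose only training
# data is social feedback. All names and numbers are hypothetical.

def train_reward_model(social_feedback, lr=0.5, epochs=20):
    """Learn per-item 'yumminess' weights from (item, approval) pairs."""
    weights = {}
    for _ in range(epochs):
        for item, approval in social_feedback:
            w = weights.get(item, 0.0)
            # Nudge the learned reward toward the social signal.
            weights[item] = w + lr * (approval - w)
    return weights

# The social environment ("Goodness" messages) supplies the training signal.
feedback = [("sharing", 1.0), ("stealing", -1.0), ("crop_tops", 1.0)]
yumminess = train_reward_model(feedback)
```

In this sketch, the learned "yumminess" for each item converges toward whatever the social signal was, so introspecting on what feels yummy afterward partly just reads back the egregore's messages — which is the coupling the comment is pointing at.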
Your own example shows this coupling. You say that
“Loving Connection… is a REALLY big chunk of their Values”
while also being a core part of Goodness. This dual function, as both a learned reward target and the memetic structure that teaches people to want it, is typical rather than exceptional.
So the key point isn't “should you follow Goodness or your Values?” but “which training signals should you expose your value-learning architecture to?” Then the Albert failure mode looks less like “he ignored Goodness” and more like “he removed a large portion of what shapes his future reward landscape.”
And for societies, given that values are learned, the question becomes which parts of Goodness should we deliberately keep because they stabilize or improve the learning process, not merely because they protect cooperation equilibria?
I think there may be a fairly critical confusion here, though perhaps I have missed the key bit (or perhaps, by seeing this particular tree, have missed the forest the post is aiming at) that would address it. It seems that "human values" here are defined very much in terms of a specific human. However, "goodness" seems to be more about something larger -- society, the culture, humanity as a class, or even living things in general.
I suspect a lot of the potential error in treating the terms as near to one another disappears if you think of goodness for a specific person, or of human values as values held in common by humans as a group. (Granted, in this latter case getting to specific values will be problematic, but in terms of pure logic or abstract reasoning I don't think the issues are nearly as bad as implied in the OP.)
I mostly don't seem to have anything new to say in response to this at the moment, but I figured mentioning my comment from a few weeks ago on hunches about origins of caring-for-others was in order, so there it is.
- Goodness is (roughly) whatever stuff the memes say one should value.
Looking at that first one, the second might seem kind of silly. After all, we mostly don’t get to choose what triggers yumminess or yearning.
A lot of goodness is about what you should do rather than what you should feel yearning for. There’s less conflict there. Even if you can’t change what you feel yearning for, you can change what you do.
One (over)optimistic hope I have is that something like a really good scale-free theory of intelligent agency would define a way to construct a notion of goodness that was actually aligned with the values of the members of a society to the best extent possible.
Is there a distinction to be made between different kinds of social imperatives?
e.g. I think a lot of people might feel the memetic egregore tells them they should try to look good more than it tells them to be humble, but they might still associate the latter with 'goodness' more, because when they are told to do it, it is in the context of morality or virtue.
I agree there is an important distinction, but I think the social memetic aspect of "Goodness" is not central. The central distinction is that we have access to yumminess directly; it is the only thing we "truly care about" in some sense, but as bounded and not even perfectly coherent agents, we're unable to roll our predictions forward over all possible action paths and maximize yumminess.
Instead we need to form a compact/abstracted representation of our values/yumminess to 1) make them legible to ourselves, 2) make plans to attain them, 3) communicate them, and 4) make them more coherent.
I update my moral values based on my ontology. I try to factor in epistemic uncertainty. I do not attribute goodness to human values, because I do not center my world view around humans only. What an odd thing to do.
Ethics to me is an epistemic project. I read literature, poetry, the Upanishads, the Gita, the Gospels, Meditations, the sequences... More obscure things. I think and I update.
An awful lot of people, probably a majority of the population, sure do feel deep yearning to either inflict or receive pain, to take total control over another or give total control to another, to take or be taken by force, to abandon propriety and just be a total slut, to give or receive humiliation, etc.
This is rather tangential to the main thrust of the post, but a couple of people used a react to request a citation for this claim.
One noteworthy source is Aella's surveys on fetish popularity and tabooness. Here is an older one that gives the % of people reporting interest, and here is a newer one showing the average amount of reported interest on a scale from 0 (none) to 5 (extreme), both with tens of thousands of respondents.
Very approximate numbers that I'm informally reading off the graphs:
Note that a 3/5 average interest could mean either that 60% of people are extremely into it or that nearly everyone is moderately into it (or anything in between). This seems to imply that the survey behind the more recent graph has significantly kinkier answers overall, unless I'm misunderstanding something. (I'm fairly certain that people with zero interest ARE being included in the average, because several other fetishes have average interest below 1, which would be impossible otherwise.)
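The arithmetic behind that caveat can be checked with a toy example (respondent counts assumed purely for illustration): two very different distributions of interest yield the same 3/5 average.

```python
# Toy check: the same 3/5 average interest is consistent with very
# different underlying distributions. Numbers are illustrative only.

def average(scores):
    return sum(scores) / len(scores)

# 60% of respondents at maximum interest (5), 40% at zero interest...
polarized = [5] * 60 + [0] * 40
# ...versus everyone moderately interested at 3.
uniform = [3] * 100

avg_polarized = average(polarized)  # 3.0
avg_uniform = average(uniform)      # 3.0
```

So the average alone can't distinguish "a majority is extremely into it" from "nearly everyone is moderately into it", which is why the survey data only weakly supports the "majority feel a deep yearning" claim.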
If we believe this data, it seems pretty safe to guess that a majority of people are into at least one of these things (unless there is near-total overlap between them). The claim that a majority "feel a deep yearning" is not strongly supported but seems plausible.
(I was previously aware that BDSM interest was pretty common for an extremely silly reason: I saw some people arguing about whether or not Eliezer Yudkowsky was secretly the author of The Erogamer, one of them cited the presence of BDSM in the story as evidence in favor, and I wanted to know the base rate to determine how to weigh that evidence.
I made an off-the-cuff guess of "between 1% and 10%" and then did a Google search with only mild hope that this statistic would be available. I wasn't able today to re-find the pages I found then, but according to my recollection, my first search result was a page describing a survey of ~1k people claiming a ~75% rate of interest in BDSM, and my second search result was a page describing a survey of ~10k people claiming ~40% had participated in some form of BDSM and an additional ~40% were interested in trying it. I was also surprised to read (on the second page) that submission was more popular than dominance, masochism was more popular than sadism, and masochism remained more popular than sadism even if you only looked at males. Also, bisexuality was reportedly something like 5x higher within the BDSM-interested group than outside of it.)
Curated. While in my personal language, I would have treated Goodness as a synonym for Human Values[1], the distinction John is making here is correct, as is his advice on how to approach it. A very important point I have noticed is that when people ask (or anguish), "am I a good person?" this is asking according to the social-egregore sense of good – am I good in the way that will be approved of by others? Social, despite seeming like a morality thing. By extension, I wonder how much scrupulosity, as an anxiety disorder, is a social anxiety disorder.
I'd guess that the social egregore of Goodness also gets muddled in how it mixes "here are things you do to be a good member of society" and "here are things that are good because they're personally prudent and/or make you attractive for others to affiliate with", e.g. it's good to exercise and save money.
And specifically my values because it's an open question to me how broad my values are shared, cf. The Psychological Unity of Humankind
the distinction John is making here is correct, plus his advice on how to approach it
Really? Did you see this comment of mine? Do you endorse John's reply to it (specifically the part about the sadist)?
I hadn't seen your comment or the thread there, but yes. There is refinement and precision that could be added, whether it's the feelings vs. the generator, etc., etc., but the core point, that there's something more inherent to you vs. something that lives outside of you and is more social, is correct.
Regarding the sadists, yes, I think the values of the sadist might well be torture and from their perspective, they should be optimizing for that. If my values are anti-sadism (and I think they are), then we are at odds and maybe we fight. I don't think the structure of values prohibits people from having values different from my own. Strongly feel John's "people object to this for dumb reasons" stance.
Have you also seen https://www.lesswrong.com/posts/KCSmZsQzwvBxYNNaT/please-don-t-roll-your-own-metaethics which was also partly in response to that thread? BTW why is my post still in "personal blog"?
"We don’t really know what human values are"
But we might, or might begin to: I put the effort in over here: Alignment ⑥ Values are an effort not a coin https://whyweshould.substack.com/p/alignment-values-are-an-effort-not
or in derived format: If all values are an effort, prices are a meeting of efforts https://whyweshould.substack.com/p/if-all-values-are-an-effort-prices
Even deontological positions are an effort; evolution cares about the effort, not the ideal forms.
There is a temptation to simply define Goodness as Human Values, or vice versa.
Alas, we do not get to choose the definitions of commonly used words; our attempted definitions will simply be wrong. Unless we stick to mathematics, we will end up sneaking in intuitions which do not follow from our so-called definitions, and thereby mislead ourselves. People who claim that they use some standard word or phrase according to their own definition are, in nearly all cases outside of mathematics, wrong about their own usage patterns.[1]
If we want to know what words mean, we need to look at e.g. how they’re used and where the concepts come from and what mental pictures they summon. And when we look at those things for Goodness and Human Values… they don’t match. And I don’t mean that we shouldn’t pursue Human Values; I mean that the stuff people usually refer to as Goodness is a coherent thing which does not match the actual values of actual humans all that well.
There’s this mental picture where a mind has some sort of goals inside it, stuff it wants, stuff it values, stuff which from-the-inside feels worth doing things for. In old-school AI we’d usually represent that stuff as a utility function, but we wanted some terminology for a more general kind of “values” which doesn’t commit so hard to the mathematical framework (and often-confused conceptual baggage outside the math) of utility functions. The phrase “human values” caught on.
We don’t really know what human values are, or what shape they are, or even whether they’re A Thing at all. We don’t have trivial introspective access to our own values; sometimes we think we value a thing a lot, but realize in hindsight that we value it only a little. But insofar as the mental picture is pointing to a real thing at all, it does tell us how to go look for our values within our own minds.
How do we go look for our own values?
Well, we’re looking for some sort of goals, stuff which our minds want or value, stuff which drives us, etc. What does that feel like from the inside? Think of the stuff that, when you imagine it, feels really yummy. It induces yearning and longing. It feels like you’d be more complete with it. That’s the feeling of stuff that you value a lot. Lesser versions of the same feeling come when imagining things you value less (but still positively).
Personally… I get that feeling of yumminess and yearning when I imagine having a principled mathematical framework for understanding the internal structures of minds, which actually works on e.g. image generators.[2] I also get that feeling of yumminess and yearning when I imagine a really great night of dancing, or particularly great sex, or physically fighting with friends, or my favorite immersive theater shows, or some of my favorite foods at specific restaurants. Sometimes I get a weaker version of the yumminess and yearning feeling when I imagine hanging out around a fire with friends, or just sitting out on my balcony alone at night and watching the city, or dealing with the sort of emergency which is important enough that I drop everything else from my mind and just focus.
Those are my values. That’s what human values look like, and how to probe for yours.
I did not first learn about goodness by imagining things and checking how yummy they felt. I first learned about Goodness by my parents and teachers and religious figures and books and movies and so forth telling me that it’s Good to not steal things, Good to do unto others what I’d have them do unto me, Good to follow rules and authority figures, Good to clean up after myself, Good to share things with other kids, Good to not pick my nose, etc, etc.
In other words, I learned about Goodness mostly memetically, absorbing messages from others about what’s Good.
Some of those messages systematically follow from some general principles. Things like “don’t steal” are social rules which help build a high-trust society, making it easier for everyone to get what they want insofar as everyone else follows the rules. We want other people to follow those rules, so we teach other people the rules. Other aspects of Goodness, especially about cleanliness, seem to mostly follow humans’ purity instincts, and are memetically spread mainly by people with relatively-strong purity instincts in an attempt to get people with relatively-weaker purity instincts to be less gross (think nose picking). Still other aspects of Goodness seem rather suspiciously optimized for getting kids to be easier for their parents and teachers to manage - think following rules or respecting one’s elders. Then there are aspects of Goodness which seem to be largely political, driven by the usual political memetic forces.
The main unifying theme here is that Goodness is a memetic egregore; in practice, our shared concept of Goodness is comprised of whatever messages people spread about what other people should value.
… which sure is a different thing from what people do value, when they introspect on what feels yummy.
One thing to flag at this point: you know the feeling of deep loving connection, like a parent-child bond or spousal bond or the feeling you get (to some degree) when deeply empathizing with someone or the feeling of loving connection to God or the universe which people sometimes get from religious experiences? I.e. oxytocin?
For many (most?) people, that feeling is a REALLY big chunk of their Values. It is the thing which feels yummiest, often by such a large margin that it overwhelms everything else. If that’s you, then it’s probably worth stopping to notice that there are other things you value. It is quite possible to hyperoptimize for that one particular yumminess, then burn out and later realize that one values other things too - as many a parent learns when the midlife crisis hits.
That feeling of deep loving connection is also a major component of the memetic egregore Goodness, to such an extent that people often say that Goodness just is that kind of love. Think of the songs or hippies or whoever saying that all the world’s problems would be solved if only we had more love. As with values, it is worth stopping to notice that loving connection is not the entirety of Goodness, as the term is typically used. The people saying that Goodness just is loving connection (or something along those lines) are making the same move as someone trying to define a word; in most cases their usage probably doesn’t even match their own definition on closer inspection.
It is true that deep loving connection is both an especially large chunk of Human Values and an especially large chunk of Goodness, and within that overlap Human Values and Goodness do match. But that’s not the entirety of either Human Values or Goodness, and losing track of the rest is a good way to shoot oneself in the foot eventually.
To summarize so far:
Looking at that first one, the second might seem kind of silly. After all, we mostly don’t get to choose what triggers yumminess or yearning. There are some loopholes - e.g. sometimes we can learn to like things, or intentionally build new associations - but mostly the yumminess is not within conscious control. So it’s kind of silly for the memetic egregore to tell us what we should find yummy.
A central example: gay men mostly don’t seem to have much control over their attraction to men; that yumminess is not under their control. In many times and places the memetic egregore Goodness said that men shouldn’t be sexually attracted to men (those darn purity instincts!), which… usually isn’t all that effective at changing the underlying yumminess or yearning.
What does often happen, when the memetic egregore Goodness dictates something in conflict with actual Humans’ actual Values, is that the humans “tie themselves in knots” internally. The gay man’s attraction to men is still there, but maybe that attraction also triggers a feeling of shame or social anxiety or something. Or maybe the guy just hides his feelings, and then feels alone and stressed because he doesn’t feel safe being open with other people.
Sex and especially BDSM is a ripe area for this sort of thing. An awful lot of people, probably a majority of the population, sure do feel deep yearning to either inflict or receive pain, to take total control over another or give total control to another, to take or be taken by force, to abandon propriety and just be a total slut, to give or receive humiliation, etc. And man, the memetic egregore Goodness sure does not generally approve of those things. And then people tie themselves in knots, with the things that turn them on most also triggering anxiety or insecurity.
I’d like to say here “screw memetic egregores, follow the actual values of actual humans”, but then many people will be complete fucking idiots about it. So first let’s go over what not to do.
There’s a certain type of person… let’s call him Albert. Albert realizes that Goodness is a memetic egregore, and that the memetic egregore is not particularly well aligned with Albert’s own values. And so Albert throws out all that Goodness crap, and just queries his own feelings of yumminess in-the-moment when making decisions.
This goes badly in a few different ways:
Sometimes Albert has relatively low innate empathy, and throws out all the Goodness stuff about following the rules and spirit of high-trust communities. Albert just generally hits the “defect” button whenever it’s convenient. Then Albert goes all surprised-Pikachu face when he’s excluded from high-trust communities.
Other times Albert is just bad at thinking far into the future, and jumps on whatever feels yummy in-the-moment without really thinking ahead. A few years down the line Albert is broke.
Or maybe Albert rejects memetic Goodness, ignores authority a little too much, and winds up unemployed or in prison. Or ignores purity instincts a little too much and winds up very sick.
Point is: there’s a Chesterton’s fence here. Don’t be an idiot. Goodness is not very well aligned with actual Humans’ actual Values, but it has been memetically selected for a long time and you probably shouldn’t just jettison the whole thing without checking the pieces for usefulness. In particular, a nontrivial chunk of the memetic egregore Goodness needs to be complied with in order to satisfy your actual Values long term (which usually involves other people), even when it conflicts with your Values short term. Think about the consequences, what will actually happen down the line and how well your Values will actually be satisfied long-term, not just about what feels yummy in the moment.
… and then jettison the memetic egregore and pay attention to your and others' actual Values. Don’t make the opposite mistake of motivatedly looking for clever reasons to not jettison the egregore just because it’s scary.
You can quick-check this in individual cases by replacing the defined word with some made-up word wherever the person uses it - e.g. replace “Goodness” with “Bixness”.
… actually when I first try to imagine that I get a mild “ugh” because I’ve tried and failed to make such a thing before. But when I set that aside and actually imagine the end product, then I get the yummy feeling.