Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

I'm worried that many AI alignment researchers and other LWers have a view of how human morality works that really only applies to a small fraction of all humans (notably moral philosophers and themselves). In this view, people know or at least suspect that they are confused about morality, and are eager or willing to apply reason and deliberation to find out what their real values are, or to correct their moral beliefs. Here's an example of someone who fits this view:

I’ve written, in the past, about a “ghost” version of myself — that is, one that can float free from my body; which can travel anywhere in all space and time, with unlimited time, energy, and patience; and which can also make changes to different variables, and play forward/rewind different counterfactual timelines (the ghost’s activity somehow doesn’t have any moral significance).

I sometimes treat such a ghost kind of like an idealized self. It can see much that I cannot. It can see directly what a small part of the world I truly am; what my actions truly mean. The lives of others are real and vivid for it, even when hazy and out of mind for me. I trust such a perspective a lot. If the ghost would say “don’t,” I’d be inclined to listen.

I'm currently reading The Status Game by Will Storr (highly recommended BTW), and found in it the following description of how morality works in most people, which matches my own understanding of history and my observations of humans around me:

The moral reality we live in is a virtue game. We use our displays of morality to manufacture status. It’s good that we do this. It’s functional. It’s why billionaires fund libraries, university scholarships and scientific endeavours; it’s why a study of 11,672 organ donations in the USA found only thirty-one were made anonymously. It’s why we feel good when we commit moral acts and thoughts privately and enjoy the approval of our imaginary audience. Virtue status is the bribe that nudges us into putting the interests of other people – principally our co-players – before our own.

We treat moral beliefs as if they’re universal and absolute: one study found people were more likely to believe God could change physical laws of the universe than he could moral ‘facts’. Such facts can seem to belong to the same category as objects in nature, as if they could be observed under microscopes or proven by mathematical formulae. If moral truth exists anywhere, it’s in our DNA: that ancient game-playing coding that evolved to nudge us into behaving co-operatively in hunter-gatherer groups. But these instructions – strive to appear virtuous; privilege your group over others – are few and vague and open to riotous differences in interpretation. All the rest is an act of shared imagination. It’s a dream we weave around a status game.

The dream shifts as we range across the continents. For the Malagasy people in Madagascar, it’s taboo to eat a blind hen, to dream about blood and to sleep facing westwards, as you’ll kick the sunrise. Adolescent boys of the Marind of South New Guinea are introduced to a culture of ‘institutionalised sodomy’ in which they sleep in the men’s house and absorb the sperm of their elders via anal copulation, making them stronger. Among the people of the Moose, teenage girls are abducted and forced to have sex with a married man, an act for which, writes psychologist Professor David Buss, ‘all concerned – including the girl – judge that her parents giving her to the man was a virtuous, generous act of gratitude’. As alien as these norms might seem, they’ll feel morally correct to most who play by them. They’re part of the dream of reality in which they exist, a dream that feels no less obvious and true to them than ours does to us.

Such ‘facts’ also change across time. We don’t have to travel back far to discover moral superstars holding moral views that would destroy them today. Feminist hero and birth control campaigner Marie Stopes, who was voted Woman of the Millennium by the readers of The Guardian and honoured on special Royal Mail stamps in 2008, was an anti-Semite and eugenicist who once wrote that ‘our race is weakened by an appallingly high percentage of unfit weaklings and diseased individuals’ and that ‘it is the urgent duty of the community to make parenthood impossible for those whose mental and physical conditions are such that there is well-nigh a certainty that their offspring must be physically and mentally tainted’. Meanwhile, Gandhi once explained his agitation against the British thusly: ‘Ours is one continual struggle against a degradation sought to be inflicted upon us by the Europeans, who desire to degrade us to the level of the raw Kaffir [black African] … whose sole ambition is to collect a certain number of cattle to buy a wife with and … pass his life in indolence and nakedness.’ Such statements seem obviously appalling. But there’s about as much sense in blaming Gandhi for not sharing our modern, Western views on race as there is in blaming the Vikings for not having Netflix. Moral ‘truths’ are acts of imagination. They’re ideas we play games with.

The dream feels so real. And yet it’s all conjured up by the game-making brain. The world around our bodies is chaotic, confusing and mostly unknowable. But the brain must make sense of it. It has to turn that blizzard of noise into a precise, colourful and detailed world it can predict and successfully interact with, such that it gets what it wants. When the brain discovers a game that seems to make sense of its felt reality and offer a pathway to rewards, it can embrace its rules and symbols with an ecstatic fervour. The noise is silenced! The chaos is tamed! We’ve found our story and the heroic role we’re going to play in it! We’ve learned the truth and the way – the meaning of life! It’s yams, it’s God, it’s money, it’s saving the world from evil big pHARMa. It’s not like a religious experience, it is a religious experience. It’s how the writer Arthur Koestler felt as a young man in 1931, joining the Communist Party:

‘To say that one had “seen the light” is a poor description of the mental rapture which only the convert knows (regardless of what faith he has been converted to). The new light seems to pour from all directions across the skull; the whole universe falls into pattern, like stray pieces of a jigsaw puzzle assembled by one magic stroke. There is now an answer to every question, doubts and conflicts are a matter of the tortured past – a past already remote, when one lived in dismal ignorance in the tasteless, colourless world of those who don’t know. Nothing henceforth can disturb the convert’s inner peace and serenity – except the occasional fear of losing faith again, losing thereby what alone makes life worth living, and falling back into the outer darkness, where there is wailing and gnashing of teeth.’

I hope this helps further explain why I think even solving (some versions of) the alignment problem probably won't be enough to ensure a future that's free from astronomical waste or astronomical suffering. A part of me is actually more scared of many futures in which "alignment is solved", than a future where biological life is simply wiped out by a paperclip maximizer.

122 comments

> All the rest is an act of shared imagination. It’s a dream we weave around a status game.
> They’re part of the dream of reality in which they exist, a dream that feels no less obvious and true to them than ours does to us.
> Moral ‘truths’ are acts of imagination. They’re ideas we play games with.

IDK, I feel like you could say the same sentences truthfully about math, and if you "went with the overall vibe" of them, you might be confused and mistakenly think math was "arbitrary" or "meaningless", or doesn't have a determinate tendency, etc. Like, okay, if I say "one element of moral progress is increasing universalizability", and you say "that's just the thing your status cohort assigns high status", I'm like, well, sure, but that doesn't mean it doesn't also have other interesting properties, like being a tendency across many different peoples; like being correlated with the extent to which they're reflecting, sharing information, and building understanding; like resulting in reductionist-materialist local outcomes that have more of material local things that people otherwise generally seem to like (e.g. not being punched, having food, etc.); etc. It could be that morality has tendencies, but not without hormesis and mutually assured destruction and similar things that might be removed by aligned AI.

"Morality" is totally unlike mathematics, where the rules can first be clearly defined and we then operate within that set of rules. I believe "increasing universalizability" is a good example to prove the OP's point. I don't think it's a common belief among "many different peoples" in any meaningful sense; I don't even really understand what it entails. There may be a few nearly universal elements like "wanting food", but destructive aspects are fundamental to our lives, so you can't just remove them without fundamentally altering our nature as human beings. Like a lot of people, I don't mind being punched a little as long as (me / my family / my group) wins and gains more resources. I really want to see the people I hate being harmed, and would sacrifice a lot for it; that's a very fundamental aspect of being human.
Are you pursuing this to any great extent? If so, remind me to stay away from you and avoid investing in you.
Why are you personally attacking me for discussing the topic at hand? I'm discussing human nature and giving myself as a counter-example, but I clearly meant that it applies to everyone in different ways. I will avoid personal examples since some people have a hard time understanding. I believe you are ironically proving my point by signaling against me based on my beliefs which you dislike.

Attacking you? I said I don't want to be around you and don't want to invest in you. I said it with a touch of snark ("remind me").

> I clearly meant that it applies to everyone in different ways

Not clear to me. I don't think everyone "would sacrifice a lot" to "see the people [they] hate being harmed". I wouldn't. I think behaving that way is inadvisable for you and harmful to others, and will tend to make you a bad investment opportunity.

By that description, mathematics is fairly unlike mathematics. It entails that behavior that people consider moral tends towards having the property that if everyone behaved like that, things would be good. Rule of law, equality before the law, Rawlsian veil of ignorance, stare decisis, equality of opportunity, the golden rule, liberty, etc. Generally, norms that are symmetric across space, time, context, and person. (Not saying we actually have these things, or that "most people" explicitly think these things are good, just that people tend to update in favor of these things.)
This is just circular.  What is "good"? Evidence that "most people" update in favor of these things? It seems like a very current western morality centric view, and you could probably get people to update in the opposite direction (and they did, many times in history).
> Evidence that "most people" update in favor of these things? It seems like a very current western morality centric view,

Yeah, I think you're right that it's biased towards Western. I think you can generate the obvious examples (e.g. law systems developing; e.g. various revolutions in the name of liberty and equality and against tyranny), and I'm not interested enough right now to come up with a more comprehensive treatment of the evidence, and I'm not super confident. It could be interesting to see how this plays out in places where these tendencies seem least present. Is China such a place? (What do most people living in China really think of non-liberty, non-Rawlsianism, etc.?)
Sammy Martin (2y):
The above sentences, if taken (as you do) as claims about human moral psychology rather than normative ethics, are compatible with full-on moral realism. I.e. everyone's moral attitudes are pushed around by status concerns; luckily we ended up in a community that ties status to looking for long-run implications of your beliefs and making sure they're coherent, and so without having fundamentally different motivations to any other human being we were better able to be motivated by actual moral facts.

I know the OP is trying to say loudly and repeatedly that this isn't the case because 'everyone else thought that as well, don't you know?' with lots of vivid examples, but if that's the only argument it seems like modesty epistemology - i.e. "most people who said the thing you said were wrong, and also said that they weren't like all those other people who were wrong in the past for all these specific reasons, so you should believe you're wrong too".

I think a lot of this thread confuses moral psychology with normative ethics - most utilitarians know and understand that they aren't solely motivated by moral concerns, and are also motivated by lots of other things. They know they don't morally endorse those motivations in themselves, but don't do anything about it, and don't thereby change their moral views. If Peter Singer goes and buys a coffee, it's no argument at all to say "aha, by revealed preferences, you must not really think utilitarianism is true, or you'd have given the money away!" That doesn't show that when he does donate money, he's unmotivated by moral concerns.

Probably even this 'pure' motivation to act morally in cases where empathy isn't much of an issue is itself made up of e.g. a desire not to be seen believing self-contradictory things, cognitive dissonance, basic empathy and so on. But so what? If the emotional incentives work to motivate people to form more coherent moral views, it's the reliability of the process of forming the views that matters

You sound like you're positing the existence of two types of people: type I people whose morality is based on "reason" and type II people whose morality is based on the "status game". In reality, nearly everyone's morality is based on something like the status game (see also: 1 2 3). It's just that EAs and moral philosophers are playing the game in a tribe which awards status differently.

The true intrinsic values of most people do place a weight on the happiness of other people (that's roughly what we call "empathy"), but this weight is very unequally distributed.

There are definitely thorny questions regarding the best way to aggregate the values of different people in TAI. But, I think that given a reasonable solution, a lower bound on the future is imagining that the AI will build a private utopia for every person, as isolated from the other "utopias" as that person wants it to be. Probably some people's "utopias" will not be great, viewed in utilitarian terms. But, I still prefer that over paperclips (by far). And, I suspect that most people do (even if they protest it in order to play the game).

> It’s just that EAs and moral philosophers are playing the game in a tribe which awards status differently.

Sure, I've said as much in recent comments, including this one. ETA: Related to this, I'm worried about AI disrupting "our" status game in an unpredictable and possibly dangerous way. E.g., what will happen when everyone uses AI advisors to help them play status games, including the status game of moral philosophy?

> The true intrinsic values of most people do place a weight on the happiness of other people (that’s roughly what we call “empathy”), but this weight is very unequally distributed.

What do you mean by "true intrinsic values"? (I couldn't find any previous usage of this term by you.) How do you propose finding people's true intrinsic values?

These weights, if low enough relative to other "values", haven't prevented people from committing atrocities on each other in the name of morality.

> There are definitely thorny questions regarding the best way to aggregate the values of different people in TAI. But, I think that given a reasonable solution, a lower bound on the future is imagining that the AI will build a private utopia for every person, as isolated from the other "utopias" as that person wants it to be.
Vanessa Kosoy (2y):
I mean the values relative to which a person seems most like a rational agent, arguably formalizable along these lines.

Yes. Yes. I do think multi-user alignment is an important problem (and occasionally spend some time thinking about it), it just seems reasonable to solve single-user alignment first. Andrew Critch is an example of a person who seems to be concerned about this.

I meant that each private utopia can contain any number of people created by the AI, in addition to its "customer". Ofc groups that can agree on a common utopia can band together as well. They are prevented from simulating other pre-existing people without their consent, but can simulate a bunch of low status people to lord over. Yes, this can be bad. Yes, I still prefer this (assuming my own private utopia) over paperclips. And, like I said, this is just a relatively easy to imagine lower bound, not necessarily the true optimum.

The selfish part, at least, doesn't have any reason to be scared as long as you are a "customer".
Wei Dai (2y):
Why do you think this will be the result of the value aggregation (or a lower bound on how good the aggregation will be)? For example, if there is a big block of people who all want to simulate person X in order to punish that person, and only X and a few other people object, why won't the value aggregation be "nobody pre-existing except X (and Y and Z etc.) can be simulated"?
Vanessa Kosoy (2y):
Given some assumptions about the domains of the utility functions, it is possible to do better than what I described in the previous comment.

Let $X_i$ be the space of possible experience histories[1] of user $i$ and $Y$ the space of everything else the utility functions depend on (things that nobody can observe directly). Suppose that the domain of the utility functions is $Z := \prod_i X_i \times Y$. Then, we can define the "denosing[2] operator" $D_i : C(Z) \to C(Z)$ for user $i$ by

$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j} u(x_i, x', y)$$

Here, $x_i$ is the argument of $u$ that ranges in $X_i$, $x_{-i}$ are the arguments that range in $X_j$ for $j \neq i$, and $y$ is the argument that ranges in $Y$. That is, $D_i$ modifies a utility function by having it "imagine" that the experiences of all users other than $i$ have been optimized, for the experiences of user $i$ and the unobservables held constant.

Let $u_i : Z \to \mathbb{R}$ be the utility function of user $i$, and $d^0 \in \mathbb{R}^n$ the initial disagreement point (everyone dying), where $n$ is the number of users. We then perform cooperative bargaining on the denosed utility functions $D_i u_i$ with disagreement point $d^0$, producing some outcome $\mu^0 \in \Delta(Z)$. Define $d^1 \in \mathbb{R}^n$ by $d^1_i := \mathbb{E}_{\mu^0}[u_i]$. Now we do another cooperative bargaining with $d^1$ as the disagreement point and the original utility functions $u_i$. This gives us the final outcome $\mu^1$.

Among other benefits, there is now much less need to remove outliers. Perhaps, instead of removing them, we still want to mitigate them by applying "amplified denosing" to them, which also removes the dependence on $Y$. For this procedure, there is a much better case that the lower bound will be met.

---

1. In the standard RL formalism this is the space of action-observation sequences $(A \times O)^\omega$.
2. From the expression "nosy preferences", see e.g. here.
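As a toy sketch of this two-stage construction (my own made-up instantiation: two users, tiny finite spaces X_i = {0, 1, 2} and |Y| = 2, random utility tables, and Nash bargaining restricted to deterministic outcomes rather than distributions over Z):

```python
import itertools
import numpy as np

# Toy instantiation: 2 users, 3 experience histories each (X_i = {0,1,2}),
# 2 values of the unobservable Y.  Z = X_0 x X_1 x Y; utilities are tables.
rng = np.random.default_rng(0)
n_users, n_x, n_y = 2, 3, 2
u = rng.normal(size=(n_users, n_x, n_x, n_y))  # u[i] is user i's utility on Z

def denose(u_i, i):
    """(D_i u)(x_i, x_-i, y) = max over x' of u(x_i, x', y): user i 'imagines'
    the other user's experiences are optimal, holding x_i and y fixed."""
    other_axis = 1 - i  # axes of u_i are (x_0, x_1, y); 2-user case
    d = u_i.max(axis=other_axis, keepdims=True)
    return np.broadcast_to(d, u_i.shape)

denosed = np.stack([denose(u[i], i) for i in range(n_users)])

def nash_outcome(utils, d):
    """Nash bargaining restricted to deterministic outcomes: the z maximizing
    the product of (weakly positive) gains over the disagreement point d."""
    best, best_val = None, -np.inf
    for z in itertools.product(range(n_x), range(n_x), range(n_y)):
        gains = np.array([utils[i][z] - d[i] for i in range(n_users)])
        if (gains < 0).any():
            continue  # infeasible: some user below the disagreement point
        val = np.prod(gains)
        if val > best_val:
            best, best_val = z, val
    return best

d0 = np.full(n_users, -10.0)     # "everyone dies": worse than any outcome here
mu0 = nash_outcome(denosed, d0)  # first bargaining: denosed utilities
d1 = np.array([u[i][mu0] for i in range(n_users)])  # d1_i = u_i at mu0
mu1 = nash_outcome(u, d1)        # second bargaining: true utilities
```

Since `mu0` itself yields zero gains over `d1`, the second bargaining can only weakly improve on it for every user, which matches the intended lower-bound property.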
This is very interesting (and "denosing operator" is delightful). Some thoughts:

If I understand correctly, I think there can still be a problem where user $i$ wants an experience history such that part of the history is isomorphic to a simulation of user $j$ suffering ($i$ wants to fully experience $j$ suffering in every detail). Here a fixed $x_i$ may entail some fixed $x_j$ for (some copy of) some $j$. It seems the above approach can't then avoid leaving one of $i$ or $j$ badly off:

If $i$ is permitted to freely determine the experience of the embedded $j$ copy, the disagreement point in the second bargaining will bake this in: $j$ may be horrified to see that $i$ wants to experience its copy suffer, but will be powerless to stop it (if $i$ won't budge in the bargaining). Conversely, if the embedded $j$ is treated as a user which $i$ will imagine is exactly to $i$'s liking, but who actually gets what $j$ wants, then the selected $\mu^0$ will be horrible for $i$ (e.g. perhaps $i$ wants to fully experience Hitler suffering, and instead gets to fully experience Hitler's wildest fantasies being realized). I don't think it's possible to do anything like denosing to avoid this.

It may seem like this isn't a practical problem, since we could reasonably disallow such embedding. However, I think that's still tricky since there's a less exotic version of the issue: my experiences likely already are a collection of subagents' experiences. Presumably my maximisation over $x_{\text{joe}}$ is permitted to determine all the $x_{\text{subjoe}}$. It's hard to see how you draw a principled line here: the ideal future for most people may easily be transhumanist to the point where today's users are tomorrow's subpersonalities (and beyond).

A case that may have to be ruled out separately is where $i$ wants to become a suffering $j$. Depending on what I consider 'me', I might be entirely fine with it if 'I' wake up tomorrow as suffering $j$ (if I'm done living and think $j$ deserves to suffer). Or perhaps I want to clone myself $10^{10}$ times, and then have
Vanessa Kosoy (2y):
I think that a rigorous treatment of such issues will require some variant of IB physicalism (in which the monotonicity problem has been solved, somehow). I am cautiously optimistic that a denosing operator exists there which dodges these problems. This operator will declare both the manifesting and evaluation of the source codes of other users to be "out of scope" for a given user. Hence, a preference of i to observe the suffering of j would be "satisfied" by observing nearly anything, since the maximization can interpret anything as a simulation of j. The "subjoe" problem is different: it is irrelevant because "subjoe" is not a user, only Joe is a user. All the transhumanist magic that happens later doesn't change this. Users are people living during the AI launch, and only them. The status of any future (trans/post)humans is determined entirely according to the utility functions of users. Why? For two reasons: (i) the AI can only have access and stable pointers to existing people (ii) we only need the buy-in of existing people to launch the AI. If existing people want future people to be treated well, then they have nothing to worry about since this preference is part of the existing people's utility functions.
Ah - that's cool if IB physicalism might address this kind of thing (still on my to-read list). Agreed that the subjoe thing isn't directly a problem. My worry is mainly whether it's harder to rule out $i$ experiencing a simulation of sub-$j$ suffering, since sub-$j$ isn't a user. However, if you can avoid the suffering $j$s by limiting access to information, the same should presumably work for relevant sub-$j$s.

This isn't so clear (to me at least) if:

1. Most, but not all, current users want future people to be treated well.
2. Part of being "treated well" includes being involved in an ongoing bargaining process which decides the AI's/future's trajectory.

For instance, suppose initially 90% of people would like to have an iterated bargaining process that includes future (trans/post)humans as users, once they exist. The other 10% are only willing to accept such a situation if they maintain their bargaining power in future iterations (by whatever mechanism). If you iterate this process, the bargaining process ends up dominated by users who won't relinquish any power to future users. 90% of initial users might prefer drift over lock-in, but we get lock-in regardless (the disagreement point also amounting to lock-in). Unless I'm confusing myself, this kind of thing seems like a problem. (Not in terms of reaching some non-terrible lower bound, but in terms of realising potential.)

Wherever there's this kind of asymmetry/degradation over bargaining iterations, I think there's an argument for building in a way to avoid it from the start - since anything short of 100% just limits to 0 over time. [It's by no means clear that we do want to make future people users on an equal footing to today's people; it just seems to me that we have to do it at step zero or not at all.]
Vanessa Kosoy (2y):
I admit that at this stage it's unclear because physicalism brings in the monotonicity principle that creates bigger problems than what we discuss here. But maybe some variant can work. Roughly speaking, in this case the 10% preserve their 10% of the power forever. I think it's fine because I want the buy-in of this 10% and the cost seems acceptable to me. I'm also not sure there is any viable alternative which doesn't have even bigger problems.
Sure, I'm not sure there's a viable alternative either. This kind of approach seems promising - but I want to better understand any downsides. My worry wasn't about the initial 10%, but about the possibility of the process being iterated such that you end up with almost all bargaining power in the hands of power-keepers. In retrospect, this is probably silly: if there's a designable-by-us mechanism that better achieves what we want, the first bargaining iteration should find it. If not, then what I'm gesturing at must either be incoherent, or not endorsed by the 10% - so hard-coding it into the initial mechanism wouldn't get the buy-in of the 10% to the extent that they understood the mechanism.

In the end, I think my concern is that we won't get buy-in from a large majority of users: in order to accommodate some proportion with odd moral views it seems likely you'll be throwing away huge amounts of expected value in others' views - if I'm correctly interpreting your proposal (please correct me if I'm confused). Is this where you'd want to apply amplified denosing? So, rather than filtering out the undesirable $i$, for these $i$ you use:

$$(D_i u)(x_i, x_{-i}, y) := \max_{x' \in \prod_{j \neq i} X_j,\ y' \in Y} u(x_i, x', y')$$

[i.e. ignoring $y$ and imagining it's optimal]

However, it's not clear to me how we'd decide who gets strong denosing (clearly not everyone, or we don't pick a $y$). E.g. if you strong-denose anyone who's too willing to allow bargaining failure [everyone dies] you might end up filtering out altruists who worry about suffering risks. Does that make sense?
Vanessa Kosoy (2y):
I'm not sure what you mean here, but also the process is not iterated: the initial bargaining decides the outcome once and for all. At least that's the mathematical ideal we're approximating.

I don't think so? The bargaining system does advantage large groups over small groups. In practice, I think that for the most part people don't care much about what happens "far" from them (for some definition of "far", not physical distance), so giving them private utopias is close to optimal from each individual's perspective. Although it's true they might pretend to care more than they do for the usual reasons, if they're thinking in "far mode".

I would certainly be very concerned about any system that gives even more power to majority views. For example, what if the majority of people are disgusted by gay sex and prefer that it not happen anywhere? I would rather accept things I disapprove of happening far away from me than allow other people to control my own life. Ofc the system also mandates win-win exchanges. For example, if Alice's and Bob's private utopias each contain something strongly unpalatable to the other but not strongly important to the respective customer, the bargaining outcome will remove both unpalatable things.

I'm fine with strong-denosing negative utilitarians who would truly stick to their guns about negative utilitarianism (but I also don't think there are many).
Ah, I was just being an idiot on the bargaining system w.r.t. small numbers of people being able to hold it to ransom. Oops. Agreed that more majority power isn't desirable. [Re iteration, I only meant that the bargaining could become iterated if the initial bargaining result were to decide upon iteration (to include more future users). I now don't think this is particularly significant.]

I think my remaining uncertainty (/confusion) is all related to the issue I first mentioned (embedded copy experiences). It strikes me that something like this can also happen where minds grow/merge/overlap. Does this avoid the problem if $i$'s preferences use indirection? It seems to me that a robust pointer to $j$ may be enough: that with a robust pointer it may be possible to implicitly require something like source-code access without explicitly referencing it. E.g. where $i$ has a preference to "experience $j$ suffering in circumstances where there's strong evidence it's actually $j$ suffering, given that these circumstances were the outcome of this bargaining process". If $i$ can't robustly specify things like this, then I'd guess there'd be significant trouble in specifying quite a few (mutually) desirable situations involving other users too.

IIUC, this would only be any problem for the denosed bargaining to find a good $d^1$: for the second bargaining on the true utility functions there's no need to put anything "out of scope" (right?), so win-wins are easily achieved.
Vanessa Kosoy (2y):
I'm imagining cooperative bargaining between all users, where the disagreement point is everyone dying[1][2] (this is a natural choice assuming that if we don't build aligned TAI we get paperclips). This guarantees that every user will receive an outcome that's at least not worse than death.

With Nash bargaining, we can still get issues for (in)famous people that millions of people want to do unpleasant things to. Their outcome will be better than death, but maybe worse than in my claimed "lower bound". With Kalai-Smorodinsky bargaining things look better, since essentially we're maximizing a minimum over all users. This should admit my lower bound, unless it is somehow disrupted by enormous asymmetries in the maximal payoffs of different users. In either case, we might need to do some kind of outlier filtering: if e.g. literally every person on Earth is a user, then maybe some of them are utterly insane in ways that cause the Pareto frontier to collapse. [EDIT: see improved solution]

Bargaining assumes we can access the utility function. In reality, even if we solve the value learning problem in the single-user case, once you go to the multi-user case it becomes a mechanism design problem: users have incentives to lie / misrepresent their utility functions. A perfect solution might be impossible, but I proposed mitigating this by assigning each user a virtual "AI lawyer" that provides optimal input on their behalf into the bargaining system. In this case they at least have no incentive to lie to the lawyer, and the outcome will not be skewed in favor of users who are better at this game, but we don't get the optimal bargaining solution either.

All of this assumes the TAI is based on some kind of value learning. If the first-stage TAI is based on something else, the problem might become easier or harder. Easier because the first-stage TAI will produce better solutions to the multi-user problem for the second-stage TAI. Harder because it can allow the small gro
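The Nash vs. Kalai-Smorodinsky contrast can be seen on a toy two-user feasible set (made-up payoff numbers, lotteries searched on a coarse grid):

```python
import numpy as np

# Feasible set: lotteries over three pure outcomes for two users;
# disagreement point d = (0, 0) ("everyone dies", normalized).
pure = np.array([[8.0, 2.0],    # great for user 0
                 [2.0, 8.0],    # great for user 1
                 [7.0, 5.0]])   # a lopsided compromise
d = np.zeros(2)

# Coarse grid over lotteries (a, b, 1-a-b) on the pure outcomes.
grid = np.linspace(0.0, 1.0, 101)
weights = np.array([(a, b, 1 - a - b) for a in grid for b in grid if a + b <= 1])
payoffs = weights @ pure

# Nash: maximize the product of gains over the disagreement point.
nash = payoffs[np.argmax(np.prod(payoffs - d, axis=1))]

# Kalai-Smorodinsky: maximize the minimum gain, normalized by each user's
# ideal (maximal feasible) payoff -- a maximin flavor over all users.
ideal = payoffs.max(axis=0)
ks = payoffs[np.argmax(np.min((payoffs - d) / (ideal - d), axis=1))]
```

Here Nash picks the lopsided (7, 5) point while Kalai-Smorodinsky moves to the symmetric mixture near (5.75, 5.75), illustrating why KS is gentler on the user the Nash product would shortchange.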
Wei Dai (2y):
Assuming each lawyer has the same incentive to lie as its client, it has an incentive to misrepresent that some preferable-to-death outcomes are "worse-than-death" (in order to force those outcomes out of the set of "feasible agreements" in hope of getting a more preferred outcome as the actual outcome), and this at equilibrium is balanced by the marginal increase in the probability of getting "everyone dies" as the outcome (due to feasible agreements becoming a null set) caused by the lie. So the probability of "everyone dies" in this game has to be non-zero. (It's the same kind of problem as in the AI race or tragedy of commons: people not taking into account the full social costs of their actions as they reach for private benefits.) Of course in actuality everyone dying may not be a realistic consequence of failure to reach agreement, but if the real consequence is better than that, and the AI lawyers know this, they would be more willing to lie since the perceived downside of lying would be smaller, so you end up with a higher chance of no agreement.
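This lying incentive can be sketched with made-up numbers: two users, lotteries over two pure outcomes, and a lawyer for user 0 who falsely reports that one outcome is worse than death:

```python
import numpy as np

# Two pure outcomes, lotteries allowed, disagreement point d = (0, 0).
A = np.array([5.0, 5.0])   # true payoffs of outcome A for (user 0, user 1)
B = np.array([8.0, 2.0])   # user 0 prefers B
d = np.zeros(2)

def nash(u_A, u_B):
    """Nash bargaining over lotteries a*A + (1-a)*B, on a fine grid of a."""
    a = np.linspace(0.0, 1.0, 10001)
    pay = np.outer(a, u_A) + np.outer(1 - a, u_B)
    gains = pay - d
    gains[(gains < 0).any(axis=1)] = 0   # lotteries below d are infeasible
    return a[np.argmax(np.prod(gains, axis=1))]

a_honest = nash(A, B)   # both lawyers report truthfully
# User 0's lawyer falsely reports that A is worse than death for user 0:
A_lied = np.array([-1.0, 5.0])
a_lied = nash(A_lied, B)

# User 0's *true* payoff under each bargain:
true_payoff_honest = a_honest * A[0] + (1 - a_honest) * B[0]
true_payoff_lied = a_lied * A[0] + (1 - a_lied) * B[0]
# The lie shifts the bargain toward B, raising user 0's true payoff --
# which is why, at equilibrium, lying carries a non-zero chance that the
# claimed-feasible set empties out and everyone gets the disagreement point.
```

In this toy case the honest bargain lands on A (payoff 5 for user 0), while the lie drags the lottery toward B and user 0's true payoff rises above 7, at the cost of the risk described in the comment above.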
2Vanessa Kosoy2y
Yes, it's not a very satisfactory solution. Some alternative/complementary solutions:

* Somehow use non-transformative AI to do my mind uploading, and then have the TAI learn by inspecting the uploads. Would be great for single-user alignment as well.
* Somehow use non-transformative AI to create perfect lie detectors, and use this to enforce honesty in the mechanism. (But, is it possible to detect self-deception?)
* Have the TAI learn from past data which wasn't affected by the incentives created by the TAI. (But, is there enough information there?)
* Shape the TAI's prior about human values in order to rule out at least the most blatant lies.
* Some clever mechanism design I haven't thought of. The problem with this is, most mechanism designs rely on money, and money doesn't seem applicable here; when you don't have money, there are many impossibility theorems.
This seems near-guaranteed to me: a non-zero number of people will be that crazy (in our terms), so filtering will be necessary. Then I'm curious about how we draw the line on outlier filtering. What filtering rule do we use? I don't yet see a good principled rule (e.g. if we want to throw out people who'd collapse agreement to the disagreement point, there's more than one way to do that).
5Wei Dai2y
For a utilitarian, this doesn't mean much. What's much more important is something like, "How close is this outcome to an actual (global) utopia (e.g., with optimized utilitronium filling the universe), on a linear scale?" For example, my rough expectation (without having thought about it much) is that your "lower bound" outcome is about midway between paperclips and actual utopia on a logarithmic scale. In one sense, this is much better than paperclips, but in another sense (i.e., on the linear scale), it's almost indistinguishable from paperclips, and a utilitarian would only care about the latter and therefore be nearly as disappointed by that outcome as paperclips.

I want to add a little to my stance on utilitarianism. A utilitarian superintelligence would probably kill me and everyone I love, because we are made of atoms that could be used for minds that are more hedonic[1][2][3]. Given a choice between paperclips and utilitarianism, I would still choose utilitarianism. But, if there was a utilitarian TAI project along with a half-decent chance to do something better (by my lights), I would actively oppose the utilitarian project. From my perspective, the people behind such a project are essentially enemy combatants.

  1. One way to avoid it is by modifying utilitarianism to only place weight on currently existing people. But this is already not that far from my cooperative bargaining proposal (although still inferior to it, IMO). ↩︎

  2. Another way to avoid it is by postulating some very strong penalty on death (i.e. discontinuity of personality). But this is not trivial to do, especially without creating other problems. Moreover, from my perspective this kind of thing is a hack trying to work around the core issue, namely that I am not a utilitarian (along with the vast majority of people). ↩︎

  3. A possible counterargument is, maybe the superhedonic future minds wou

... (read more)
5Wei Dai2y
This seems like a reasonable concern about some types of hedonic utilitarianism. To be clear, I'm not aware of any formulation of utilitarianism that doesn't have serious issues, and I'm also not aware of any formulation of any morality that doesn't have serious issues. Just to be clear, this isn't in response to something I wrote, right? (I'm definitely not advocating any kind of "utilitarian TAI project" and would be quite scared of such a project myself.) So what are you (and they) then? What would your utopia look like?
2Vanessa Kosoy2y
No! Sorry, if I gave that impression. Well, I linked my toy model of partiality before. Are you asking about something more concrete?
3Wei Dai2y
Yeah, I mean aside from how much you care about various other people, what concrete things do you want in your utopia?
4Vanessa Kosoy2y
I have low confidence about this, but my best guess personal utopia would be something like: A lot of cool and interesting things are happening. Some of them are good, some of them are bad (a world in which nothing bad ever happens would be boring). However, there is a limit on how bad something is allowed to be (for example, true death, permanent crippling of someone's mind and eternal torture are over the line), and overall "happy endings" are more common than "unhappy endings". Moreover, since it's my utopia (according to my understanding of the question, we are ignoring the bargaining process and acausal cooperation here), I am among the top along those desirable dimensions which are zero-sum (e.g. play an especially important / "protagonist" role in the events to the extent that it's impossible for everyone to play such an important role, and have high status to the extent that it's impossible for everyone to have such high status).

First, you wrote "a part of me is actually more scared of many futures in which alignment is solved, than a future where biological life is simply wiped out by a paperclip maximizer." So, I tried to assuage this fear for a particular class of alignment solutions.

Second... Yes, for a utilitarian this doesn't mean "much". But, tbh, who cares? I am not a utilitarian. The vast majority of people are not utilitarians. Maybe even literally no one is an (honest, not self-deceiving) utilitarian. From my perspective, disappointing the imaginary utilitarian is (in itself) about as upsetting as disappointing the imaginary paperclip maximizer.

Third, what I actually want from multi-user alignment is a solution that (i) is acceptable to me personally (ii) is acceptable to the vast majority of people (at least if they think through it rationally and are arguing honestly and in good faith) (iii) is acceptable to key stakeholders (iv) as much as possible, doesn't leave any Pareto improvements on the table and (v) sufficiently Schelling-pointy to coordinate around. Here, "acceptable" means "a lot better than paperclips and not worth starting an AI race/war to get something better".

Second… Yes, for a utilitarian this doesn’t mean “much”. But, tbh, who cares? I am not a utilitarian. The vast majority of people are not utilitarians. Maybe even literally no one is an (honest, not self-deceiving) utilitarian. From my perspective, disappointing the imaginary utilitarian is (in itself) about as upsetting as disappointing the imaginary paperclip maximizer.

I'm not a utilitarian either, because I don't know what my values are or should be. But I do assign significant credence to the possibility that something in the vicinity of utilitarianism is the right values (for me, or period). Given my uncertainties, I want to arrange the current state of the world so that (to the extent possible), whatever I end up deciding my values are, through things like reason, deliberation, doing philosophy, the world will ultimately not turn out to be a huge disappointment according to those values. Unfortunately, your proposed solution isn't very reassuring to this kind of view.

It's quite possible that I (and people like me) are simply out of luck, and there's just no feasible way to do what we want to do, but it sounds like you think I shouldn't even want what I want, or at least t... (read more)

4Vanessa Kosoy2y
I'm moderately sure what my values are, to some approximation. More importantly, I'm even more sure that, whatever my values are, they are not so extremely different from the values of most people that I should wage some kind of war against the majority instead of trying to arrive at a reasonable compromise. And, in the unlikely event that most people (including me) will turn out to be some kind of utilitarians after all, it's not a problem: value aggregation will then produce a universe which is pretty good for utilitarians.
2Wei Dai2y
Maybe you're just not part of the target audience of my OP then... but from my perspective, if I determine my values through the kind of process described in the first quote, and most people determine their values through the kind of process described in the second quote, it seems quite likely that the values end up being very different. The kind of solution I have in mind is not "waging war" but for example, solving metaphilosophy and building an AI that can encourage philosophical reflection in humans or enhance people's philosophical abilities. What if you turn out to be some kind of utilitarian but most people don't (because you're more like the first group in the OP and they're more like the second group), or most people would eventually turn out to be some kind of utilitarian in a world without AI, but in a world with AI, this won't happen?
8Vanessa Kosoy2y
I don't think people determine their values through either process. I think that they already have values, which are to a large extent genetic and immutable. Instead, these processes determine what values they pretend to have for game-theory reasons. So, the big difference between the groups is which "cards" they hold and/or what strategy they pursue, not an intrinsic difference in values. But also, if we do model values as the result of some long process of reflection, and you're worried about the AI disrupting or insufficiently aiding this process, then this is already a single-user alignment issue and should be analyzed in that context first. The presumed differences in moralities are not the main source of the problem here.
4Wei Dai2y
This is not a theory that's familiar to me. Why do you think this is true? Have you written more about it somewhere, or can you link to a more complete explanation? This seems reasonable to me. (If this was meant to be an argument against something I said, there may have been another miscommunication, but I'm not sure it's worth tracking that down.)
2Vanessa Kosoy2y
I've been considering writing about this for a while, but so far I don't feel sufficiently motivated. So, the links I posted upwards in the thread are the best I have, plus vague gesturing in the directions of Hansonian signaling theories, Jaynes' theory of consciousness and Yudkowsky's belief in belief.
Isn't this the main thesis of "The Righteous Mind"?
4Rafael Harth2y
This comment seems to be consistent with the assumption that the outcome 1 year after the singularity is locked in forever. But the future we're discussing here is one where humans retain autonomy (?), and in that case, they're allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI. I think a future where we begin with highly suboptimal personal utopias and gradually transition into utilitronium is among the more plausible outcomes. Compared with other outcomes where Not Everyone Dies, anyway. Your credence may differ if you're a moral relativist.

But the future we’re discussing here is one where humans retain autonomy (?), and in that case, they’re allowed to change their mind over time, especially if humanity has access to a superintelligent aligned AI.

What if the humans ask the aligned AI to help them be more moral, and part of what they mean by "more moral" is having fewer doubts about their current moral beliefs? This is what a "status game" view of morality seems to predict, for the humans whose status games aren't based on "doing philosophy", which seems to be most of them.

2Rafael Harth2y
I don't have any reason why this couldn't happen. My position is something like "morality is real, probably precisely quantifiable; seems plausible that in the scenario of humans with autonomy and aligned AI, this could lead to an asymmetry where more people tend toward utilitronium over time". (Hence why I replied, you didn't seem to consider that possibility.) I could make up some mechanisms for this, but probably you don't need me for that. Also seems plausible that this doesn't happen. If it doesn't happen, maybe the people who get to decide what happens with the rest of the universe tend toward utilitronium. But my model is widely uncertain and doesn't rule out futures of highly suboptimal personal utopias that persist indefinitely.
4Wei Dai2y
I'm interested in your view on this, plus what we can potentially do to push the future in this direction.
2Rafael Harth2y
I strongly believe that (1) well-being is objective, (2) well-being is quantifiable, and (3) Open Individualism is true (i.e., the concept of identity isn't well-defined, and you're subjectively no less continuous with the future self of any other person than with your own future self). If (1-3) are all true, then utilitronium is the optimal outcome for everyone even if they're entirely selfish. Furthermore, I expect an AGI to figure this out, and to the extent that it's aligned, it should communicate that if it's asked. (I don't think an AGI will therefore decide to do the right thing, so this is entirely compatible with everyone dying if alignment isn't solved.)

In the scenario where people get to talk to the AGI freely and it's aligned, two concrete mechanisms I see are (a) people just ask the AGI what is morally correct and it tells them, and (b) they get some small taste of what utilitronium would feel like, which would make it less scary. (A crucial piece is that they can rationally expect to experience this themselves in the utilitronium future.) In the scenario where people don't get to talk to the AGI, who knows. It's certainly possible that we have a singleton scenario with a few people in charge of the AGI, and they decide to censor questions about ethics because they find the answers scary.

The only org I know of that works on this and shares my philosophical views is QRI. Their goal is to (a) come up with a mathematical space (probably a topological one, maybe a Hilbert space) that precisely describes the subjective experience of someone, (b) find a way to put someone in the scanner and create that space, and (c) find a property of that space that corresponds to their well-being in that moment. The flagship theory is that this property is symmetry. Their model is stronger than (1-3), but if it's correct, you could get hard evidence on this before AGI since it would make strong testable predictions about people's well-being (and they think it could also point
  We already have a solution to this: money.  It's also the only solution that satisfies some essential properties such as sybil orthogonality (especially important for posthuman/AGI societies).
It's part of alignment. Also, it seems mostly separate from the part about "how do you even have consequentialism powerful enough to make, say, nanotech, without killing everyone as a side-effect?", and the latter seems not too related to the former.

In reality, everyone's morality is based on something like the status game (see also: 1 2 3)

... I really wanted to say [citation needed], but then you did provide citations, but then the citations were not compelling to me.

I'm pretty opposed to such universal claims being made about humans without pushback, because such claims always seem to me to wish-away the extremely wide variation in human psychology and the difficulty establishing anything like "all humans experience X."  

There are people who have no visual imagery, people who do not think in words, people who have no sense of continuity of self, people who have no discernible emotional response to all sorts of "emotional" stimuli, and on and on and on.

So, I'll go with "it makes sense to model people as if every one of them is motivated by structures built atop the status game."  And I'll go with "it seems like the status architecture is a physiological near-universal, so I have a hard time imagining what else people's morality might be made of."  And I'll go with "everyone I've ever talked to had morality that seemed to me to cash out to being statusy, except the people whose self-reports I ignored because the... (read more)

Kind of frustrating that this high karma reply to a high karma comment on my post is based on a double misunderstanding/miscommunication:

  1. First Vanessa understood me as claiming that a significant number of people's morality is not based on status games. I tried to clarify in an earlier comment already, but to clarify some more: that's not my intended distinction between the two groups. Rather the distinction is that the first group "know or at least suspect that they are confused about morality, and are eager or willing to apply reason and deliberation to find out what their real values are, or to correct their moral beliefs" (they can well be doing this because of the status game that they're playing) whereas this quoted description doesn't apply to the second group.
  2. Then you (Duncan) understood Vanessa as claiming that literally everyone's morality is based on status games, when (as the subsequent discussion revealed) the intended meaning was more like "the number of people whose morality is not based on status games is a lot fewer than (Vanessa's misunderstanding of) Wei's claim".
3[DEACTIVATED] Duncan Sabien2y
I think it's important and valuable to separate out "what was in fact intended" (and I straightforwardly accept Vanessa's restatement as a truer explanation of her actual position) from "what was originally said, and how would 70+ out of 100 readers tend to interpret it." I think we've cleared up what was meant.  I still think it was bad that [the perfectly reasonable thing that was meant] was said in a [predictably misleading fashion]. But I think we've said all that needs to be said about that, too.
2Said Achmiz2y
This is a tangent (so maybe you prefer to direct this discussion elsewhere), but: what’s with the brackets? I see you using them regularly; what do they signify?
2[DEACTIVATED] Duncan Sabien2y
I use them where I'm trying to convey a single noun that's made up of many words, and I'm scared that people will lose track of the overall sentence while in the middle of the chunk.  It's an attempt to keep the overall sentence understandable.  I've tried hyphenating such phrases and people find that more annoying.
2Said Achmiz2y
Hmm, I see, thanks.

It's not just that the self-reports didn't fit the story I was building, the self-reports didn't fit the revealed preferences. Whatever people say about their morality, I haven't seen anyone who behaves like a true utilitarian.

IMO, this is the source of all the gnashing of teeth about how much % of your salary you need to donate: the fundamental contradiction between the demands of utilitarianism and how much people are actually willing to pay for the status gain. Ofc many excuses were developed ("sure I still need to buy that coffee or those movie tickets, otherwise I won't be productive") but they don't sound like the most parsimonious explanation.

This is also the source of paradoxes in population ethics and its vicinity: those abstractions are just very remote from actual human minds, so there's no reason they should produce anything sane in edge cases. Their only true utility is as an approximate guideline for making group decisions, for sufficiently mundane scenarios. Once you get to issues with infinities it becomes clear utilitarianism is not even mathematically coherent, in general.

You're right that there is a lot of variation in human psychology. But it's also an accepted ... (read more)

7[DEACTIVATED] Duncan Sabien2y
The equivalent statement would be "In reality, everyone has 2 arms and 2 legs."

Well, if the OP said something like "most people have 2 eyes but enlightened Buddhists have a third eye" and I responded with "in reality, everyone has 2 eyes", then I think my meaning would be clear even though it's true that some people have 1 or 0 eyes (afaik maybe there is even a rare mutation that creates a real third eye). Not adding all possible qualifiers is not the same as "not even pretending that it's interested in making itself falsifiable".

1[DEACTIVATED] Duncan Sabien2y
I think your meaning would be clear, but "everyone knows what this straightforwardly false thing that I said really meant" is insufficient for a subculture trying to be precise and accurate and converge on truth.  Seems like more LWers are on your side than on mine on that question, but that's not news.  ¯\_(ツ)_/¯ It's a strawman to pretend that "please don't say a clearly false thing" is me insisting on "please include all possible qualifiers."  I just wish you hadn't said a clearly false thing, is all.  

Natural language is not math, it's inherently ambiguous and it's not realistically possible to always be precise without implicitly assuming anything about the reader's understanding of the context. That said, it seems like I wasn't sufficiently precise in this case, so I edited my comment. Thank you for the correction.

The tradeoff is with verbosity and difficulty of communication, it's not always a straightforward Pareto improvement. So in this case I fully agree with dropping "everyone" or replacing it with a more accurate qualifier. But I disagree with a general principle that would discount ease for a person who is trained and talented in relevant ways. New habits of thought that become intuitive are improvements, checklists and other deliberative rituals that slow down thinking need merit that overcomes their considerable cost.
That looks like a No True Scotsman argument to me. Just because the extreme doesn't exist doesn't mean that all of the scale can be explained by status games.  

What does it have to do with "No True Scotsman"? NTS is when you redefine your categories to justify your claim. I don't think I did that anywhere.

Just because the extreme doesn't exist doesn't mean that all of the scale can be explained by status games.

First, I didn't say all the scale is explained by status games, I did mention empathy as well.

Second, that by itself sure doesn't mean much. Explaining all the evidence would require an article, or maybe a book (although I hoped the posts I linked explain some of it). My point here is that there is an enormous discrepancy between the reported morality and the revealed preferences, so believing self-reports is clearly a non-starter. How do you build an explanation not from self-reports is a different (long) story.

I agree that there is an enormous discrepancy.
If you try to quantify it, humans on average probably spend over 95% (a conservative estimate) of their time and resources on non-utilitarian causes. True utilitarian behavior is extremely rare, and all other moral behaviors seem to be either elaborate status games or extended self-interest[1]. The typical human is, by any relevant quantified KPI, far closer to being completely selfish than to being a utilitarian.

[1] - Investing in your family/friends is in a way selfish, from a genes/alliances (respectively) perspective.
What does this even mean? If someone says they don't want X, and they never take actions that promote X, how can it be said that they "truly" want X? It's not their stated preference or their revealed preference!

Feminist hero and birth control campaigner Marie Stopes, who was voted Woman of the Millennium by the readers of The Guardian and honoured on special Royal Mail stamps in 2008, was an anti-Semite and eugenicist

My conclusion from this is more like "successful politicians are not moral paragons". More generally, trying to find morally virtuous people by a popular vote is not going to produce great results, because popularity plays a much greater role than morality.

I googled for "woman of the year" to get more data points; found this list, containing: 2019 Greta Thunberg, 2016 Hillary Clinton, 2015 Angela Merkel, 2010 Nancy Pelosi, 2008 Michelle Obama, 1999 Madeleine Albright, 1990 Aung San Suu Kyi... clearly, being a politician dramatically increases your chances of winning. Looking at their behavior, Aung San Suu Kyi later organized a genocide.

The list also includes 2009 Malala Yousafzai, who as far as I know is an actual hero with no dark side. But that's kinda my point, that putting Malala Yousafzai on the same list as Greta Thunberg and Hillary Clinton just makes the list confusing. And if you had to choose one of them as the "woman of the millennium", I would expect most reader... (read more)

And this sounds silly to us, because we know that “kicking the sunrise” is impossible, because the Sun is a star, it is far away, and your kicking has no impact on it.

I think a lot of contemporary cultures back then would have found "kicking the sunrise" to be silly, because it was obviously impossible even given what they knew at the time, i.e., you can only kick something if you physically touch it with your foot, and nobody has ever even gotten close to touching the sun, and it's even more impossible while you're asleep.

So, we should distinguish between people having different moral feelings, and having different models of the world. If you actually believed that kicking the Sun is possible and can have astronomical consequences, you would probably also perceive people sleeping westwards as criminally negligent, possibly psychopathic.

Why did the Malagasy people have such a silly belief? Why do many people have very silly beliefs today? (Among the least politically risky ones to cite, someone I've known for years who otherwise is intelligent and successful, currently believes, or at least believed in the recent past, that 2/3 of everyone will die as a result of taking the CO... (read more)

I feel like your definition of "morally virtuous" is missing at least 2 parameters: the context that the person is in, and the definition of "morally virtuous". You seem to treat both as fixed or not contributing to the outcome, but in my experience they're equally if not more important than the person. Your example of Aung San Suu Kyi is a good example of that. She was "good" in 1990 given her incentives in 1990 and the popular definition of "good" in 1990. Not so much later.
Moral virtue seems to involve a certain... inflexibility to incentives. If someone says "I would organize the genocide of the Rohingya if and only if organizing such genocide is profitable, and it so happens that today it would be unprofitable, therefore today I oppose the genocide", we would typically not call this person moral. Of course, people usually do not explain their decision algorithms in detail, so the person described above would probably only say "I oppose the genocide", which would seem quite nice of them. With most people, we will never know what they would do in a parallel universe, where organizing a genocide could give them a well-paid job. Without evidence to the contrary, we usually charitably assume that they would refuse... but of course, perhaps this is unrealistically optimistic. (This only addresses the objection about "context". The problem of definition is more complicated.)
No, the reason it sounds silly to you is not because it's not true, but because it's not part of your own sacred beliefs. There is no fundamental reason for people to support things you take for granted as moral facts, like women's rights or racial rights. In fact, given an accurate model of the world, you may find a lot of the things that make the most sense distasteful, based on your current unusual "moral" fashions.

For example, exterminating opposing groups is common in human societies historically. Groups are often competing for resources; since one group wants more resources for itself and its progeny, exterminating the other group makes the most sense. And if the fundamental desire for survival and dominance -- drilled into us by evolution -- isn't moral, then the concept just seems totally meaningless.
A concept is "totally meaningless" just because it does not match some evolutionary strategies? First, concepts are concepts, regardless of their relation to evolution. Second, there are many strategies in evolution, including things like cooperation or commitments, which intuitively seem more aligned with morality.

Humans are a social species, where the most aggressive one with the most muscles is not necessarily a winner. Sometimes he is actually a loser, who gets beaten by the cops and thrown in jail. Another example: some homeless people are quite scary and they can survive things that I probably cannot imagine; yet, from the evolutionary perspective, they are usually less successful than me. Even if a group wants to exterminate another group, it is usually easier if they befriend a different group first, and then attack together. But you usually don't make friends by being a backstabbing asshole. And "not being a backstabbing asshole" is kinda what morality is about.

Here we need to decouple moral principles from factual beliefs. On the level of moral principles, many people accept "if some individual is similar to me, they should be treated with some basic respect" as a moral rule. Not all of them, of course. If someone does not accept this moral rule, then... de gustibus non est disputandum, I guess. (I suspect that ethics is somehow downstream of aesthetics, but I may be confused about this.)

But even if someone accepts this rule, the actual application will depend on their factual beliefs about who is "similar to me". I believe it is a statement about the world (not just some kind of sacred belief) that approval of women's rights is positively correlated with the belief that (mentally) women are similar to men. Similarly, the approval of racial rights is positively correlated with the belief that people of different races are (mentally) similar to each other. This statement should be something that both people who approve and who disapprove of the af

Even if moralities vary from culture to culture based on the local status games, I would suggest that there is still some amount of consequentialist bedrock to why certain types of norms develop. In other words, cultural relativism is not unbounded.

Generally speaking, norms evolve over time, where any given norm at one point didn't yet exist if you go back far enough. What caused these norms to develop? I would say the selective pressures for norm development come from some combination of existing culturally-specific norms and narratives (such as the sunrise being an agent that could get hurt when kicked) along with more human-universal motivations (such as empathy + {wellbeing = good, suffering = bad} -> you are bad for kicking the sunrise -> don't sleep facing west) or other instrumentally-convergent goals (such as {power = good} + "semen grants power" -> institutionalized sodomy). At every step along the evolution of a moral norm, every change needs to be justifiable (in a consequentialist sense) to the members of the community who would adopt it. Moral progress is when the norms of society come to better resonate with both the accepted narratives of society (which may ... (read more)

3Wei Dai2y
Upvoted for some interesting thoughts. Can you say more about how you see us getting from here to there?
1Jon Garcia2y
Getting from here to there is always the tricky part with coordination problems, isn't it? I do have some (quite speculative) ideas on that, but I don't see human society organizing itself in this way on its own for at least a few centuries given current political and economic trends, which is why I postulated a cooperative ASI.

So assuming that either an aligned ASI has taken over (I have some ideas on robust alignment, too, but that's out of scope here) or political and economic forces (and infrastructure) have finally pushed humanity past a certain social phase transition, I see humanity undergoing an organizational shift much like what happened with the evolution of multicellularity and eusociality. This would look at first mostly the same as today, except that national borders have become mostly irrelevant due to advances in transportation and communication infrastructure. Basically, imagine the world's cities and highways becoming something like the vascular system of dicots or the closed circulatory system of vertebrates, with the regions enclosed by network circuits acting as de facto states (or organs/tissues, to continue the biological analogy). Major cities and the regions along the highways that connect them become the de facto arbiters of international policy, while the major cities and highways within each region become the arbiters of regional policy, and so on in a hierarchically embedded manner.

Within this structure, enclosed regions would act as hierarchically embedded communities that end up performing a division of labor for the global network, just as organs divide labor for the body (or like tissues divide labor within an organ, or cells within a tissue, or organelles within a cell, if you're looking within regions). Basically, the transportation/communication/etc. network edges would come to act as Markov blankets for the regions they encapsulate, and this organization would extend hierarchically, just like in biological systems, down to th

I am also scared of futures where "alignment is solved" under the current prevailing usage of "human values."

Humans want things that we won't end up liking, and prefer things that we will regret getting relative to other options that we previously dispreferred. We are remarkably ignorant of what we will, in retrospect, end up having liked, even over short timescales. Over longer timescales, we learn to like new things that we couldn't have predicted a priori, meaning that even our earnest and thoughtfully-considered best guess of our preferences in advance will predictably be a mismatch for what we would have preferred in retrospect. 

And this is not some kind of bug, this is centrally important to what it is to be a person; "growing up" requires a constant process of learning that you don't actually like certain things you used to like and now suddenly like new things. This truth ranges over all arenas of existence, from learning to like black coffee to realizing you want to have children.

I am personally partial to the idea of something like Coherent Extrapolated Volition. But it seems suspicious that I've never seen anybody on LW sketch out how a decision theory ought to beha... (read more)

I thought a solved alignment problem would involve a constant process of updating the AI's values to track the most recent human values. So if something does not lead to the human's expected terminal goals (such as enjoyable emotions), then the human can indicate that outcome to the AI and the AI would adjust its own goals accordingly.
The idea that the AI should defer to the "most recent" human values is an instance of the sort of trap I'm worried about. I suspect we could be led down an incremental path of small value changes in practically any direction, which could terminate in our willing and eager self-extinction or permanent wireheading. But how much tyranny should present-humanity be allowed to have over the choices of future humanity? I don't think "none" is as wise an answer as it might sound at first. To answer "none" implies a kind of moral relativism that none of us actually hold, and which would make us merely the authors of a process that ultimately destroys everything we currently value. But the answer of "complete control of the future by the present" also seems obviously wrong, because we will learn about entirely new things worth caring about that we can't predict now, and sometimes it is natural to change what we like.

More fundamentally, I think the assumption that there exist "human terminal goals" presumes too much. Specifically, it presumes that our desires, in anticipation and in retrospect, are destined to fundamentally and predictably cohere. I would bet money that this isn't the case.
The implication of doing everything that AI could do at once is unfortunate. The urgent objective of AI alignment is prevention of AI risk, where a minimal solution is to take away access to unrestricted compute from all humans in a corrigible way that would allow eventual desirable use of it. All other applications of AI could follow much later through corrigibility of this urgent application.
[comment deleted] · 2y · 1 point
[comment deleted] · 2y · 1 point

I honestly have a difficult time understanding the people (such as your "AI alignment researchers and other LWers, Moral philosophers") who actually believe in Morality with a capital M. I believe they are misguided at best, potentially dangerous at worst. 

I hadn't heard of the Status Game book you quote, but for a long time now it's seemed obvious to me that there is no objective true Morality, it's purely a cultural construct, and mostly a status game. Any deep reading of history, cultures, and religions, leads one to this conclusion.

Humans have complex values, and that is all. 

We humans cooperate and compete to optimize the universe according to those values, as we always have, as our posthuman descendants will, even without fully understanding them.

Ape in the coat · 2y · 4 points
I think you are misunderstanding what Wei_Dai meant by the "AI alignment researchers and other LWers, Moral philosophers" perspective on morality. It's not about capital letters or the "objectivity" of our morality. It's about exactly that fact: humans have complex values, and the question is whether we can understand them and translate them into one course of action according to which we are going to optimize the universe. Basically, as I understand it, the difference is between people who try to resolve the conflicts between their different values and generally think about them as an approximation of some coherent utility function, and those who don't.
  If we agree humans have complex subjective values, then optimizing group decisions (for a mix of agents with different utility functions) is firmly a question for economic mechanism design - which is already a reasonably mature field.
A problem here, however, is the Myerson–Satterthwaite result which suggests that auction runners, to enable clean and helpful auctions for others, risk being hurt when they express and seek their own true preferences, or (if they take no such risks) become bad auctioneers for others. The thing that seems like it might just be True here is that Good Governance requires personal sacrifice by leaders, which I mostly don't expect to happen, given normal human leaders, unless those leaders are motivated by, essentially: "altruistic" "moral sentiment". It could be that I'm misunderstanding some part of the economics or the anthropology or some such? But it looks to me like if someone says that there is no such thing as moral sentiment, it implies that they themselves do not have such sentiments, and so perhaps those specific people should not be given power or authority or respect in social processes that are voluntary, universal, benevolent, and theoretically coherent. The reasonableness of this conclusion goes some way to explain to me how there is so much "social signaling" and also goes to explaining why so much of this signaling is fake garbage transmitted into the social environment by power-hungry psychos.
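For readers unfamiliar with Myerson–Satterthwaite, here is a toy sketch (my own illustration, not from the comment) of the bind it creates. In bilateral trade, the VCG mechanism is truthful and efficient, but only because the operator eats a deficit: the seller is paid the buyer's value while the buyer pays only the seller's cost. That subsidy is one concrete form of the "personal sacrifice by leaders" the comment gestures at. All numbers here are made up for illustration.

```python
import random

def vcg_bilateral_trade(buyer_value, seller_cost):
    """Truthful, efficient VCG mechanism for one buyer and one seller.

    Trade occurs iff it creates surplus. Each party pays the externality
    they impose on the other: the buyer pays the seller's cost, and the
    seller is paid the buyer's value. The gap between those two transfers
    is a deficit the mechanism operator must subsidize out of pocket.
    """
    if buyer_value <= seller_cost:
        return {"trade": False, "deficit": 0.0}
    buyer_pays = seller_cost    # buyer's VCG payment
    seller_gets = buyer_value   # seller's VCG receipt
    return {"trade": True, "deficit": seller_gets - buyer_pays}

# Monte-Carlo estimate of how much the operator loses per trade
# when values and costs are drawn uniformly at random.
random.seed(0)
total_subsidy, trades = 0.0, 0
for _ in range(10_000):
    outcome = vcg_bilateral_trade(random.random(), random.random())
    if outcome["trade"]:
        trades += 1
        total_subsidy += outcome["deficit"]

print(f"trades: {trades}, average subsidy per trade: {total_subsidy / trades:.3f}")
```

The theorem says this deficit cannot be engineered away: no mechanism is simultaneously efficient, incentive-compatible, individually rational, and budget-balanced, so someone has to absorb the loss.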
Ape in the coat · 2y · 1 point
Well, that's one way to do it. With its own terrible consequences, but let's not focus on them for now. What's more important is that this solution is very general, while all human values belong to the same cluster. So there may be a more preferable, more human-specific solution to the problem.

To repost my comment from a couple of weeks back, which seems to say roughly the same thing, not as well:

I don't believe alignment is possible. Humans are not aligned with other humans, and the only thing that prevents an immediate apocalypse is the lack of recursive self-improvement on short timescales. Certainly groups of humans happily destroy other groups of humans, and often destroy themselves in the process of maximizing something like the number of statues. The best we can hope for is that whatever takes over the planet after the meatbags are gone has some of

... (read more)
Charlie Steiner · 2y · 5 points
Do you think there are changes to the current world that would be "aligned"? (E.g. deleting covid) Then we could end up with a world that is better than our current one, even without needing all humans to agree on what's best. Another option: why not just do everything at once? Have some people living in a diverse Galactic civilization, other people spreading the word of god, and other people living in harmony with nature, and everyone contributing a little to everyone else's goals? Yes, in principle people can have different values such that this future sounds terrible to everyone - but in reality it seems more like people would prefer this to our current world, but might merely feel like they were missing out relative to their own vision of perfection.
I also made a similar comment a few weeks ago. In fact, this point seems to me so trivial yet corrosive that I find it outright bizarre that it's not being tackled/taken seriously by the AI alignment community.

I'm not sure what you mean by 'astronomical waste or astronomical suffering'.  Like, you are writing that everything forever is status games, ok, sure, but then you can't turn around and appeal to a universal concept of suffering/waste, right?

Whatever you are worried about is just like Gandhi worrying about being too concerned with cattle, plus x years, yeah?  And even if you've lucked into a non status games morality such that you can perceive 'Genuine Waste' or what have you...surely by your own logic, we who are reading this are incapable of understanding, aside from in terms of status games.

Wei Dai · 2y · 6 points
I'm suggesting that maybe some of us lucked into a status game where we use "reason" and "deliberation" and "doing philosophy" to compete for status, and that somehow "doing philosophy" etc. is a real thing that eventually leads to real answers about what values we should have (which may or may not depend on who we are). Of course I'm far from certain about this, but at least part of me wants to act as if it's true, because what other choice does it have?
The alternative is egoism. To the extent that we are allies, I'd be happy if you adopted it.
Wei Dai · 2y · 2 points
I don't think that's a viable alternative, given that I don't believe that egoism is certainly right (surely the right way to treat moral uncertainty can't be to just pick something and "adopt it"?), plus I don't even know how to adopt egoism if I wanted to: * * (which doesn't really solve the problem despite the title)

So on the one hand you have values that are easily, trivially compatible, such as "I want to spend 1000 years climbing the mountains of Mars" or "I want to host blood-sports with my uncoerced friends with the holodeck safety on".

On the other hand you have insoluble, or at least apparently insoluble, conflicts: B wants to torture people, C wants there to be no torture anywhere at all. C wants to monitor everyone everywhere forever to check that they aren't torturing anyone or plotting to torture anyone, D wants privacy. E and F both want to be the best in ... (read more)

I'm leaning towards the more ambitious version of the project of AI alignment being about corrigible anti-goodharting, with the AI optimizing towards good trajectories within scope of relatively well-understood values, preventing overoptimized weird/controversial situations, even at the cost of astronomical waste. Absence of x-risks, including AI risks, is generally good. Within this environment, the civilization might be able to eventually work out more about values, expanding the scope of their definition and thus allowing stronger optimization. Here corrigibility is in part about continually picking up the values and their implied scope from the predictions of how they would've been worked out some time in the future.

Wei Dai · 2y · 4 points
Please say more about this? What are some examples of "relatively well-understood values", and what kind of AI do you have in mind that can potentially safely optimize "towards good trajectories within scope" of these values?
My point is that the alignment (values) part of AI alignment is least urgent/relevant to the current AI risk crisis. It's all about corrigibility and anti-goodharting. Corrigibility is hope for eventual alignment, and anti-goodharting makes inadequacy of current alignment and imperfect robustness of corrigibility less of a problem.

I gave the relevant example of relatively well-understood values: preference for lower x-risks. Other values are mostly relevant in how their understanding determines the boundary of anti-goodharting — what counts as not too weird for them to apply — not in what they say is better. If anti-goodharting holds (too-weird and too-high-impact situations are not pursued in planning, and possibly actively discouraged), and some sort of long reflection is still going on, current alignment (details of what the values-in-AI prefer, as opposed to what they can make sense of) doesn't matter in the long run.

I include maintaining a well-designed long reflection somewhere into corrigibility, for without it there is no hope for eventual alignment; so a decision-theoretic agent that has long reflection within its preference is corrigible in this sense. Its corrigibility depends on following a good decision theory, so that there actually exists a way for the long reflection to determine its preference so that it causes the agent to act as the long reflection wishes. But being an optimizer, it's horribly non-anti-goodharting, so it can't be stopped and probably eats everything else.

An AI with anti-goodharting turned to the max is the same as an AI with its stop button pressed. An AI with minimal anti-goodharting is an optimizer, AI risk incarnate. Stronger anti-goodharting is a maintenance mode, an opportunity for fundamental change; weaker anti-goodharting makes use of more developed values to actually do the things. So a way to control the level of anti-goodharting in an AI is a corrigibility technique. The two concepts work well with each other.
Wei Dai · 2y · 4 points
This seems interesting and novel to me, but (of course) I'm still skeptical. Preference for lower x-risk doesn't seem "well-understood" to me, if we include in "x-risk" things like value drift/corruption, premature value lock-in, and other highly consequential AI-enabled decisions (potential existential mistakes) that depend on hard philosophical questions. I gave some specific examples in this recent comment. What do you think about the problems on that list? (Do you agree that they are serious problems, and if so how do you envision them being solved or prevented in your scenario?)
The fact that AI alignment research is 99% about control, and 1% (maybe less?) about metaethics (in the context of how we even aggregate the utility functions of all humanity) hints at what is really going on, and that's enough said.
Daniel Kokotajlo · 2y · 6 points
Have you heard about CEV and Fun Theory? In an earlier, more optimistic time, this was indeed a major focus. What changed is we became more pessimistic and decided to focus more on first things first -- if you can't control the AI at all, it doesn't matter what metaethics research you've done. Also, the longtermist EA community still thinks a lot about metaethics relative to literally every other community I know of, on par with and perhaps slightly more than my philosophy grad student friends. (That's my take at any rate, I haven't been around that long.)
CEV was written in 2004, Fun Theory 13 years ago. I couldn't find any recent MIRI paper about metaethics (granted, I haven't gone through all of them). The metaethics question is just as important as the control question for any utilitarian (what good will it be to control an AI only for it to be aligned with some really bad values? An AI controlled by a sadistic sociopath is infinitely worse than a paper-clip maximizer). Yet all the research is focused on control, and it's very hard not to be cynical about it. If some people believe they are creating a god, it's selfishly prudent to make sure you're the one holding the reins to this god. Having some blind trust in the benevolence of Peter Thiel (who finances this) or other people who will suddenly have godly powers to care for all humanity seems naive, with all we know about how power corrupts and how competitive and selfish people are. Most people are not utilitarians, so as a quasi-utilitarian I'm pretty terrified of what kind of world will be created with an AI controlled by the typical non-utilitarian person.
Daniel Kokotajlo · 2y · 4 points
My claim was not that MIRI is doing lots of work on metaethics. As far as I know they are focused on the control/alignment problem. This is not because they think it's the only problem that needs solving; it's just the most dire, the biggest bottleneck, in their opinion. You may be interested to know that I share your concerns about what happens after (if) we succeed at solving alignment. So do many other people in the community, I assure you. (Though I agree on the margin more quiet awareness-raising about this would plausibly be good.)
Mitchell_Porter · 2y · 2 points
is the state of the art as far as I'm concerned...

I think this post makes an important point -- or rather, raises a very important question, with some vivid examples to get you started. On the other hand, I feel like it doesn't go further, and probably should have -- I wish it e.g. sketched a concrete scenario in which the future is dystopian not because we failed to make our AGIs "moral" but because we succeeded, or e.g. got a bit more formal and complemented the quotes with a toy model (inspired by the quotes) of how moral deliberation in a society might work, under post-AGI-alignment conditions, and ho... (read more)


If with "morality" you mean moral realism, then yes, I agree that it is scary.
I'm most scared by the apparent assumption that we have solved the human alignment problem.
Looking at history, I don't feel like our current situation of relative peace is very stable.
My impression is that "good" behavior is largely dependent on incentives, and so is the very definition of "good".
Perhaps markets are one of the more successful tools of creating aligned behaviour in humans, but even in that case it only seems to work if the powers of the market participants are balanced, which is not a luxury we have in alignment work.

You could read the status game argument the opposite way: Maybe status seeking causes moral beliefs without justifying them, in the same way that it can distort our factual beliefs about the world. If we can debunk moral beliefs by finding them to be only status-motivated, the status explanation can actually assist rational reflection on morality.

Also the quote from The Status Game conflates purely moral beliefs and factual beliefs in a way that IMO weakens its argument. It's not clear that many of the examples of crazy value systems would survive full logical and empirical information.

Wei Dai · 2y · 9 points
The point I was trying to make with the quote is that many people are not motivated to do "rational reflection on morality" or examine their value systems to see if they would "survive full logical and empirical information". In fact they're motivated to do the opposite, to protect their value systems against such reflection/examination. I'm worried that alignment researchers are not worried enough that if an alignment scheme causes the AI to just "do what the user wants", that could cause a lock-in of crazy value systems that wouldn't survive full logical and empirical information.

There is no unique eutopia. 

Sentient beings that collaborate outcompete ones that don't (not considering inner competition in a singleton here). Collaboration means that interests between beings are traded/compromised. Better collaboration methods have a higher chance to win; we see this over the course of history. It is a messy evolutionary process. But I think there is a chance that this process itself can be improved, e.g. with FAI. Think of an interactive "AlphaValue" that does Monte-Carlo Tree Search over collaboration opportunities. It will not converge on a unique best CEV but will result in one of many possible eutopias.
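The "AlphaValue" idea can be caricatured with a flat Monte-Carlo search (a deliberately crude cousin of MCTS) over an invented set of collaboration opportunities; every name and number below is made up for illustration. The point it shows is the comment's conclusion: different welfare weightings select different bundles of opportunities, so the search does not converge on a unique best outcome.

```python
import random

# Hypothetical "collaboration opportunities" between two parties A and B:
# name -> (utility to A, utility to B, shared cost). A budget limits how
# many opportunities a bundle can contain.
OPPORTUNITIES = {
    "shared_infrastructure": (3.0, 3.0, 4),
    "a_pet_project":         (5.0, 0.5, 3),
    "b_pet_project":         (0.5, 5.0, 3),
    "joint_research":        (2.0, 2.5, 2),
}
BUDGET = 6

def best_bundle(weight_a, weight_b, n_rollouts=5_000, seed=0):
    """Flat Monte-Carlo search: sample random feasible bundles and keep
    the one maximizing weighted welfare. Different weights pick out
    different 'eutopias' -- there is no single weight-free winner."""
    rng = random.Random(seed)
    names = list(OPPORTUNITIES)
    best, best_score = frozenset(), float("-inf")
    for _ in range(n_rollouts):
        rng.shuffle(names)                     # random rollout order
        bundle, spent = set(), 0
        for name in names:                     # greedily fill the budget
            cost = OPPORTUNITIES[name][2]
            if spent + cost <= BUDGET:
                bundle.add(name)
                spent += cost
        score = sum(weight_a * OPPORTUNITIES[n][0] + weight_b * OPPORTUNITIES[n][1]
                    for n in bundle)
        if score > best_score:
            best, best_score = frozenset(bundle), score
    return best

print(best_bundle(1.0, 1.0))   # egalitarian weighting
print(best_bundle(3.0, 1.0))   # A-favoring weighting
```

With equal weights the search prefers trading pet projects; tilting the weights toward A swaps B's pet project for the cheaper joint research. Neither bundle is "the" answer — each weighting defines its own optimum.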

I don't follow the reasoning. How do you get from "most people's moral behaviour is explainable in terms of them 'playing' a status game" to "solving (some versions of) the alignment problem probably won't be enough to ensure a future that's free from astronomical waste or astronomical suffering"?

More details:
Regarding the quote from The Status Game: I have not read the book, so I'm not sure what the intended message is but this sounds like some sort of unwarranted pessimism about ppl's moral standing (something like a claim like "the vast majority of ppl ... (read more)

Great post, thanks! Widespread value pluralism a la "well, that's just, like, your opinion, man" is now a feature of modern life. Here are a pair of responses from political philosophy which may be of some interest.

(1) Rawls/Thin Liberal Approach. Whilst we may not be able to agree on what "the good life" is, we can at least agree on a basic system which ensures all participants can pursue their own idea of the good life. So: (1) protect a list of political liberties and freedoms, and (2) some degree of economic levelling. Beyond that, it is up to ... (read more)

Don't you need AI to go through the many millions of experiences that it might take to develop a good morality strategy?

I'm entranced by Jordan Peterson's descriptions, which seem to light up the evolutionary path of morality for humans.  Shouldn't AI be set up to try to grind through the same progress?

Andrew McKnight · 2y · 5 points
I think the main thing you're missing here is that an AI is not generally going to share common learning faculties with humans. Having an AI grow up as a human would still leave it wildly different from a normal human, because it isn't built to learn from those experiences the way a human does.

What's truly scary is how much the beliefs and opinions of normal people make them seem like aliens to me. 

I find the paragraph beginning with these two sentences, and the examples in it, misleading and unconvincing in the point it tries to make about moral disagreement across time:

Such ‘facts’ also change across time.  We don’t have to travel back far to discover moral superstars holding moral views that would destroy them today.

I shall try to explain why, because such evidence seemed persuasive to me before I thought about it more; I made this account just for this comment after being a lurker for a while -- I have found your previous posts about moral uncertainty ... (read more)

It seems that our morality consists of two elements. The first is bias, based on the game-theoretic environment of our ancestors. Humans developed complex feelings around activities that promoted inclusive genetic fitness, and now we are intrinsically and authentically motivated to do them for their own sake.

There is also a limited capability for moral updates. That's what we use to resolve contradictions in our moral intuitions. And that's also what allows us to persuade ourselves that doing some status-promoting thing is actually moral. On the one hand, t... (read more)

You may not be interested in mutually exclusive compression schemas, but mutually exclusive compression schemas are interested in you. One nice thing is that given that the schemas use an arbitrary key to handshake with there is hope that they can be convinced to all get on the same arbitrary key without loss of useful structure.

[comment deleted] · 2y · −9 points
[comment deleted] · 2y · −12 points