I'm worried that many AI alignment researchers and other LWers have a view of how human morality works, that really only applies to a small fraction of all humans (notably moral philosophers and themselves). In this view, people know or at least suspect that they are confused about morality, and are eager or willing to apply reason and deliberation to find out what their real values are, or to correct their moral beliefs. Here's an example of someone who fits this view:
I’ve written, in the past, about a “ghost” version of myself — that is, one that can float free from my body; which travel anywhere in all space and time, with unlimited time, energy, and patience; and which can also make changes to different variables, and play forward/rewind different counterfactual timelines (the ghost’s activity somehow doesn’t have any moral significance).
I sometimes treat such a ghost kind of like an idealized self. It can see much that I cannot. It can see directly what a small part of the world I truly am; what my actions truly mean. The lives of others are real and vivid for it, even when hazy and out of mind for me. I trust such a perspective a lot. If the ghost would say “don’t,” I’d be inclined to listen.
I'm currently reading The Status Game by Will Storr (highly recommended BTW), and found in it the following description of how morality works in most people, which matches my own understanding of history and my observations of humans around me:
The moral reality we live in is a virtue game. We use our displays of morality to manufacture status. It’s good that we do this. It’s functional. It’s why billionaires fund libraries, university scholarships and scientific endeavours; it’s why a study of 11,672 organ donations in the USA found only thirty-one were made anonymously. It’s why we feel good when we commit moral acts and thoughts privately and enjoy the approval of our imaginary audience. Virtue status is the bribe that nudges us into putting the interests of other people – principally our co-players – before our own.
We treat moral beliefs as if they’re universal and absolute: one study found people were more likely to believe God could change physical laws of the universe than he could moral ‘facts’. Such facts can seem to belong to the same category as objects in nature, as if they could be observed under microscopes or proven by mathematical formulae. If moral truth exists anywhere, it’s in our DNA: that ancient game-playing coding that evolved to nudge us into behaving co-operatively in hunter-gatherer groups. But these instructions – strive to appear virtuous; privilege your group over others – are few and vague and open to riotous differences in interpretation. All the rest is an act of shared imagination. It’s a dream we weave around a status game.
The dream shifts as we range across the continents. For the Malagasy people in Madagascar, it’s taboo to eat a blind hen, to dream about blood and to sleep facing westwards, as you’ll kick the sunrise. Adolescent boys of the Marind of South New Guinea are introduced to a culture of ‘institutionalised sodomy’ in which they sleep in the men’s house and absorb the sperm of their elders via anal copulation, making them stronger. Among the people of the Moose, teenage girls are abducted and forced to have sex with a married man, an act for which, writes psychologist Professor David Buss, ‘all concerned – including the girl – judge that her parents giving her to the man was a virtuous, generous act of gratitude’. As alien as these norms might seem, they’ll feel morally correct to most who play by them. They’re part of the dream of reality in which they exist, a dream that feels no less obvious and true to them than ours does to us.
Such ‘facts’ also change across time. We don’t have to travel back far to discover moral superstars holding moral views that would destroy them today. Feminist hero and birth control campaigner Marie Stopes, who was voted Woman of the Millennium by the readers of The Guardian and honoured on special Royal Mail stamps in 2008, was an anti-Semite and eugenicist who once wrote that ‘our race is weakened by an appallingly high percentage of unfit weaklings and diseased individuals’ and that ‘it is the urgent duty of the community to make parenthood impossible for those whose mental and physical conditions are such that there is well-nigh a certainty that their offspring must be physically and mentally tainted’. Meanwhile, Gandhi once explained his agitation against the British thusly: ‘Ours is one continual struggle against a degradation sought to be inflicted upon us by the Europeans, who desire to degrade us to the level of the raw Kaffir [black African] … whose sole ambition is to collect a certain number of cattle to buy a wife with and … pass his life in indolence and nakedness.’ Such statements seem obviously appalling. But there’s about as much sense in blaming Gandhi for not sharing our modern, Western views on race as there is in blaming the Vikings for not having Netflix. Moral ‘truths’ are acts of imagination. They’re ideas we play games with.
The dream feels so real. And yet it’s all conjured up by the game-making brain. The world around our bodies is chaotic, confusing and mostly unknowable. But the brain must make sense of it. It has to turn that blizzard of noise into a precise, colourful and detailed world it can predict and successfully interact with, such that it gets what it wants. When the brain discovers a game that seems to make sense of its felt reality and offer a pathway to rewards, it can embrace its rules and symbols with an ecstatic fervour. The noise is silenced! The chaos is tamed! We’ve found our story and the heroic role we’re going to play in it! We’ve learned the truth and the way – the meaning of life! It’s yams, it’s God, it’s money, it’s saving the world from evil big pHARMa. It’s not like a religious experience, it is a religious experience. It’s how the writer Arthur Koestler felt as a young man in 1931, joining the Communist Party:
‘To say that one had “seen the light” is a poor description of the mental rapture which only the convert knows (regardless of what faith he has been converted to). The new light seems to pour from all directions across the skull; the whole universe falls into pattern, like stray pieces of a jigsaw puzzle assembled by one magic stroke. There is now an answer to every question, doubts and conflicts are a matter of the tortured past – a past already remote, when one lived in dismal ignorance in the tasteless, colourless world of those who don’t know. Nothing henceforth can disturb the convert’s inner peace and serenity – except the occasional fear of losing faith again, losing thereby what alone makes life worth living, and falling back into the outer darkness, where there is wailing and gnashing of teeth.’
I hope this helps further explain why I think even solving (some versions of) the alignment problem probably won't be enough to ensure a future that's free from astronomical waste or astronomical suffering. A part of me is actually more scared of many futures in which "alignment is solved", than a future where biological life is simply wiped out by a paperclip maximizer.