A toy model of ethics which I've found helpful lately:
Consider society as a group of reinforcement learners, each getting rewards from interacting with the environment and each other.* We can then define two moral motivations:
- Altruism: wanting to increase the rewards that other people receive.
- Justice**: wanting people's rewards to depend on their behavior, so that good behavior is rewarded and bad behavior is punished.
Importantly, if you have one faction that's primarily optimizing for altruism, and another that's primarily optimizing for justice, by default they'll undermine each other's goals: the altruists will want to give people rewards whether or not they've behaved well, while the justice-focused faction will want to hand out punishments even when doing so reduces total reward.
One way of thinking about the last few decades (and possibly centuries) is that ethical thinking has become dominated by altruism, to the point where being ethical and being altruistic are near-synonymous to many people (especially utilitarians). At an extreme, it leads to reasoning like in the comic below:
Of course, positively reinforcing misbehavior will tend to produce more misbehavior (both by teaching those who misbehave to do it again, and by making well-behaved people feel like chumps). And so more thoughtful utilitarians will defend justice as an instrumental moral good, albeit not as a terminal moral good. Unfortunately, it seems very hard to actually hold this position without in practice deprioritizing justice (e.g. it's rare to see effective altruists reasoning themselves into trying to make society more just).
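To make that reinforcement dynamic concrete, here's a minimal bandit-style sketch (my own illustration rather than anything from the model above; the payoff numbers and the small "temptation" bonus for misbehaving are arbitrary assumptions). An agent whose reward doesn't depend on its behavior learns to keep misbehaving, while one whose reward is contingent on behaving well learns to behave:

```python
import random

def train(contingent_reward, episodes=5000, lr=0.1, eps=0.1):
    """Tiny epsilon-greedy learner choosing between "behave" and "misbehave".

    If contingent_reward is False (pure altruism), the agent gets helped
    either way; if True (justice), its reward depends on how it behaved.
    """
    q = {"behave": 0.0, "misbehave": 0.0}
    for _ in range(episodes):
        explore = random.random() < eps
        action = random.choice(list(q)) if explore else max(q, key=q.get)
        reward = 1.0 if (action == "behave" or not contingent_reward) else -1.0
        if action == "misbehave":
            reward += 0.2  # assumption: misbehaving is slightly tempting in itself
        q[action] += lr * (reward - q[action])
    return max(q, key=q.get)

random.seed(0)
print("altruism only:", train(contingent_reward=False))  # -> misbehave
print("with justice: ", train(contingent_reward=True))   # -> behave
```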
I think this difficulty is related to why consequentialism is wrong. This is a tricky topic to write about, but one core intuition is that before figuring out how to act, you need to figure out who is acting. For example, before trying to plan for the future, you need to have a sense of personal identity whereby your future self will feel a sense of continuity with and loyalty to your plans.
We can analogously view justice (and other moral intuitions which I'm ignoring in this simplified analysis) as mechanisms for holding society together as a moral agent which is able to act coherently at all. And so people who think that individuals should choose actions on the basis of their consequences are putting the locus of agency in the wrong place—it's like saying that each ten-second timeslice of you should choose actions based on their consequences. Instead, something closer to virtue ethics is a far better approach.
Is this still consistent with some version of consequentialism? In some sense yes, in another sense no. Mostly I expect that the viewpoint I've outlined above will, when explored carefully enough, dissolve the standard debate between different branches of ethics. This is conceptually tricky to work through, though, and I'll save further discussion for another post.
* The main reason I call this a toy model is that viewing people as reward-maximizers is itself assuming a kind-of-consequentialist viewpoint. I think we actually want a much richer conception of what it means to help and hurt people, but "increase or decrease reward" is so much easier to describe that I decided to use it here.
** Justice isn't quite the right term here, because it implies being rewarded/punished for a specific action rather than being rewarded/punished for being generally good/bad; the same with "accountability". "Fairness" might be better except that it's been co-opted by egalitarian notions of fairness. Other suggestions welcome—maybe something related to karma?
creating new goals is always purely negative for every other currently existing goal, right?
No more than hiring new employees is purely negative for existing employees at a company.
The premise I'm working with here is that you can't create goals without making them "terminal" in some sense (just as you can't hire employees without giving them some influence over company culture).
An analogy that points at one way I think the instrumental/terminal goal distinction is confused:
Imagine trying to classify genes as either instrumentally or terminally valuable from the perspective of evolution. Instrumental genes encode traits that help an organism reproduce. Terminal genes, by contrast, are the "payload" which gets passed down the generations for its own sake.
This model might seem silly, but it actually makes a bunch of useful predictions. Pick some set of genes which are so crucial for survival that they're seldom if ever modified (e.g. the genes for chlorophyll in plants, or genes for ATP production in animals). Treating those genes as "terminal" lets you "predict" that other genes will gradually evolve in whichever ways help most to pass those terminal genes on, which is what we in fact see.
But of course there's no such thing as "terminal genes". What's actually going on is that some genes evolved first, meaning that a bunch of downstream genes ended up selected for compatibility with them. In principle evolution would be fine with the terminal genes being replaced, it's just that it's computationally difficult to find a way to do so without breaking downstream dependencies.
I think this is a good analogy for how human values work. We start off with some early values, and then develop instrumental strategies for achieving them. Those instrumental strategies become crystallized and then give rise to other instrumental strategies for achieving them, and so on. Understood this way, we can describe an organism's goals/strategies purely in terms of which goals "have power over" which other goals, which goals are most easily replaced, etc, without needing to appeal to some kind of essential "terminalism" that some goals have and others don't. (Indeed, the main reason you'd need that concept is to describe someone who has modified their goals towards having a sharper instrumental/terminal distinction—i.e. it's a self-fulfilling prophecy.)
That's the descriptive view. But "from the inside" we still want to know which goals we should pursue, and how to resolve disagreements between our goals. How to figure that out without labeling some goals as terminal and others as instrumental? I don't yet have a formal answer, but my current informal answer is that there's a lot of room for positive-sum trade between goals, and so you should set up a system which maximizes the ability of those goals to cooperate with each other, especially by developing new "compromise" goals that capture the most important parts of each.
This leads to a pretty different view of the world from the Bostromian one. It often feels like the Bostrom paradigm implicitly divides the future into two phases. There's the instrumental phase, during which your decisions are dominated by trying to improve your long-term ability to achieve your goals. And there's the terminal phase, during which you "cash out" your resources into whatever you value. This isn't a *necessary* implication of the instrumental/terminal distinction, but I expect it's an emergent consequence of taking the instrumental/terminal distinction seriously across a wide range of environments. E.g. in our universe it sure seems like any scale-sensitive value system should optimize purely for the number of galaxies owned for a long time before trying to turn those galaxies into paperclips/hedonium/etc.
From the alternative perspective I've outlined above, though, the process of instrumentally growing and gaining resources is also simultaneously the process of constructing values. In other words, we start off with underspecified values as children, but then over time choose to develop them in ways which are instrumentally useful. This process leads to the emergence of new, rich, nuanced goals which satisfy our original goals while also going far beyond them, just as the development of complex multicellular organisms helps to propagate the original bacterial genes for chlorophyll or ATP—not by "maximizing" for those "terminal" genes, but by building larger creatures much more strange and wonderful.
One thing I had in an earlier draft of this shortform: the concept of "brain waves" makes me suspect that the timings of neural spikes are also best understood as discrete. But I don't know enough about how brain waves actually work (or what they even are) to say anything substantive here.
yes! this is an important point. I don't quite know how to cash it out yet but I suspect I will eventually converge towards viewing concepts as "agents" which are trying to explain as much sensory data as possible while also cooperating/competing with each other.
a neural spike either happens or it doesn't; you don't get partial spikes
An analogy that might be banal, but might be interesting:
One reason (the main reason?) that computers use discrete encodings is to make error correction easier. A continuous signal will gradually drift over time. Conversely, if the signal is frequently rounded to the nearest discrete value, then it might remain error-free for a long time. (I think this is also the reason why the two most complicated biological information-processing systems use discrete encodings: DNA base pairs and neural spikes. EDIT: Neural spikes may seem continuous in the time dimension but the concept of "brain waves" makes me suspect that the time intervals between them are better understood as discrete.)
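As a minimal sketch of that error-correction point (my own illustration; the noise level, number of copies, and allowed levels are arbitrary assumptions): repeatedly copying a continuous value lets small errors accumulate into a large drift, whereas snapping to the nearest allowed level after each copy keeps the signal stable more or less indefinitely.

```python
import random

def copy_repeatedly(value, steps=10_000, noise=0.02, levels=None):
    """Copy a signal `steps` times, adding small Gaussian noise each time.

    If `levels` is given, snap to the nearest allowed level after every
    copy (crude error correction); otherwise let the value drift freely.
    """
    for _ in range(steps):
        value += random.gauss(0, noise)
        if levels is not None:
            value = min(levels, key=lambda level: abs(level - value))
    return value

random.seed(0)
print("continuous:", copy_repeatedly(1.0))                     # typically drifts well away from 1.0
print("discrete:  ", copy_repeatedly(1.0, levels=[0.0, 1.0]))  # stays exactly 1.0
```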
Separately, agents tend to define discrete boundaries around themselves—e.g. countries try to have sharp borders rather than fuzzy borders. One reason (the main reason?) is to make themselves easier to defend: with sharp borders there's a clear Schelling point for when to attack invaders. Without that, invaders might "drift in" over time.
The logistics of defending oneself vary by type of agent. For physical agents, perhaps fuzzy boundaries are just not possible to implement (e.g. humans need to literally hold the water inside us). However, many human groups (e.g. social classes) have initiation rituals which clearly demarcate who's in and who's out, even though in principle it'd be fairly easy for them to have a gradual/continuous metric of membership (like how many "points" members have gotten). We might be able to explain this as a way of giving them social defensibility.
Nice work. I would also be excited about someone running with a similar project but for de-censoring Western models (e.g. on some of the topics discussed in this curriculum).
Worth noting that "Only goal: get cryogenically preserved into the glorious transhumanist singularity" is a pretty sociopathic way to orient to the world.
But it's a fun premise and also more manageable/approachable than trying to write about steering civilization as a whole.
Literally just as I was finishing writing up this post, I heard a commotion outside my house (in San Francisco). A homeless-looking man was yelling and throwing an electric guitar down the road. Apparently this had been going on for 5-10 minutes already. I sat in my window and watched for a few minutes; during that time, he stopped a car by standing in front of it and yelling. He also threw his guitar in the vicinity of several passers-by, including some old people and a mother cycling past with her kid.
There was a small gathering (of 5-10 people) at my house at this time. They were mostly ignoring it. I felt like this was wrong, and was slowly gathering up willpower to intervene. In hindsight I moved slowly because I was worried that a) he'd hit me with his guitar if I did, or b) he'd see which house I came out from and try to smash my windows or similar. But I wasn't very worried, because I knew I could bring a few friends out with me.
Before I ended up doing anything, though, a man stopped his car and started yelling at the homeless guy quite aggressively, things like "Get the fuck out of here!" I immediately went outside to offer support in case the homeless guy got aggressive, but he didn't need it; the homeless guy was already grabbing his stuff. He was somewhat apologetic but still kinda defensive (saying things like "it's not my fault, man, it's society"). At one point he turned to my friend and asked "were you bothered?" and my friend said "it was a bit loud".
As he left, he picked up his guitar again. The man who'd stopped turned around and yelled "Leave that guitar!" The homeless guy threw it again, the man ran over to pick it up, and then the homeless guy left. A few minutes later, two police cars pulled up—apparently someone else had called them.
Overall it was an excellent illustration of why virtue ethics is important. We should have confronted him as soon as we'd noticed him causing a ruckus, both so that (much more defenseless) passers-by didn't need to worry, and to preemptively prevent any escalation from him. But niggling worries that he might escalate meant that our fear ended up winning out, and made San Francisco a slightly less safe place. Even on the small things—like responding "it was a bit loud" instead of "you were being an asshole, quit scaring people"—it's very easy to instinctively flinch away from taking appropriate action. To avoid that, cultivating courage and honesty seems crucial.