In software engineering, "design patterns" has been used in a few different ways. One of them is something like "abstractions that are a little bit higher-level than the ones built into your programming language." Fans of some languages (Lisp, Haskell) looked at "patterns" in other languages (C++, Java) and said, "Why do you need to write a 'pattern' for that? It's just a {higher-order function, macro, monoid in the category of endofunctors, etc.}" — something built into the speaker's preferred language. (See Dominus, Norvig, C2 Wiki, and Graham's "Blub paradox".)
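(To make the "pattern vs. built-in" point concrete, here is a minimal sketch in Python; the names are invented for illustration. The Strategy-pattern ceremony at the top collapses into simply passing a function once functions are first-class values.)

```python
# The "Strategy pattern" as often written in class-heavy languages:
# an interface-and-subclass ceremony just to vary one behavior.
class SortStrategy:
    def key(self, item):
        raise NotImplementedError

class ByLength(SortStrategy):
    def key(self, item):
        return len(item)

def sort_with(items, strategy: SortStrategy):
    return sorted(items, key=strategy.key)

# In a language with first-class functions, the "pattern" dissolves
# into a built-in feature: just pass the function.
words = ["pear", "fig", "banana"]
assert sort_with(words, ByLength()) == sorted(words, key=len)
```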
Analogously, there are "ethical design patterns" that are easily expressed by core features of other systems of morality. One example of the sort of thing I mean is (moral, legal, political) equality. (The sort that expands as "the same rules apply to everyone," not "everyone is assumed to have identical needs and abilities.") Some people say equality is a value itself; others might say equality is a tool for aligning people to achieve other values like peace or prosperity; others that some sort of equality is necessary for talking about morality to be meaningful at all (because a rule is only 'a rule' if it can apply to more than one person).
Regardless, equality describes a symmetry among rules. A claim like "Everyone has an equal right to own property, not a right to own equal property" is just saying that one symmetry exists and another doesn't. (Or, cynically, "the rich and the poor are equally forbidden from sleeping under bridges and stealing bread.")
(Other terms I think might work at higher levels of abstraction include "faith" and "grace". Faith points at something like "acting as if X is true, in a way that supports it turning out to be true" — like forgiveness in the iterated Prisoner's Dilemma, or loving your neighbor because God loves everyone and you are God's hands — and grace is when that actually works.)
To the person whose moral language is higher-level than consequentialism, consequentialists have gotta look a bit like C++ programmers look to the Lisp or Haskell hacker. Greenspun's Tenth Rule generalizes: "Any sufficiently developed consequentialist morality contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Kantianism." (Or maybe "...of virtue ethics." Or something else.)
I don't have the math to express it, but I think this points at consequentialism and something-like-Kantianism not being opposed, but rather both just being true at different levels of abstraction. And a fun part of that is that it makes "categorical imperative" into a math & CS pun.
I think this analogy fails to engage with how philosophically different consequentialist morality and non-consequentialist morality are.
Here is how I would describe moral frameworks in programming terms, from an admittedly consequentialist perspective. Here a single line of machine code will correspond to some simple moral claim (eg "John prefers outcome A over outcome B").
The programming languages are different consequentialist moralities, each of which will have some terminology that compiles down to the machine code. Examples of such terms could be "the common good", "action A is better than action B in situation S", "outcome A is good", "person P is good", which each language can include or exclude. Notably, two languages might also implement the same thing in ways that compile fundamentally differently (compare how the basic list type in some languages is an array and in some languages is a linked list).
Then non-consequentialist positions are like programming languages that can't compile to machine code. The non-consequentialist claims that the machine language is impoverished. This would be similar to criticizing a programming language for not being Turing complete. And maybe that's a correct criticism, but it is a fundamentally deeper disagreement.
Maybe it's worthwhile to compare with the question of reductionist vs non-reductionist ontology.
Here the machine code is some simple empirical claim (eg "there is an electron at (x,y,z,w)").
The programming languages are languages that contain words that compile down to machine code (possibly in layers: biology -> chemistry -> particle physics).
Then non-reductionism claims that there are real objects (eg souls or numbers) which cannot be compiled like that: the machine code forms an impoverished ontology that can't explain/compile everything that exists.
I have this saying I find pithy and fun, which may or may not reflect my actual morality: "The correct ethical system is virtue ethics, but it is not virtuous to ignore consequentialism".
Fully agree. Hopefully consequentialism and Kantianism are not opposed, in most cases. They can be seen as mirrored processes, inductive vs deductive. My first thought when reading the blog post was: isn't this idea of "ethical patterns" like "be kind" a fallback to deontology as a safe mode? But if predicting the optimal result is often intractable, the arbitration between deontological principles is not easy either.
IMO, it’s hard to get a consensus for Heuristic C at the moment even though it kind of seems obvious.
Consider that humanity couldn't achieve a consensus around banning or not using cigarettes, leaded gasoline, or ozone-destroying chemicals, until they had done a huge amount of highly visible damage. There must have been plenty of arguments about their potential danger based on established science, and clear empirical evidence of the damage that they actually caused, far earlier, but such consensus still failed to form until much later, after catastrophic amounts of damage had already been caused. The consensus against drunk driving also only formed after extremely clear and undeniable evidence about its danger (based on accident statistics) became available.
I'm skeptical that more intentionally creating ethical design patterns could have helped such consensus form earlier in those cases, or in the case of AI x-safety, as it just doesn't seem to address the main root causes or bottlenecks for the lack of such consensus or governance failures, which IMO are things like:
Something that's more likely to work is "persuasion design patterns", like what helped many countries pass anti-GMO legislation despite the lack of clear scientific evidence for their harm, but I think we're all loath to use such tactics.
It seems heuristic C applies to cigarettes, leaded gas, and ozone-destroying chemicals. If we had already had heuristic C and sufficient ethical bridges around it, we would have been much more equipped to respond to those threats more quickly. Your points 1-3 do seem like valid difficulties for the promotion of heuristic C. They may be related to some of the heuristics D-I.
I agree we need effective persuasion and perhaps persuasion design patterns, but persuasion focused on promoting heuristic C to aid in promoting AI x-safety doesn't seem like wasted effort to me.
Okay, but: it's also possible to find individuals who are willing to speak for heuristic C, in a way I suspect differs from what it was like for leaded gasoline and from what I remember as a kid in the late 80's about the ozone layer.
It's a fair point that I shouldn't expect "consensus", and should've written and conceptualized that part differently, but I think heuristic C is also colliding with competing ethical heuristics in ways the ozone situation didn't.
What you are saying about persuasion is important in both directions. Have you ever encountered agnotology? It's part of sociology, and it looks at the creation of unknowing. In the cases you list above, including tobacco and lead, there has been research into the ways that industries marshalled money, resources, and persuasion to create doubt and prevent regulation.
So there's maybe an additional important heuristic, which is that profit will motivate individuals to ignore harm, and if they have power and money, they will use institutions to persuade people not to notice.
It's not the entirety of the difficulty, but it is something that ethics might help to correct.
I think a key feature of how we as humans choose heuristics is that we have a state of the world in mind that we want and we choose the heuristics we use to reach that state. It's one of the points of jimmy's sequence that I think is underread.
It's relatively easy to coherently imagine a world where most people aren't engaging in drunk driving and pick designated drivers. It's easy to imagine a world in which more and bigger buildings get built.
On the other hand, it's hard to imagine how 2040 would look while stopping the building of AGI. For me that makes “If something has a >10% chance of killing everyone according to most experts, we probably shouldn’t let companies build it.” a heuristic that feels more intellectual than embodied. I think to have it feel embodied, I would need to have a vision of what a future that the heuristic produces would look like.
As far as concrete imagination goes, I'm also not sure what "we" in that sentence means. Note that you don't have any unclear "we" in either the YIMBY or the Mothers Against Drunk Driving examples you describe.
"Successful heuristics are embodied" seems like a good ethical heuristic heuristic. I support the call to action to make "we shouldn't let companies cause minor risks of major harm" more embodied by giving examples of how a future where we have and use that heuristic. (Related, I think "we shouldn't let companies cause minor risks of major harms" is better phrasing for heuristic C.)
A good heuristic is one that tells you what to do. "Friends don’t let friends drive drunk" is a heuristic that tells you what you should do. If you are in a situation where a friend might engage in drunk driving, you do something to stop them.
"We should ..." is not a heuristic that tells you what to do. It's not embodied in that sense. It's largely a statement about what you think other people should do.
If I ask you whether you applied points Anna listed in the YIMBY or the Mothers Against Drunk Driving sections in the last week you can tell me "yes" or "no". Applying those is something you have the personal agency to do.
Am I understanding you correctly in that you are pointing out that people have spheres of influence, with areas they seemingly have full control over and other places where they seemingly have no control? That makes sense and seems important. Where you can aim your ethical heuristic at things people have full control over, it will obviously be better, but unfortunately it is important for people to try to influence things that they don't seem to have any control over.
I suppose you could prescribe self-referential heuristics, for example "have you spent 5 uninterrupted minutes thinking about how you can influence AI policy in the last week?" It isn't clear whether any given person can influence these companies, but it is clear that any given person can consider it for 5 minutes. That's not a bad idea, but there may be better ways to take the "We should..." statement out of intractability and make it embodied. Can you think of any?
My longer comment on ethical design patterns explores a bit about how I'm thinking about influence through my "OIS" lens in a way tangentially related to this.
If you look at the YIMBY example that Anna laid out, city policies are not under the direct control of citizens, yet Anna found some points that relate to what people can actually do.
If it seems like you don't have any control over something you want to change, it makes sense to think of a theory of change according to which you do have control.
Right now, one issue seems to be that most people don't really have it as part of their world view that there's a good chance of human extinction via AI. You could build a heuristic around being open, with everyone you meet, about the fact that there's a good chance of human extinction via AI.
There are probably also many other heuristics you could think of about what people should do.
It's a good point, re: some of the gap being that it's hard to concretely visualize the world in which AGI isn't built. And also about the "we" being part of the lack of concreteness.
I suspect there're lots of kinds of ethical heuristics that're supposed to interweave, and that some are supposed to be more like "checksums" (indicators everyone can use in an embodied way to see whether there's a problem, even though they don't say how to address it if there is a problem), and others are supposed to be more concrete.
For some more traditional examples:
It would be too hard to try to equip humans and human groups for changing circumstances via only a "here's what you do in situation X". It's somewhat easier to do it (and traditional ethical heuristics did do it) by a combination of "you can probably do well by [various what-to-do heuristics]" and "you can tell if you're doing well by [various other checksum-type heuristics]". Ethics is there to help us design our way to better plans, not only to always give us those plans.
A key aspect of modern democracy with the rule of law is that companies can operate even if people believe they are acting with bad character. It's not hard to convince a majority that Elon Musk and Sam Altman are people with bad character, but that's not sufficient to stop them from building AGI.
As far as "should have freedom of speech and press" goes, both Republican and Democratic administrations over the last two decades did a lot to reduce those freedoms but the pushback comes mostly on partisan lines. They amount of people who take a principled stand on freedom of speech no matter whether it's speech by friends or foes is small.
As far as "should have a monarch who inherited legitimately" goes, I think it worked for a long time as a Schelling point around with people could coordinate and not because most people found the concept of being ruled by a king that great. It was a Schelling point that allowed peaceful transition of power after a king died where otherwise there would have been more conflict about succession.
Eg JK Rowling's character Sirius's claim that you can see the measure of a person by how they treat their house-elves
While we are at general principles, citing JK Rowling in a discussion on ethics is probably generally a bad idea, for "politics is the mind-killer" reasons. I think the article is very interesting in terms of cultural norms.
It gets frequently cited to make the point that discussing politics is inherently bad, which isn't something the article argues. On the other hand, the actual argument, that using political examples will make your audience focus on politics and think less clearly when you could use non-political examples that don't have this problem, is seldom appreciated, because people like using their political examples.
A contemporary example of inadequate ethical heuristics: Public discussion of group differences
I think the word "public" is doing a lot of heavy lifting here. I can think sanely about group differences on my own, and it would be easy to have that conversation with you. I'd also expect to be able to handle it with individuals who are strongly attached to one heuristic or the other, and perhaps even in small groups. I wouldn't tweet about it. Nor would I strike up that conversation in a crowded room of strangers.
The problem isn't that I can't think sanely about such topics, but that they can't -- and, descriptively speaking, "they" and "I" are better descriptors than "we" in those cases. In smaller scales, thinking as "we" is easier. If I say something that my close friend would have disagreed with, they know me well enough to appropriately weight the fact that I'm saying it, and that immediately changes the joint perception as "we" in ways that tweeting does not. And when the friend doesn't buy it, the amount of "Hm, maybe I'm wrong" is very manageable, so engaging with the kind of humility that makes it "shared exploration towards truth" rather than "an attempt to manipulate" is easy. Even "bold" statements like "You're doing it all wrong [and I know this because there's nothing you've thought of that I haven't considered which could justify a change of mind]" are fairly achievable in this context, because you do know your friend fairly well, and they know you know them fairly well, etc.
Try to step up in scale though, and it gets tougher. Either you have to back off a bit because maybe the larger group knows more things you don't, or you have to know more things including how more distant strangers (incorrectly) model things. As the group you're trying to move becomes larger and more powerful, the push back you invite becomes louder and more meaningful. Have you found the courage to pick a fight that big, and still humble yourself appropriately, if necessary? Because if not then you're setting yourself up to fold prematurely, before standing strong enough to evoke the kind of evidence that would genuinely change your mind.
An analogy that comes to mind is "injection locking", as demonstrated by groups of metronomes synching up. Couple yourself too tightly to a large coordinated mass, and you're likely to find your heart compelled to the same wavelength, even as you recognize that it's wrong (whoops, there goes your sanity). Decouple too much, and even if you're not missing anything from the larger group, you're not helping the group either. The trick is to regulate your coupling such that you can both influence and be influenced in ways that are tracking truth, without losing track of genuinely valuable wavelengths you've entrained to.
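(The metronome picture can be made concrete with a toy Kuramoto-style simulation; this is just an illustrative sketch with invented parameters, not anything from the comment above. Below a critical coupling strength the oscillators keep drifting apart; above it they entrain to a common wavelength.)

```python
import math, random

# Toy Kuramoto-style model of "injection locking": n oscillators with
# slightly different natural frequencies, each nudged toward the mean phase.
def simulate(coupling, n=50, steps=2000, dt=0.05, seed=0):
    rng = random.Random(seed)
    freqs = [1.0 + 0.1 * rng.gauss(0, 1) for _ in range(n)]
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    for _ in range(steps):
        mean_x = sum(math.cos(p) for p in phases) / n
        mean_y = sum(math.sin(p) for p in phases) / n
        r, psi = math.hypot(mean_x, mean_y), math.atan2(mean_y, mean_x)
        phases = [p + dt * (w + coupling * r * math.sin(psi - p))
                  for p, w in zip(phases, freqs)]
    # Final "order parameter": near 0 = scattered phases, near 1 = fully locked.
    return math.hypot(sum(math.cos(p) for p in phases) / n,
                      sum(math.sin(p) for p in phases) / n)

print(simulate(coupling=0.0))  # weak coupling: phases stay scattered
print(simulate(coupling=2.0))  # strong coupling: phases lock, value near 1
```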
And if you try to "cheat" and preserve your frequency by pushing on the group without opening yourself to pushback, that's a good definition of manipulation, and when the group notices, it will backfire.
I think it's important that "we" think carefully about what group we can really "we" with, without losing lock on reality, and updating in ways that shut out information rather than incorporating more information. And now that I think of it, the problem of how to scale up seems to be missing an ethical design pattern itself. There's not a lot of good guidance at how quickly to try to integrate with larger groups of metronomes.
Jordan Peterson took a crack at it with "Clean your room"/"Set your house in perfect order before you criticize the world", but that's more of a counter heuristic than a bridge. And overly moralistic and unachievable. In totally unrelated news, "Jordan Peterson" has become a scissor statement.
In short form, I'd probably phrase it like "Be careful to match your ambition with humility and courage, and scale only as fast as you dare"
Interesting. I like the "metronomes syncing" metaphor. It evokes the same feeling for me as a cloud of chaotically spinning dust collapsing into a solar system with roughly one axis of spin. It also reminds me of my "Map Articulating All Talking" (MAAT) concept. I'm planning to write up a post about it, but until then this comment thread is where I've written the most about it. The basic idea is that currently it is impossible to communicate with groups of humans sensibly and a new social media platform would solve this issue. (lol, ambitious I know.)
The size of the "we" is critically important. Communism can occasionally work in a small enough group where everyone knows everyone, but scaling it up to a country requires different group coordination methods to succeed.
I'm somehow wanting to clarify the difference between a "bridging heuristic" and solving a bucket error. If a person is to be able to hope for "an AI pause" and "not totalitarianism" at the same time (like cousin_it), they aren't making a bucket error.
But, they/we might still not know how to try to harness social energy toward "an AI pause" without harnessing social energy toward "let any government that says it's pro-pause, move toward totalitarianism with AI safety as fig leaf".
The bridging heuristic I'd want would somehow involve built-in delimiters, so that if a social coalition gathered momentum behind the heuristic, the coalition wouldn't be exploitable -- its members would know what lines, if crossed, meant that the people who had co-opted the name of the coalition were no longer fighting for the coalition's real values.
Like, if a good free speech organization backs Alice's legal right to say [really dumb/offensive thing], the organization manages to keep track that its deal is "defend anybody's legal right to say anything", rather than "build a coalition for [really dumb/offensive thing]"; it doesn't get confused and switch to supporting [really dumb/offensive thing]. Adequate ethical heuristics around [good thing X, eg AI safety] would let us build social momentum toward [X] without it getting co-opted by [bad things that try to say they're X].
I think that the primary heuristic that prevents drastic anti-AI measures is the following: "A purely theoretical argument about a fundamentally novel threat couldn't seriously guide policy."
There are, of course, very good reasons for it. For one, philosophy's track record is extremely unimpressive, with profound, foundational disagreements between groups of purported subject matter experts continuing literally for millennia, and philosophy being the paradigmatic domain of purely theoretical arguments. For another, plenty of groups throughout history predicted an imminent catastrophic end of the world, yet the world stubbornly persists even so.
Certainly, it's not impossible that "this time it's different", but I'm highly skeptical that humanity will just up and significantly alter the way it does things. For the nuclear non-proliferation playbook to become applicable, I expect that truly spectacular warning shots will be necessary.
Feels true to me, but what's the distinction between theoretical and non-theoretical arguments?
Consider the mythological case of the calculation of whether the atomic bomb would ignite the atmosphere. I guess the concern guided the "policy" to perform the calculation. And if it had come out as a 50% chance of omnicide, atomic bombs would have been prevented, despite the lack of a spectacular warning shot.
Policy has also ever been guided by arguments with little related maths, for example, the MAKING FEDERAL ARCHITECTURE BEAUTIFUL AGAIN executive order.
Maybe the problem with AI existential risk arguments is that they're not very convincing.
Feels true to me, but what’s the distinction between theoretical and non-theoretical arguments?
Having decent grounding for the theory at hand would be a start. To take the ignition of the atmosphere example, they did have a solid enough grasp of the underlying physics, with validated equations to plug numbers into. Another example would be global warming, where even though nobody has great equations, the big picture is pretty clear, and there were periods when the Earth was much hotter in the past (but still supported rich ecosystems, which is why most people don't take the "existential risk" part seriously).
Whereas, even the notion of "intelligence" remains very vague, straight out of philosophy's domain, let alone concepts like "ASI", so pretty much all argumentation relies on analogies and intuitions, also prime philosophy stuff.
Policy has also ever been guided by arguments with little related maths, for example, the MAKING FEDERAL ARCHITECTURE BEAUTIFUL AGAIN executive order.
I mean, sure, all sorts of random nonsense can sway national policy from time to time, but strictly-ish enforced global bans are in an entirely different league.
Maybe the problem with AI existential risk arguments is that they’re not very convincing.
Indeed, and I'm proposing an explanation why.
Curated! Thanks for the post.
I am having a hard time explaining why I'm curating this. I think for me this post is helping me move ethics from some mystical thing, into an engineering/design problem, which is how I think a lot of it should be thought of. I have recently been reading a book on the classic virtues, and it makes them sound so dreary and uncompelling; this account seems more true and healthier.
I think the examples were great and really illustrated this well. But I agree with a comment (from Wei Dai) that this post is weak on proving that the presence or lack of good ethical design patterns was critical in the success or failure of different systems, rather than other factors.
I hope to see more writing about concrete cases, and more work to help turn our consequentialist analyses into ethical design patterns / folk ethics.
move ethics from some mystical thing, into an engineering/design problem
I like this vibe and want to promote my "Outcome Influencing System (OIS)" concept as a set of terminology / lens that may be valuable. Basically, anything that is trying to influence reality is an OIS, and so in that way it is the same as an optimizer, but I'm hoping to build up concepts around the term that make it a more useful way to explore and discuss these ideas than with existing terminology.
The relevance is that there are many "large sociotechnical OIS" that we have implicitly and explicitly created, and treating them as technology that should have better engineering quality assurance seems like a valuable goal.
I think today there's still a window of opportunity to stop AI without creating a world government. To build an AI today requires a huge "supercritical pile of GPUs" so to speak, which is costly and noticeable like uranium. But software advances can change that. So it'd be best to take the hardware off the table soon, with the same kind of international effort as stopping nuclear proliferation. But realistically, humanity won't pass such a measure without getting a serious scare first. And there's a high chance the first serious scare just kills us.
The only problem is that this would further accelerate the pressure to produce software advances. Certain software improvements are not being done maximally fast at the moment, because the industry leaders are overrelying on the “bitter lesson” and on their huge fleets of GPUs in a somewhat brute-force fashion.
(Michael Pollan in his “The Botany of Desire” explains how drug prohibition has resulted in much faster advances towards very strong modern cannabis by creating pressure to produce a stronger punch per unit of weight and volume.
People look at nuclear non-proliferation as a semi-successful example of prohibition, but the situation might be closer to our drug war. It’s easy to target the AI leaders with their large fleets of GPUs and large teams, just like it’s feasible to regulate big pharma. It might be way more difficult to figure out who all the small groups all over the world are that are pursuing non-saturating recursive self-improvement of scaffoldings and such on top of already released open-weight LLMs. A failed prohibition is likely to lower the odds of a reasonable outcome by making the identity of the winner and the nature of the winning approach very unpredictable.)
I would like to draw a strong distinction between a "world government" and an organization capable of effecting international AGI race de-escalation. I don't think you were exactly implying that the former is necessary for the latter, but since the former seems implausible and the latter necessary for humanity to survive, it seems good to clearly distinguish.
III) If lots of (smart/senior) people seem to dismiss an idea, assume there's something wrong with it [even if most of the smart/senior people are doing local work that makes it locally disincentivized for them to seem not to dismiss that idea, eg because it would annoy AI companies].
I do something like: Model the incentives placed on those smart/senior people, including by looking at them differentially (how is the average smart person at a lab thinking differently from an average smart person who is independent/at MIRI/etc), then also adjust for the memetic tug from a bunch of the smart people running thought cycles partly tethered to a conclusion for non-truth-tracking reasons?
Seems somewhat risky, as it's not hard to end up biased about the effect size and dismiss things you shouldn't, but combined with "and check the object level reasoning of some of the people who think it's a bad idea" this is the best patch I have so far.
I think the premise of transposing "software design patterns" to ethics, and thinking of them as building blocks for social construction, is inherently super interesting.
It's a shame the article really doesn't deliver on that premise. To me, this article doesn't read as someone trying to analyze how simpler heuristics compose into more complex social orders, it reads as a list of just-so stories about why the author's preferred policies / social rules are right.
It did not leave me feeling like I knew more about ethics than before I read it.
In my ontology "virtues" are ethical design patterns about how to make decisions.
I'm a virtue ethicist because I think that this kind of ethical design pattern is more important than ethical design patterns about what decisions to make (albeit with some complications that I'll explore in some upcoming posts).
(Having said that, I feel some sense that I'm not going to use "ethical design patterns" very much going forward—it's a little unwieldy as a phrase. I think I will just use "ethics", by contrast with things like "altruism" which IMO are less well-understood as design patterns.)
One big reason why people don't endorse Heuristic C (though not all of the reason) is that the general population are much more selfish/have much higher time preference than LW/EA people, and in general one big assumption that I think EAs/LWers rely on way too much is that the population inherently cares about the future of humanity, independent of their selfish preferences.
More generally, I think Robin Hanson's right to say that a lot of our altruism is mostly fictional, and is instead a way to signal and exploit social systems, or to cooperate with other people when it isn't fictional; and the behavior we see is most likely in a world where people's altruism is mostly fictional, combined with people not knowing all that much about AI.
This is complementary with other explanations like xpym's.
More generally, a potential crux with a lot of the post is that I think that something like "rationalizing why your preferred policies are correct", to quote PoignardAzur, is ultimately what has to happen to ethical reasoning in general, and there's no avoiding that part, and it thus inevitably involves dealing with conflict theory (the comment is about how the proposed examples are bad since they invoke political debates/conflict-theory issues, but contra that comment I think this isn't avoidable in this domain).
There are interesting questions to ask around how we got to the morals we have (I'd say that something like cooperation between people who need to share things in order to thrive/survive explains why we developed any altruism/moral system that wasn't purely selfish), but in general the moral objectivism assumptions embedded in the discourse are pretty bad if we want to talk about how we got to the morality/values that we have, and it's worth trying to frame the discussion in moral relativist terms.
Many practice and endorse ethical heuristics against the censure of speech on any topic, especially any salient and politically relevant topic, lest such censure mess with our love of truth, or our ability to locate good policy options via the free and full exchange of ideas, or our freedom/autonomy/self-respect broadly.
I don't think this is actually true.
Even among rationalists I believe there are red lines for ideas that cannot be raised without censure and disgust. I won't attempt to draw them. The fact that among rationalists these lines lie other than where many people would draw them, including on the topic of racial difference, is not accepted as evidence of a commitment to open-mindedness that overrides other ethical commitments but just as a lack of commitment to those specific principles, with the commitment to open-mindedness as thin cover. Tetlock's ideas around sacred values, which can't be easily traded off, may be useful here. It's not that those willing to discuss racial differences don't have sacred values, it's just that non-racism isn't one of them.
Regarding the clash between the prudence heuristic of "don't do something that has a 10% chance of killing all people" and other heuristics such as "don't impede progress," we have to consider the credibility problem in the assertion of risk by experts, when many of the same experts continue to work on A(G)I (and are making fortunes doing so). The statements about the risk say one thing but the actions say another, so we can't conclude that anyone is actually trading off, in a real sense, against the prudence heuristic. This relates to my previous comment: "don't kill all humans" seems like a sacred value, and so statements suggesting one is making the trade-off are not credible. From this "revealed belief" perspective, a statement "I believe there is a 10% chance that AI will kill all people" by an AI expert still working toward AGI is a false statement, and the only way to increase belief in the prediction is for AI experts to stop working on AI (at which point stopping the suicidal hold-outs becomes much easier). Conversely, amplifying the predictions of risk by leaders in the AI industry is a great way to confound the advocacy of the conscientious objectors.
“Meta-ethics” would be a good search term for other thoughts and philosophical works people have had about what you call “ethical design patterns”
Epistemic status: I’m fairly sure “ethics” does useful work in building human structures that work. My current explanations of how are wordy; I think there should be a briefer way to conceptualize it; I hope you guys help me with that.
My concerns here are not epistemic; they are about length, summarization, and chunking. I’ll offer two suggestions framed as questions:
Please tell the reader what you’re going to tell us right away. What is your central point? Are you primarily proposing a new ethical framework? Are you primarily aiming to improve particular areas, e.g. AI safety?
Wouldn’t this be better presented as a sequence of posts? To think in reverse, could you make a good argument for why a single post of ~6000+ words is better?
I enjoy LessWrong because of its epistemic norms but often dislike* longer than necessary articles. I found my way here because of the Sequences; let’s do more of them! I would rather see ~600 to 1200 word self-contained pieces that link to other parts, which might be: prerequisite material, literature reviews, lengthy examples, personal motivations, and so on.
* I want authors to succeed at focusing reader attention and community discussion. Excessive length can hurt more than help.
I like this. Having strong norms for how posts should be broken up (prereqs, lit review, examples, motivations, etc.) seems like it would be good for engendering clarity of thought and for respecting people's time and focus. However, it would need to be built on the correct norms and I don't know what those norms should be. Figuring it out and popularizing it seems like a worthwhile goal though. Good luck if you are picking it up!
I've actually been doing research on the Semmelweis reflex, ethics, and the problems of unbearable knowledge and defensive unknowing in the context of making decisions about existential threats, with an eye on how to solve the problem of giving people more tools and cognitive resources. AI alignment is one of the existential threats of interest. (Essentially, I have been quietly working the same problem you are publicly working towards solving.)
I think you have some great points, but I have a lot to add.
Unfortunately, I lack a lot of the jargon of LW, and operate from different heuristics, so I accidentally alienate people easily. Within the social mores of Less Wrong, what's appropriate? To message you privately, to write a very long comment, to create a separate post? What would you prefer? I would really love to talk about this without accidentally blowing anything up.
I wouldn't say I'm strongly a part of the LW community, but I have read and enjoyed the sequences. I am also undiagnosed autistic and have many times gotten into arguments for reasons that seemed to me like other people not liking the way I communicate, so I can relate to that. If you want to talk privately where there is less chance of accidentally offending larger numbers of people, feel free to reach out to me in a private message. You can think of it as a dry run for posting or reaching out to others if you want.
Since 2014, it is both the case that taboos against non-Woke speech have become less universal, and the case that ethnonationalism has (I think?) become considerably more prominent, which I take as at least limited evidence that such heuristic A was paying rent.
You can also take it as evidence for the opposite. Trump's first election was at the height of the taboo against non-Woke speech. A world where the taboos caused the current rise of ethnonationalism does look like the world we are seeing.
Props to Christopher Nolan for trying to use the vehicle of an entire movie to bridge the closest available intuition of 'if you think you have a one in a million chance of igniting the earth's atmosphere, maybe don't do that.'
I'm a big heuristics bridging fan as I think that it is to some extent a way to describe a very compressed action-policy based on an existing reward function that has been tested in the past.
So we can think about what you're saying here as a way to learn values, to some extent or another. By bridging local heuristics we can find better meta-heuristics and also look at what times these heuristics would be optimal. This is why I really like the Meaning Alignment Institute's work on this, because they have a way of doing this at scale: https://arxiv.org/pdf/2404.10636
I also think that a part of the "third wave" of AI Safety which is more focused on sociotechnical stuff kind of gets around the totalitarian and control heuristics as it's saying it can be solved in a pro-social way? I really enjoyed this post, thanks for writing it!
- Heuristic C: “If something has a >10% chance of killing everyone according to most experts, we probably shouldn’t let companies build it.”
IMO, it’s hard to get a consensus for Heuristic C at the moment even though it kind of seems obvious. It’s even hard for me to get my own brain to care wholeheartedly about this heuristic, to feel its full force, without a bunch of “wait, but …”.
Heuristic F: “Give serious positive consideration to any technology that many believe might save billions of lives.”
That’s a big consideration for short/medium termists. Could another heuristic (for the longtermists) be Maxipok (maximize the probability of an OK outcome)? By Bostrom’s definition of X risk, a permanent pause is an X catastrophe. So if one thought the probability of the pause becoming permanent was greater than p(X catastrophe|AGI), then a pause would not make sense. Even if one thought there were no chance of a pause becoming permanent, if one thought the background X risk per year was greater than the reduction in p(X risk|AGI) for every year of pause, it would also not make sense to pause from a longtermist perspective. Putting these together, it’s not clear that p(X risk|AGI) ~10% should result in companies not being allowed to build it (though stronger regulation could very well make sense).
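(A toy version of that comparison, with numbers invented purely for illustration rather than taken from anyone's actual estimates:)

```python
# Crude Maxipok-style comparison of pausing vs. not pausing, treating a
# permanent pause as an existential catastrophe per Bostrom's definition.
# All numbers below are made up for illustration.
p_doom_given_agi = 0.10                # assumed p(existential catastrophe | AGI)
p_pause_becomes_permanent = 0.02       # assumed risk the pause never ends
background_xrisk_per_year = 0.001      # assumed non-AGI x-risk per year of pause
risk_reduction_per_pause_year = 0.005  # assumed drop in p(doom | AGI) per pause year
pause_years = 10

p_ok_no_pause = 1 - p_doom_given_agi
p_doom_after_pause = max(0.0, p_doom_given_agi
                         - risk_reduction_per_pause_year * pause_years)
p_ok_with_pause = ((1 - p_pause_becomes_permanent)
                   * (1 - background_xrisk_per_year) ** pause_years
                   * (1 - p_doom_after_pause))

print(f"P(OK) without pause: {p_ok_no_pause:.3f}")                    # 0.900
print(f"P(OK) with {pause_years}-year pause: {p_ok_with_pause:.3f}")  # ~0.922
# With these made-up numbers the pause wins; shrink the assumed risk reduction
# or raise p_pause_becomes_permanent and the comparison flips.
```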
I think your choice of "contemporary example of inadequate ethical heuristics" weakens the post as a whole, by invoking the specter of the sort of third-rail issue where the discourse is relatively well-modeled by conflict theory. That is, I was loving the post until I got to that part and my brain was suddenly full of maybe-I'm-about-to-be-eaten-by-a-tiger alarms.
I was going to note that this seems to be the social-interaction special-case of policy utilitarianism (which I've used for years and would attribute to giving me some quality-of-life improvements).
However, from a quick google search it seems "policy utilitarianism" doesn't exist, and I have no idea what this concept is actually called, assuming I didn't make it up.
In short, it's a mix of functional decision theory, (rule) utilitarianism, and psychology (and possibly some Buddhism), along with some handwaving for the Hard Problem of learning under bounded rationality (which I'd assert the brain is good enough at to not need an explicit algorithm for it in a human ethical framework).
To go into the details, we know from e.g. psychology that we don't have full control over our "actions" on some conventional idea of the "object level". This applies both to individuals (e.g. addictions, cognitive biases, simple ignorance constraining outcome-predictions, etc) and societies (as Anna discusses above through some of the objections to prioritizing AI safety regulations).
So, instead of being consequentialist over external actions, take the subcomponent(s) of your mind that can be said to "make decisions", and consider your action space to be the set of possible transitions between policies over the output of that system, starting from whatever policy it's currently implementing (likely generated through a mix of genetic inheritance, early-life experiences, and to a lesser extent more-recent experiences).
Everything outside that one decision membrane (including the inputs to that decision-making mind-component from other parts of your brain and body) is an objective environmental factor which should be optimized based on your current decision policy (or any recursively-improved variants it generates by making decisions over the aforementioned decision-policy space).
For computational efficiency purposes, we can model our actions over the partial-policy space rather than the total-policy space, as is done in e.g. symbolic planning, and identify policies which tend to have good outcomes in either specific or general circumstances. This naturally generates something very much like deontological morality as a computational shortcut, while maintaining the ability to override these heuristics in highly constrained and predictable circumstances.
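(My attempt at a minimal sketch of this "policy utilitarianism" idea in Python; every name and number is invented, and the "world" is a stand-in random function. The only point is that the unit of choice is a transition between decision policies, with a partial-policy override playing the role of a deontological heuristic.)

```python
import random

# Sketch: score candidate decision policies (including partial overrides of
# the current one) by simulated expected utility, and adopt the best transition.
SITUATIONS = ["asked_hard_question", "offered_shady_deal", "friend_drunk"]

def utility(situation, action, rng):
    # Stand-in for the messy, stochastic world; values are arbitrary.
    base = {"tell_truth": 1.0, "lie": -0.5, "accept": -2.0,
            "decline": 0.5, "take_keys": 2.0, "ignore": -3.0}[action]
    return base + rng.gauss(0, 0.3)

def expected_utility(policy, samples=500, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        situation = rng.choice(SITUATIONS)
        total += utility(situation, policy[situation], rng)
    return total / samples

current_policy = {"asked_hard_question": "lie",
                  "offered_shady_deal": "accept",
                  "friend_drunk": "ignore"}

# A "partial policy" tweak, analogous to a deontological heuristic:
# override one situation, leave everything else alone.
candidate = dict(current_policy, friend_drunk="take_keys")

if expected_utility(candidate) > expected_utility(current_policy):
    current_policy = candidate  # adopt the policy transition
print(current_policy)
```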
Extending the above point on deontological morality, since there is no privileged boundary separating the body and the external world, collaboration under 'policy utilitarianism' becomes an engineering problem of which heuristics either constrain the behavior of others towards your utility function, or make you more predictable in a way that incentivizes others to act in a way aligned to your utility function. (For the moralists in the audience, note that your utility function can include the preferences of others via altruism)
In practice, humans generally don't have the cognitive advantage over each other to reliably constrain others' behavior without some degree of cooperation with other humans / artificial tools (or single combat, if you're into that). As such, human-to-human communication and collaboration relies on all parties applying decision-heuristics which are compatible with each other on the relevant time-scales, and provide sufficient mutual predictability to convince all parties of the benefits of Cooperate vs Defect, without excessive computational burden on the broadcaster or the receiver.
Given the above two constraints, preference utilitarianism produces (at least in my eyes) the recommendation to design 'ethical heuristics' which are both intelligible to others and beneficial to your goals, and apply them near-unilaterally for social decision-making.
One useful 'ethical heuristic', given the above reasoning for why we want these heuristics at all, is sharing your heuristics with others who share (aspects of) your core values; this improves your ability to collaborate (due to mutual predictability), and if you trust your critical thinking then communities using this heuristic also mitigate any individual computational constraints on heuristic design by de-duplicating research work (P is ~cheaper than NP).
In service of these goals, the heuristics you share should not require significant investment from others to adopt (aka they should inherently contain 'bridging' components), and should be useful for pursuing the values you share with your interlocutor (so that they are willing to adopt said heuristics). Again, I don't know if I'm handwaving too much of the intermediate reasoning, or misinterpreting Anna as calling out the general principle of engineered ethics when she really intends to specifically call out the heuristic in the previous paragraph; but as far as I can tell this produces the points in this article as a special-case.
Curious if anyone has encountered this idea before, and also if I'm misinterpreting Anna's point in relation to it? (general critiques are welcome as well, since as mentioned I use the above principle myself)
[epistemic note: I'm trying to promote my concept "Outcome Influencing Systems (OISs)". I may be having a happy death spiral around the idea and need to pull out of it. I'm seeking evidence one way or the other. ]
[reading note: I pronounce "OIS" as "oh-ee" and "OISs" as "oh-ees".]
I really like the idea of categorizing and cataloguing ethical design patterns (EDPs) and seeking reasonable EDP bridges. I think the concept of "OISs" may be helpful in the endeavour in some ways.
A brief primer on OISs:
It seems that when you use the word "mesaoptimizers" you are reaching for the word "OIS" or some variant. Afaik "mesaoptimizer" refers to an optimization process created by an optimization process. It is a useful word, especially for examining reinforcement learning, but it puts focus on the process of creation of the optimizer being an optimizer, which isn't really the relevant focus. I would suggest that instead "influencing outcomes" is the relevant focus.
Also, we avoid the optimizer/optimized/policy issue. As stated in "Risks from Learned Optimization: Introduction":
a bottle cap causes water to be held inside the bottle, but it is not optimizing for that outcome since it is not running any sort of optimization algorithm.
If what you care about is the outcome, whether or not water will stay in the bottle, then it isn't "optimizers" you are interested in, but OIS. I think understanding optimization is important for examining possible recursive self improvement and FOOM scenarios, so the bottle cap is indeed not an optimizer, and that is important. But the bottle cap is an OIS because it is influencing the outcome of the water by making it much more likely that all of the water stays in the bottle. (Although, notably, it is an OIS with very very narrow versatility and very weak capability.)
I'm not too interested in whether large social groups working towards projects such as enforcing peace or building AGI are optimizers or not. I suspect they are, but I feel much more comfortable labelling them as "OISs" and then asking, "what are the properties of this OIS?", "Is it encoding the preferences I think it is? The preferences I should want it to?".
Ok, that's my "OIS" explanation, now onto where the "OIS" concept may help the "EDP" concept...
EDPs as OISs:
First, EDPs are OISs that exist in the memetic substrate and influence individual humans and human organizations towards successful ethical behaviour. Some relevant questions from this perspective: What are EDPs' capabilities? How do they influence? How do we know what their preferences are? How do we effectively create, deploy, and decommission them based on analysis of their alignment and capability?
EDPs for LST-OISs:
It seems to me that the place we are most interested in EDPs is for influencing the behaviour of society at large, including large organizations and individuals whose actions may affect other people. So, as I mentioned about "mesaoptimizers", it seems useful to have clear terminology for discussing what kinds of OISs we are targeting with our EDPs. The most interesting kind to me is "Large SocioTechnical OISs", by which I mean governments of different kinds, large markets and their dynamics, corporations, social movements, and any other thing you can point out as being made up of large numbers of people working with technology to have some kind of influence on the outcomes of our reality. I'm sure it is useful to break LST-OISs down into subcategories, but I feel it is good to have a short and fairly politically neutral way to refer to those kinds of objects in full generality, especially if it is embedded in the lens of "OISs" with the implication that we should care about the OIS's capabilities and preferences.
People don't control OISs:
Another consideration is that people don't control OISs. Instead, OISs are like autonomous robots that we create and then send out into the world. But unlike robots, OISs can be, and frequently are, created through people's interactions without the explicit goal of creating an OIS.
This means that we live in a world with many intentionally created OISs, but also many implicit and hybrid OISs. It is not clear if there is a relationship between how an OIS was created and how capable or aligned it is. It seems that markets were mostly created implicitly, but are very capable and rather well aligned, with some important exceptions. Contrast Stalin's planned economy, which was an intentionally created OIS which I think was genuinely created to be more capable and aligned while serving the same purpose, but turned out to be less capable in many ways and tragically misaligned.
More on the note of not controlling OISs: it is more accurate to say we have some level of influence over them. It may be that our social roles are very constrained in some Molochian ways, to the point that we really don't have any influence over some OISs despite contributing to them. To recontextualize some Stoicism: the only OIS you control is yourself. But even that is complexified by the existence of multiple OISs within yourself.
The point of saying this is that no human has the capability to stop companies from developing and deploying dangerous technologies, rather, we are trying to understand and wield OIS which we hope may have that capability. This is important both in making our strategy clear, and in understanding how people relate to what is going on in the world.
Unfortunately, most people I talk to seem to believe that humans are in control. Sure, LST-OISs wouldn't exist without the humans in the substrate that implements them, and LST-OISs are in control, but this is extremely different from humans themselves being in control.
In trying to develop EDPs for controlling dangerous OISs, it may help to promote OIS terminology to make it easier for people to understand the true (less wrong) dynamics of what is being discussed. Or, at least, it may be valuable to note explicitly that the people we are trying to make EDPs for are thinking in terms of tribes of people where people are in control, instead of complex sociotechnical systems, and that this will affect how they relate to EDPs that are critical of specific OISs that they view as labels pointing at their tribe.
...
Ha, sorry for writing so much. If you read all of this, please lmk what you think : )
This is totally misguided. If heuristics worked 100% of the time they wouldn't be rules of thumb, they'd be rules of nature. We only have to be wrong once for AI to kill us.
Sorry, I’d like to understand you but I don’t yet; what claim do you think I’m making that seems totally misguided, please?
You are arguing that it is tractable to have predictable positive long term effects using something that is known to be imperfect (heuristic ethics). For that to make sense you would have to justify why small imperfections cannot possibly grow into large problems. It's like saying that because you believe that you only have a small flaw in your computer security nobody could ever break in and steal all of your data. This wouldn't be true even if you knew what the flaw was and, with heuristic ethics, you don't even know that.
That doesn't follow. It's more like saying "password systems help protect accounts" even though you know those systems are imperfect. Sure, people keep reusing the same passwords and using passwords that are guessable, but that doesn't mean not using passwords at all and taking people at their word for who they are is superior (in most systems that need accounts)
The minimal standard is "using this system / heuristic is better than not using it", not "this system / heuristic is flawless and solves all problems ever".
In a general discussion of ethics your replies are very sensible. When discussing AI safety, and, in particular P(doom), they are not. Your analogy does not work. It is effectively saying trying to prevent AI from killing us all by blocking its access to the internet with a password is better than not using a password, but an AI that is a threat to us will not be stopped by a password and neither will it be stopped by an imperfect heuristic. If we don't have 100% certainty, we should not build it.
Related to: Commonsense Good, Creative Good (and my comment); Ethical Injunctions.
Epistemic status: I’m fairly sure “ethics” does useful work in building human structures that work. My current explanations of how are wordy; I think there should be a briefer way to conceptualize it; I hope you guys help me with that.
It is intractable to write large, good software applications via spaghetti code – but it’s comparatively tractable using design patterns (plus coding style, attention to good/bad codesmell, etc.).
I’ll argue it is similarly intractable to have predictably positive effects on large-scale human stuff if you try it via straight consequentialism – but it is comparatively tractable if you use ethical heuristics, which I’ll call “ethical design patterns,” to create situations that are easier to reason about. Many of these heuristics are honed by long tradition (eg “tell the truth”; “be kind”), but sometimes people successfully craft new “ethical design patterns” fitted to a new circumstance, such as “don’t drink and drive.”
I will close with a discussion of what I think it would take to craft solid ethical heuristics near AI development, and to be able to thereby put energy into AI safety efforts in a way that is less in a fight with others’ desires to avoid totalitarianism, economic downturns, or other bad things.
We humans navigate mostly by intuition.[1] We can do this, not because our intuitions match up with the world from birth (they mostly don’t), but because:
I’ll spell this out a bit more in the case of math, physics, and coding. Then, with this example in hand, I'll head to ethics.
Humans predict math and physics using informal/intuitive guesswork (mostly) and observation and formal proof (occasionally, but with higher credibility when we do check).
How do we get good-enough intuitions to make this work? Partly, we start with them: humans have some intuition-reality match from the get-go. Partly, we revise them fairly automatically with experience, as we play around and "get a feel for things."
But also, partly, we revise them on purpose, with full verbal consciousness: great physicists from Archimedes to Galileo to Einstein have pointed out incoherences in our starting intuitions about physics, and have actively worked to help us set up new ones. Good teachers enjoin students to acquire good intuitions on purpose, and help show how.[2][3]
Intuitions are deliberately crafted parts of a scientist's power.
In addition to revising our intuitions to better match what’s true, we also design "what's true" (in our built world, where we can) to be more intuitive. This happens even in math (which I might've thought "isn't built"): in math, we search out terms, definitions, etc that will make math's patterns intuitive.
For example, as a kid I wanted 1 to be considered a "prime;" it seemed initially more elegant to me. Maybe it seemed that way to early mathematicians too, for all I know! But it is in fact more elegant to exclude 1 from the set of "primes," so we can have "unique prime factorization". Mathematicians layered many many design choices like this to make it more natural for us to "think like reality."
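(A quick way to see the point in symbols: if 1 counted as prime, "the" prime factorization of a number would stop being unique.)

```latex
% If 1 were prime, factorizations would no longer be unique:
6 = 2 \cdot 3 = 1 \cdot 2 \cdot 3 = 1^{2} \cdot 2 \cdot 3 = \dots
% Excluding 1, every integer n > 1 factors in exactly one way (up to ordering):
n = p_1^{a_1} p_2^{a_2} \cdots p_k^{a_k}
```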
The same holds, in a more extreme way, for coding large software applications – many design choices are made in concert, using much cleverness and tradition, such that somehow: (a) the software can fit in one's head; and (b) one can often add to or change the software in ways that still leave it fitting in one's head (rather than the number of [interactions to keep track of] growing exponentially).
Most people, in our personal lives, make at least some use of ethical heuristics such as “tell the truth,” “be kind,” “don’t drink and drive.”
Also, many human organizations (especially large, long-lasting ones) make use of “ethical design patterns” such as “let promotion and pay increases be allocated fairly, and be linked to high-quality work.”
What process produces these “ethical heuristics”?
So, I unfortunately do not fully know what process produces our ethical heuristics, and even insofar as I do know, the answer is partly “it’s complicated.”[4] But part of it is: we sometimes act in a manner that is either: (a) saliently guided by a traditional ethical heuristic, such as “tell the truth”; or (b) saliently in violation of a traditional ethical heuristic, such as “tell the truth.” Later, we notice our action's impacts (what happened to us, to other parties, to our relationships) and aftertastes (how it feels in us). And, if we like/dislike these impacts and aftertastes, we adjust our trust in this heuristic upward or downward – especially if, after meditating on the example, we come to understand a “physics” that would systematically produce results like the one we observed.
“Tell the truth,” for example, is often passed to children with accompanying reasons: “so people will trust you,” “so you’ll feel good about yourself,” “so you’ll become a person of good character,” “so you can have real friends and allies.” This makes it easier for the kid to track whether the heuristic is paying rent, and to update toward it insofar as it’s valid.
To give a different sort of example: communism was plausible to many intellectuals in the early 1900’s, and I found “The Internationale” (a communist anthem) ethically and aesthetically attractive when I first heard it as a teen. But after studying the history of the Soviet Union, my intuitive reaction to this music became more wary.
Human cities, with their many purposes and people and associations, are even more complicated than large software projects.
Perhaps relatedly, many actions among humans can “backfire,” achieving the opposite of their intended result, to the point where it’s hard to predict even the sign of one’s effect. For example:
Interestingly, though, many things aren’t tangled and confusing in this way.
Some places where people mostly can act, with predictable results:
AFAICT, the above are all areas where a person can usually pursue what they want without getting much in the way of what somebody else wants. This greatly reduces the odds of backfire.
How did it come about, that people can act in the above ways without tangling with somebody else’s plans? Via deliberate ethical design work, I think. For example, “property rights” were worked out on purpose (via “don’t steal,” patterns for disincentivizing stealing, patterns for making it clear whose property is whose, etc.), and this is how it came about that my tea-getting plans predictably didn’t tangle with anyone else’s plans.
To see this design work more sharply, consider institution design. Suppose, for concreteness, that you would like to build a postal service (and suppose you’re a bureaucrat of enormous power, who has a shot at this). You want your postal service to deliver mail well, and to stay true to this purpose for a long time.
How do you do this? You employ some standard-issue ethical design patterns:
Insofar as you succeed in this design work, you’ll create a context where an individual postal worker can be wholehearted about their work: the thing that is good for them personally will also be good for the postal service, and vice versa (at least mostly). You’ll also create a context where the postal service as a whole can be mostly in alignment with itself as it heads toward or away from particular changes that would make it better/worse.
(Without such ethical design work, the postal service would be more likely to degenerate into bureaucratic politics in which some parties gain local power, and set up fiefdoms, at the expense of the organization as a whole.)
More generally, it seems to me that ethical heuristics can be seen as a pattern language for causing “mesaoptimizers” within a person, and “mesaoptimizers” that arise in the politics across people, to direct their energies in a way that benefits both the mesaoptimizer in question, and:
“Ethics,” here, is not an unchanging list of rules that the institution-designer should always obey, any more than design patterns in computer science are a list of rules to always obey. It’s a pattern language, honed over long experience, with patterns that work best when selected for and tailored to context.
In the case of drunk driving, “Mothers Against Drunk Driving” (MADD) did a lot to create and propagate visceral heuristics that made it “obvious at a glance” that people shouldn’t drink and drive. They embedded these heuristics in phrases such as:
After MADD’s work, it became easier to be wholeheartedly against drunk driving (vs feeling conflicted, where your desire to avoid tension with your friends was in conflict with your fear that they’d kill someone, say); it also became more likely that a person’s self-interest near questions of drinking and driving would line up with what was good for others (via laws punishing drunk drivers; via laws making bartenders sometimes responsible for drunk drivers; via social heuristics causing disapproval for driving drunk and approval for those who help prevent it; etc).
The phrase “earning to give” (and the EA community’s support of this phrase) makes it viscerally obvious how a person in eg finance expects to make the world better. This makes it easier for a person to do such work wholeheartedly, and easier for the EA community to feel visceral approval for people doing such work.
I’ve been mostly-impressed with the YIMBY movement’s crafting of ethical heuristics, although it is recent and has not yet produced deeply-embedded ethical heuristics on the level of MADD. Relevant ethical intuitions it has produced (at least some places):
These intuitions could use pithy phrases. They could use more embedded obviousness, and more spread. They could use more synthesis with potentially conflicting intuitions about upsides of zoning, and about how falling house-prices will be costly for existing homeowners’ life savings. But it’s on path.
I can tell it’s on path not only because it’s producing some of the house-construction change it’s after, but because it isn’t particularly polarizing, and it seems to be moving many peoples’ intuitions in a quiet, non-dramatic way (“huh, yeah, it is good if people can afford housing”).
I'd like also to look in some detail at situations where people didn't have adequate ethical heuristics.
In 1840s Vienna, Ignaz Semmelweis became justifiably confident (via experimentation and an abundance of data) that the ~7% death rate among delivering mothers could be greatly reduced if doctors would wash their hands in a chlorinated lime solution. This claim was difficult to assimilate within Viennese society at the time.
We can tell this claim was hard to assimilate sanely, because:
You might ask why I say “the claim was hard to assimilate sanely” rather than, say, “the Viennese establishment pursued their (corrupt) interests sanely but sadly.” One reason is that Semmelweis and Michaelis seem also to have had trouble acting sanely (even premised on the medical establishment acting as it did).
You might also ask why I believe the difficulty assimilating the claim was partly an ethics gap. Partly, the difficulty assimilating Semmelweis’s claim was because germ theory wasn’t yet known (which I'd call a "science gap," not an "ethics gap"). Semmelweis's data should have been persuasive anyhow, since the effect size was huge, the number of patients was huge, and it would've been low-cost for skeptical doctors to try handwashing for a month and count patient deaths. But I expect the science gap made it easier for people to not-see the effect (if they wanted to avoid seeing it), and harder for people to form common knowledge about it (even where they wanted to). So I expect the science gap made it harder for ethics to work here.
Still, most people’s guess (and mine, though I’m fairly uninformed) is that people found it difficult to socially metabolize “high-prestige Viennese doctors have been causing patients’ deaths, via being unclean”, for reasons that were at least partly about social and ethical templates and power structures.
Rephrasing: my guess is that the 1840s Viennese people mostly tried not to see part of what they could see, and/or mostly tried not to care about part of what they cared about, because they did not know how to look at and care about everything all at once. Particularly not in public. Relatedly, I suspect many flinched from tradeoffs they found agonizing, e.g. “shall I ignore that doctors may be killing people, or be loud about how my doctor friends are maybe so bad they should be shunned / should maybe kill themselves, even though I’m not sure?”
They were in a context that had not yet been engineered to work well with their intuitions.
Next, I’d like to look in some detail at a situation where I believe our own time and place lacks adequate ethical heuristics. This example will make it harder for us to avoid being “mindkilled”, and it brings some risk of getting into a social battle within LW or with distant others. But I’d like to give us a chance to experience all this from the inside, before I go on to the (still more difficult, IMO) situation around AI development.
The example I pick: public discussion of group differences.
The tricky thing about group differences discussion, it seems to me, is that there are two ethical heuristics that each pay some rent, that we mostly do not know how to care about simultaneously. Namely:
Heuristic A: Avoid speech that may inflame racism.
Heuristic B: Allow free inquiry and conversation everywhere, especially anywhere important that’s widely specifically-not-discussed; talk about any “elephant in the room.”
What do people normally do when two rent-paying ethical heuristics collide? Such collisions happen often: there are situations where it’s difficult to be fully honest and fully kind at once (at a given skill-level), or to fully simultaneously adhere to almost any other pair of useful ethical heuristics.
To help navigate such collisions, people have created a huge number of “bridging” ethical heuristics, which prescribe how to respond when two valued ethical heuristics can’t both be followed at once. Sometimes, these bridging heuristics involve one heuristic overriding the other within a delimited region, as with a “white lie,” or with “pikuach nefesh” (a rule in Judaism that one should break the Sabbath, and many other rules, if it might save a life). Other times, the bridging heuristic states that a given case is officially a “judgment call,” in which a responsible party is supposed to weigh several goods against the circumstances, according to their individual judgment.[6]
Regardless, the crucial feature of a “bridging” ethical heuristic is that it allows the two heuristics (if we understand these heuristics as memes, with their own ~agency) to peaceably share power, and to support one another's continued existence. It’s a bit like a property line: both memetic neighbors can support one another's validity in particular regions, without a “war of all against all.” Humans who care about Heuristic A and humans who care about Heuristic B can see eye to eye around the bridging heuristic, and can appreciate one another's reasonability. A “bridging heuristic” is thus the opposite of a scissors statement.
Several symptoms convince me that I, and many relevant “we”s, have insufficient ethical heuristics to be sane about how to speak publicly about group differences – and, more exactly, that we lack an adequate bridging heuristic between ethical heuristics A and B:
So, I think many of us (and especially, many polities taken as a whole) are missing an adequate set of ethical heuristics to allow our viscera, hearts, minds, etc to all orient sanely about all of what matters near public discussion of group differences.[9]
Now for the hardest example, and the one I will get most wrong, despite having tried: let’s discuss what ethical heuristics already help us think sanely and publicly about AI development, and where the gaps are.
To review: my claim is that an adequate set of ethical heuristics here would allow us to:
They would also help us create group contexts where:
(Why should it be possible to find ethical heuristics that do all or most of the above at once? I don’t fully know. But I believe from examples in other ethical domains that it generally is possible. I don’t fully know why design patterns are often viable in coding, either.)
Here are some ethical heuristics that’re already worked out (and are new in the last couple decades), that I think help.
I believe all four of these examples make our available ethical heuristics more "adequate" in the sense of i-vi above. (This is not an exhaustive list.)
Here’s an ethical heuristic (at least, I think it’s an ethical heuristic) that I personally care about:
IMO, it’s hard to get a consensus for Heuristic C at the moment even though it kind of seems obvious. It’s even hard for me to get my own brain to care wholeheartedly about this heuristic, to feel its full force, without a bunch of “wait, but …”.
Why?
I’m not sure, but I’d guess it’s at least partly because we lack good “bridging heuristics” between Heuristic C and some other key ethical heuristics. Such that people go to try to affirm C (or to endorse people/movements who affirm C) but then hesitate, because they’re afraid that heuristic C will gather a bunch of social momentum, and then trample something else that matters.
In particular, I’ve heard many (including many I respect) speak against C because they fear C will damage some of these heuristics:
For example, Peter Thiel worries existential risks will be used as an excuse for an anti-progress totalitarian government to seize power. (Heuristics D, E, F, I think.)
Alexandros Marinos (a long-time LWer who I respect, and who used to be into LW/MIRI-style safety efforts) opposes these efforts now, arguing that LWers are naive about how governmental power seizure works, that AI safety can be used as a fig leaf for power the government wants anyhow, that governmental involvement increases risk, and that AI is already partway through its takeoff and LWers aren't paying attention. (One tweet; but his views are scattered across many.) Alexandros’s views seem to me to be a combination of D and I, mostly.
(I think I’ve seen many other examples, from both thinkers I respect and randos on X, that I classed in these ways – but I’m low on time and want to post and have had trouble tracking these down, so perhaps I’m wrong about how common this is, or perhaps it really is common and some of you will post examples in the comments?)
We could try to work out "bridging heuristics", to allow heuristic C to co-exist stably with heuristics D/E/F/G/H. But, how badly do we need such bridges?
That is: In the absence of such heuristics, how well does it work to say: "Heuristic C is more important than the others, so, if we can't honor all the heuristics, let's make sure we honor C, at least"?
It seems to me that while a person can say that (and while it might sometimes be the best path, even), there are large costs. Here's my attempt to list these costs (I'm focusing here on totalitarianism (D), for readability, but I could write exactly analogous sentences about other heuristics). If we proceed without bridging heuristics:
I suspect there are analogous costs to failing to synthesize other key ethical heuristics, also.
Most pairs of ethical heuristics contradict one another sometimes – it is sometimes challenging to be both honest and kind at once, etc.
Still, many ethical heuristics get along with one another just fine, with the aid of “bridging heuristics,” as discussed earlier.
I would bet at decent odds that the tension currently existing between proponents of heuristic C and proponents of D/E/F/G/H/I is not a necessary thing, and can be eased with some sort of additional work (work I imagine many are already in progress on). That is, I suspect it is humanly possible to care about all the good things at once, and to take in all the obvious truths at once, without “missing moods” – if it’s currently hard to fit all of it in our heads at once, we can locate stories and examples that make “here’s one way we could care about all of it” concrete.
I also think there’s been good continuing work on such bridging heuristics for as long as there’s been an AI safety movement; I’m writing this essay to try to bring a bit more theory or explicit modeling to a thing that is probably already on many to-do lists, not to introduce a whole new project. On this note, I quite enjoyed Joe Carlsmith’s series On Otherness and Control in the Age of AGI, and believe he is chipping away at the Heuristic C / Heuristic H collision.
That said, many domains (such as Covid policy) seem to get more scissors-y rather than bridgey.
To restate my main thesis: I think ethical design patterns are a pattern language for aligning mesaoptimizers – including mesaoptimizers within a human, and mesaoptimizers that arise in the politics among humans – so as to get functional human structures (such as a postal service that delivers mail, rather than one that degenerates into politics; or a human being who has something to protect, rather than one who’s a random bag of impulses).
I most care about building functional human structures for not all dying of AI. But I also care about building functional human structures for various subproblems of this, such as “aggregating information sensibly about AI safety stuff.”
One smaller puzzle, there, concerns the collision of these ethical heuristics:
I believe both that each of these three heuristics does useful work for us on average, and that their collisions have sometimes caused us to believe false things, as evidenced by eg this exchange between me and Tomás B. I worry also that they make us more manipulable by AI companies. Finding a way to keep most of the upsides of each heuristic, while not having our epistemics messed with, is an easier ethics puzzle, and still useful.
In math, for example, mathematicians tend to guess which theorems are true before they can formally prove them, and they tend to guess which proof structures may work out before those proofs have been completed. (Navigating without intuitions, and doing instead a brute force search to locate proofs, would be impossibly slow, combinatorially slow.)
If you’d like to get a taste of how this works, and haven’t, you might check out the book Thinking Physics, which allows a reader to reason their own way to high school physics almost purely via riddles and thought experiments, with surprisingly little need of observations or of teachers' say-so.
Anytime you find a place where your starting intuitions predict wrongly, you can dwell on it until your intuitions come to predict it rightly. I was taught this explicitly as a high school student, at Glenn Stevens’s PROMYS program; Prof. Stevens emphasized that after becoming confident we had proven a given conjecture, we should not stop, but should instead persist until we could “see at a glance” why the conjecture had to be true, i.e. until the new theorem’s insight becomes an automatic part of one’s visualized mathematical world. Eliezer recommends similarly about physics.
As an example of "it's complicated": adults have more memetic power than children, and I suspect this is part of why "honor your parents" is a more traditional piece of ethical advice than "cherish your children." There are probably many places received ethical heuristics are bent toward "that which advantages the ethics memes, or the meme-spreaders" rather than toward that which would aid the recipients of the meme.
Richard Ngo argues that achieving the opposite of one's intended effect is common among ideologies and activists.
There are also other sorts of bridging heuristics. If the leader of a country sends soldiers into battle, he is expected to publicly honor those of his soldiers who die, to provide for those who become disabled, and to be careful not to lose their lives unnecessarily; this bridges between "countries should be able to defend themselves" and "lives are sacred." This is an example of a more complex and prescribed sharing of power between two heuristics.
Conversely, when I refrained from other non-woke speech for fear of social reprisal, I tended also to dissociate a bit, with a different “missing mood.”
I caught up with a college friend after many years, and he told me he'd realized he's a "congenital comedian": "You know how everybody has sensors that detect what not to say, and make them not say it? Well, I have those too -- except the sign is reversed. This explains my life."
“Congenital comedian” is the thing I’ve actually observed in some friends on this topic. For example, my friend “Bob” was once with me in a room full of near-strangers who he didn’t want to make a bad impression on. And he said something slightly awkward that was slightly-near the topic of race and IQ. And then he tried to clarify, and made it worse, and tried again to clarify, and made it double worse, for like four iterations of this (without anybody else saying much of anything), while turning increasingly red.
As an aside, I like Kelsey’s recent small ethical innovation, where she now advocates saying true things even when they aren’t important, and are likely to cause mild social annoyance/disapproval, so that they won’t only be said by people in other information bubbles. Discussed in the first half of her recent dialog with Zack.
Eliezer and Nate’s new book is palpably compatible with this heuristic, IMO, which I appreciate. This makes me feel better about recommending their book to relatives, as I more expect that reading the book will be okay for my relatives. We can see here an example of better ethical heuristics improving incentive-alignment for different people: because IABIED contains this improved ethical heuristic, it’s more in my relatives’ interest to inform themselves about this part of the world (since they’ll be less wrecked by it), and it’s more in my interest to tell them about it.
Cf. failing with abandon and motive ambiguity; also the fact that high-level actions do not screen off intent, so if we can’t properly care in our hearts about totalitarianism-risk and death-risk (or whatever) at once, we probably also can’t fully act right.
In anti-epistemology, part of epistemology becomes explicitly attacked by the planning self (“it’s wrong to want evidence, and right to have faith, about religion”). I believe there is an analogous phenomenon, which I call “anti-caring,” where part of one’s own caring becomes explicitly attacked by the planning self.