This was written for the LessWrong Review, intended as a high level exploration of the 2020 posts on Moral Mazes, Simulacra, Coordination Theory, and Civilizational Adequacy.
I. What Makes For a Useful Review?
When reviewing LessWrong posts, fairly often my reaction is "Cool hypothesis bro, but, can we actually go out and verify it?". A lot of LessWrong posts explore interesting ideas, and those ideas match my intuitions. But lots of intuitive ideas turn out to be wrong, and sometimes people have different intuitions. It seemed good to have a step where we actually turned armchair philosophy into empirical claims.
A year ago, I began two attempts at investigating some claims:
- Eliezer's conjecture that Mathematicians were more law-abiding than non-mathematicians (controlling for IQ).
- The overall Moral Mazes sequence.
Mathematicians and "Is this data worth the expense?"
Elizabeth and I made some attempt to run a Positly study on math / law-abidingness. We quickly ran into "Running a real, legit study is a pretty big time cost." (We ran one quick-and-dirty study that essentially boiled down to asking people 'are you good at math?' and 'do you do crimes?', which Eliezer specifically called out as Not A Good Approach in his post. We tried looking for datasets that might be relevant to our interests, but finding and sifting through them was hard).
Ultimately I faced the question: "is this question actually worth investing this effort in?" and I felt the answer was "probably not." It seemed vaguely informative, and I felt a nagging annoyance at publishing Local Validity As Key To Sanity and Civilization without having investigated it. But after a lot of reflection (on the timescale of years), I came to the belief that not every claim is worth investigating.
If I was worried about the epistemic integrity of the LessWrong books, a much cheaper approach would be to include some metadata about the epistemic status of each post, or suggestions of future work that signposted which areas warranted more investigation.
Moral Mazes, and "What are we actually trying to figure out?"
The Immoral Mazes sequence raised a lot of interesting hypotheses, which matched my intuitions/experience. But it seemed to me the next step was to form some actual predictions and go out and test them.
At the time, I was thinking about Mazes largely from the standpoint of "I want to build a rationalist/EA/longtermist ecosystem that accomplishes good at scale. How do we do that, without succumbing to Mazey pressures?". Examples of questions I wanted to test were: "Do more layers of hierarchy reliably lead to more Mazeyness? How reliably? What observations do I actually expect if an org is 'mazey'? Does it help if an org is broken into smaller pieces or does that just shuffle around the problem? How do we evaluate middle managers?"
Testing these things would not be a cheap undertaking. But unlike the Mathematician example, it actually seemed important.
I went to chat with Oliver Habryka about it, and he noted (paraphrased) "For building an organization, I'm not sure it's actually all that useful to study how exactly things fail. I'd much rather see a bunch of case studies of how things succeeded. What types of orgs produce value, and how do they go about it?"
This was an interesting takeaway for me – when reviewing posts, the most important takeaways don't necessarily look like "evaluate specific claims". The important thing might be "figure out what decisions this post bears on, and use the post as a springboard/pointer to help figure out what you actually need to know."
In my ideal world, there's funding and bandwidth to follow up on lots of claims, fleshing out our understanding. Realistically, that bandwidth is limited. I think it makes sense to prioritize questions that are important enough that you'd actually be willing to pay money for the answers (or, questions where people are naturally driven by curiosity or righteous "someone is wrong on the internet" to get the answers).
II. The Civilizational Adequacy Cluster
With all that in mind, when I look at the posts from 2020, there's a cluster of themes that fit together.
Some of them clump fairly "obviously":
- The Immoral Mazes sequence
- Simulacrum Levels.
- Civilizational response to covid, including Credibility of the CDC, and Seemingly Popular Covid-19 Model is Obvious Nonsense. ("Simulacrum Levels and their Interactions" blends with the previous topic)
- Anna's thoughts on Where Do/Did Stable Cooperative Institutions Come From?, (and jacobjacob's corresponding Babble Challenge)
In my mind, these connect into a fuzzy question of "What actually is up with civilization? Is it functional? Is it trustworthy? Is much of it a corrupt, immoral mess? Can/should we cooperate with it? If not, what do we do instead?"
Perhaps less obviously, I also think our work on coordination and game theory fits into this cluster. Some of these are approaching the question from a different angle, but ultimately help me build a unified map of "what's up with civilization?" Some concrete concepts I took away last year:
Schelling Problems (as opposed to PD or Staghunt)
Abram's Most Prisoners Dilemmas are Stag Hunts, most Stag Hunts are Schelling Problems was a crisp conceptual update for me. For the first time I felt like I could look at real-world problems and have a game theory abstraction that worked at multiple levels, where I could see all the way down into the math and it made sense. This dovetailed with Coordination as a Scarce Resource to give me a sense of why coordination problems are hard, but lucrative to solve.
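To make the distinction between these three game types concrete, here is a minimal sketch that finds the pure-strategy Nash equilibria of small two-player games. The payoff numbers, the action names, and the function name `pure_nash_equilibria` are all illustrative assumptions of mine, not taken from Abram's post. The "Schelling problem" here is stood in for by a simple coordination game where the players disagree about which equilibrium is better.

```python
from itertools import product


def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    `payoffs` maps (row_action, col_action) -> (row_payoff, col_payoff).
    A cell is an equilibrium if neither player can gain by deviating alone.
    """
    row_actions = sorted({r for r, _ in payoffs})
    col_actions = sorted({c for _, c in payoffs})
    equilibria = []
    for r, c in product(row_actions, col_actions):
        row_payoff, col_payoff = payoffs[(r, c)]
        # Row player: no alternative row does strictly better against c.
        row_ok = all(payoffs[(alt, c)][0] <= row_payoff for alt in row_actions)
        # Column player: no alternative column does strictly better against r.
        col_ok = all(payoffs[(r, alt)][1] <= col_payoff for alt in col_actions)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


# Illustrative payoffs (assumed for this example only).
prisoners_dilemma = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
stag_hunt = {
    ("stag", "stag"): (4, 4), ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0), ("hare", "hare"): (3, 3),
}
schelling_problem = {
    ("A", "A"): (3, 2), ("A", "B"): (0, 0),
    ("B", "A"): (0, 0), ("B", "B"): (2, 3),
}

for name, game in [
    ("Prisoner's Dilemma", prisoners_dilemma),
    ("Stag Hunt", stag_hunt),
    ("Schelling problem", schelling_problem),
]:
    print(name, pure_nash_equilibria(game))
```

With these assumed payoffs, the Prisoner's Dilemma has a single equilibrium (mutual defection), the Stag Hunt has two where everyone agrees which is better, and the Schelling-style game has two where the players prefer different ones. That last case is where choosing an equilibrium becomes a coordination problem in its own right.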
Simulacrum 3 as Stag Hunt Strategy
Another conceptual update was some clarity on what Simulacrum 3 / Belief-As-Attire is for. We've been using belief-in-belief as a coordination mechanism for getting into better equilibria, and we actually need better alternatives.
Perhaps least obviously, Critch's Some AI Research Areas And Their Relevance To Existential Safety. Much of Critch's agenda seems to explore "how exactly is civilization going to coordinate in a way that preserves human values, when you have tons of AI running around making decisions and executing plans?".
(But, be careful with game theory)
My thinking here is also cautioned by Elinor Ostrom's book Governing The Commons, which argues that simplified game theory models in fact are pretty misleading and you need to actually model the actual situation which has tons of nested sub-games in order for game theory to get you "the right answer." This seems true. I nonetheless feel like these two posts ratcheted forward a clear understanding in me, and I have some hope that eventually the theory can turn into real illumination.
III. Lenses for "Why (In)adequacy matters"
Civilization is huge. "What's up with civilization and why does it sometimes do things that seem crazy?" feels like it ought to be important to a lot of decisionmaking.
I had previously spent a bunch of time thinking through the lens of "what kinds of orgs do I want to build, and how do I make sure they don't become corrupt?". But there are other lenses.
Moral Mazes and Civilizational Adequacy as Cause Area
Zvi is notably not an Effective Altruist, but when I translate (Im)moral mazes into the EA framework, one obvious followup thought is:
"Hrmm. If the world is infested with Mazes, maybe this is horrifying and bad and is maybe worthy of consideration of a top EA cause?"
Two different reasons you might think this:
1. Crippling Civilization. The maze nature destroys an organization’s ability to communicate clearly and accomplish their original goals. Mazes try to turn other orgs they interact with into mazes, so this problem may be afflicting large swaths of humanity. This may have significantly harmed our ability to respond to covid effectively, and to other complex 21st century problems. The “moral mazes” problem is bad because it cripples civilization as a whole.
2. Bad for the People Involved. The maze nature is harmful to the welfare of individuals who end up trapped within it. People are encouraged to devote everything (their family life, their hobbies, their free time) to the organization. The organization doesn’t even care about its own ostensible goals. Only a teeny fraction of people who make it to “the top” get any kind of reasonable payoff for their sacrifice. Since mazes try to create more mazes, this means a large swath of humanity is afflicted by soul-sucking machinery.
Bad for the People Involved. Or, "Classic Givewell-style EA"
Point #2 basically amounts to some cluster of "People are suffering, people aren't achieving their potential, living the fulfilling lives they could be." And there's some math about whether the scope of this problem, and the tractability of solving it, are comparable to people suffering/not-achieving-potential because they are killed by malaria or whatever.
I don't actually know the answer to the math. There's a question of to what degree moral mazes are pervasive, and how unhealthy they are for the median person caught up in them. Then there's a question of what sort of interventions work and how effective they are. It's hard to do 'vanilla' EA math until there are at least some people who've tried solving the problem, whose progress can be evaluated.
From this perspective, a next step might be "thinking more about the scope of the problem, and what might help."
Crippled Civilization, or "Maybe humanity's hamming problem?"
Point #1 seems like a strong contender for "humanity's biggest problem." It might or might not be tractable, but if true, it affects all other problems we might need to coordinate around.
"Moral mazes" is a potential subproblem of "civilization just doesn't seem to be able to reliably do sensible-things-at-scale." And here is where a lot of the other posts in the "Civilizational Adequacy post cluster" come in. There seem to be multiple problems at play, and all of them seem gnarly and hard, with a lot of entrenched political interest.
I'm not sure I trust a lot of the evidence that has been reported to me about "how functional is civilization?". Information about this is heavily politicized, and anecdotes that reach me are selected for being outrageous. Still, I've gotten a lot of bits of info that seem to at least suggest there's a ton of room for improvement here.
Here, some obvious questions are: Does the moral mazes framework suggest solutions to 'civilization not being able to act sensibly at scale'? Do simulacrum levels?
My naive thoughts all go in the same direction as my original query "how can I avoid creating moral mazes, or minimize their damage/degree?". Oliver's suggestion seems potentially still relevant: "look for successful organizations, and parallels between them, rather than spending a lot of time thinking about failure." Only in this case it's something like "Look at the places where civilizations succeeded. Maybe particularly look for places where a civilization's competence sort of wavered and recovered." (I'm wondering if this is a good job for Jason Crawford)
But I'm interested in followup work that explores entirely different avenues. "Look at success" is harder when the topic is macro-history, and there are probably other options.
Civilizational Adequacy as AI-Strategy Crux
The previous points were mostly things I thought about last year. But this year, as I chewed on the MIRI 2021 Conversations, it struck me that there's a different application of the "is civilization terrible?" question.
It seems to me that whether one believes civilization is horribly inadequate has a pretty big impact on one's AI strategy.
If you think our civilizational apparatus is basically functional, or close to functional, then it looks more like a reasonable approach to base your AI macro-strategy on government policy, agreements between companies, or changing the longterm AI research institutional culture.
If you think civilization is dysfunctional, you're more likely to think you need a pivotal act to permanently change the gameboard, and that all the other stuff won't help.
(I don't mean to imply this is the only major crux, or even the most important one. But it seems like a fairly big one)
In many areas, I don't think it's that bad that the LW / AI Risk community struggles to reach convergence. There are a lot of research ideas, and it's maybe just fine to have lots of people pursuing their own agendas in parallel while thinking each other's research agendas are doomed. The fact that we can't agree is evidence that at least one of us is thinking wrong, but the best group-epistemic strategy might not be to try to enforce convergence.
But this particular disagreement seems different to me, for two reasons:
a) I think disagreement about this is a pointer to something that (I think) should be important to people's individual epistemic journeys.
b) The disagreement is particularly annoying because it points in fairly different directions on what one is trying to do with an AI Alignment community, which leads to some plans working at cross-purposes, with people vaguely annoyed or embarrassed by each other.
Unfortunately, this seems like a topic that is some combination of political and aesthetic, and easy to collapse into a "yay/boo civilizational adequacy?" question rather than making specific predictions. When I imagine trying to get other people to engage seriously with the topic I feel some despair.
I think there's one set of biases that color people's thinking by default (i.e. generally having a hard time distinguishing social reality from reality, going funny in the head with politics, being vaguely optimistic about things, etc).
...and another set of biases that LessWrong folk tend to have (selected for not playing well with others, being extra annoyed at things that don't make sense according to their strong inside view, etc).
I'm still confused about how to think about this. I think the most useful next action for me to take is to try to think about this for myself, without worrying about whether anyone else agrees with me or is thinking about it the same way.
I feel drawn to questions like:
- Which parts of civilization are the relevant parts for coordinating on AI deployment?
- How can I get unbiased samples of "what actually happens when the relevant pieces of civilization try to coordinate at scale?"
- How quickly do arbitrary AI companies adopt best practices? If good alignment technology is developed, but in a different paradigm than whatever DeepMind etc are using, would DeepMind halt its current projects and switch to the new paradigm?
- How lumpy is progress? I feel confused about Eliezer and Paul's debates about how smooth the economy actually is in practice.
But maybe the best question is "what is the actual best end-to-end plan that routes through large-scale coordination, vs what is the best end-to-end plan routing through pivotal-act-type interventions?" Having actual details would probably be pretty grounding.
Mazes as "a problem for institutional Effective Altruism™"
Originally I was interested in Moral Mazes from the perspective of building EA, Longtermism, or other nearby organized efforts to accomplish important things at scale.
I touched on this in the first section. But I wanted to spell out some of the concerns here.
The Middle Manager Hell hypothesis (my name for a subset of Moral Maze theory) is: A problem with middle managers is that it's hard to evaluate their work, which makes it tempting to evaluate them in goodharty ways, or de facto for evaluations to run through some manner of politics and network effects. This gets much worse when there's enough layers of managing that the managers interact with each other to build their own internal ecosystem that's untethered to object-level-work.
We also face this problem with longtermist research. You could cheekily describe longtermism as "believing that the most important things have the worst feedback loops." If longtermist EA is to scale its impact, it needs to deal with many human-elements of "wanting money/reward for doing good work, a sense of how to progress/gain-status, etc". Building an ecosystem that effectively rewards good work is important, but it's hard to tell what "good work" means.
Longtermism-at-scale might require both middle managers and researchers in topics that are hard to evaluate, which could add up to a doozy of a hard time for healthy-ecosystem management.
Some problems might include:
- There are many individual organizations, which may scale, and become more maze-y over time.
- The ecosystem collectively sort of acts like an organization, and it's unclear how this plays out. In the mix of community organizers, managers, organizational grantmakers, and re-grantmakers (i.e. sometimes one org grants money to another org to re-grant), you may end up with the same "management class" that is increasingly untethered from the object-level-work. A worry might be status ending up getting allocated via passing favors around rather than demonstrating skill/good-judgment. And this might lead to a collective maze-atmosphere.
- A lot of people seem to be following a strategy of "gain power/prestige within mainstream institutions, so they can use that power to do good later." Those mainstream institutions might be mazes, and those people might end up self-modifying into an increasingly maze-like outlook which filters back into EA.
It's worth noting a distinction between "mazes" as a corrupting influence from the mainstream world, vs other forms of corrupting influence. Moral Mazedom is one particular theory. But, in some cases it doesn't seem necessary to explain what's going on. "Some EAs care about prestige and the brand of EA, which makes them hesitant to talk openly about important problems" could be explained by things other than mazedom.
I think one key question here is "how do you actually do longtermist research?", which is not a small question. This includes (but is not limited to) how to do preparadigmatic research, which has some intrinsic difficulty regardless of how you've tried to organize your meta-community.
If you don't know how to do that, then "how to middle manage longtermist research" is even more confused.
That was a lot. Here's a recap of some pieces that felt important:
- Getting from "interesting hypothesis" to "vetted knowledge" is expensive, and not necessarily worthwhile. Reflect on what information is actually decision-relevant.
- Civilizational (In)adequacy seems like it should be a major crux for various high level EA strategies – for EA institutional infrastructure, AI strategy, and as a common cause among causes.
- Maze Theory suggests a mechanism by which dysfunction is spreading.
I feel a bit sad that in each section, I only have vague inklings of what further work is actually useful. Each lens for "why (in)adequacy matters" felt like it warranted a whole other post. There's a lot of work left to do here.