This was written for the LessWrong Review, intended as a high level exploration of the 2020 posts on Moral Mazes, Simulacra, Coordination Theory, and Civilizational Adequacy.


I. What Makes For a Useful Review?

When reviewing LessWrong posts, fairly often my reaction is "Cool hypothesis bro, but, can we actually go out and verify it?". A lot of LessWrong posts explore interesting ideas, and those ideas match my intuitions. But lots of intuitive ideas turn out to be wrong, and sometimes people have different intuitions. It seemed good to have a step where we actually turned armchair philosophy into empirical claims.

A year ago, I began two attempts at investigating some claims:

  • Eliezer's conjecture that mathematicians were more law-abiding than non-mathematicians (controlling for IQ).
  • The overall Moral Mazes sequence.

Mathematicians and "Is this data worth the expense?"

Elizabeth and I made some attempt to run a Positly study on math / law-abidingness. We quickly ran into "Running a real, legit study is a pretty big time cost." (We ran one quick-and-dirty study that essentially boiled down to asking people 'are you good at math?' and 'do you do crimes?', which Eliezer specifically called out as Not A Good Approach in his post. We tried looking for datasets that might be relevant to our interests, but finding and sifting through them was hard).

Ultimately I faced the question: "is this question actually worth investing this effort in?" and I felt the answer was "probably not." It seemed vaguely informative, and I felt a nagging annoyance at publishing Local Validity As Key To Sanity and Civilization without having investigated it. But after a lot of reflection (on the timescale of years), I came to the belief that not every claim is worth investigating. 

If I were worried about the epistemic integrity of the LessWrong books, a much cheaper approach would be to include some metadata about the epistemic status of each post, or suggestions of future work that signposted which areas warranted more investigation. 

Moral Mazes, and "What are we actually trying to figure out?"

The Immoral Mazes sequence raised a lot of interesting hypotheses, which matched my intuitions/experience. But it seemed to me the next step was to form some actual predictions and go out and test them.

At the time, I was thinking about Mazes largely from the standpoint of "I want to build a rationalist/EA/longtermist ecosystem that accomplishes good at scale. How do we do that, without succumbing to Mazey pressures?". Example of questions I wanted to test were: "Do more layers of hierarchy reliably lead to more Mazeyness? How reliably? What observations do I actually expect if an org is 'mazey'? Does it help if an org is broken into smaller pieces or does that just shuffle around the problem? How do we evaluate middle managers?" 

Testing these things would not be a cheap undertaking. But unlike the Mathematician example, it actually seemed important.

I went to chat with Oliver Habryka about it, and he noted (paraphrased) "For building an organization, I'm not sure it's actually all that useful to study how exactly things fail. I'd much rather see a bunch of case studies of how things succeeded. What types of orgs produce value, and how do they go about it?"

This was an interesting takeaway for me – when reviewing posts, the most important results don't necessarily look like "evaluate specific claims." The important thing might be "figure out what decisions this post bears on, and use the post as a springboard/pointer to help figure out what you actually need to know."

In my ideal world, there's funding and bandwidth to followup on lots of claims, fleshing out our understanding. Realistically, that bandwidth is limited. I think it makes sense to prioritize questions that are important enough that you'd actually be willing to pay money for the answers (or, questions where people are naturally driven by curiosity or righteous "someone is wrong on the internet" to get the answers).

II. The Civilizational Adequacy Cluster

With all that in mind, when I look at the posts from 2020, there's a cluster of themes that fit together. 

Some of them clump together fairly obviously: in my mind, they connect into a fuzzy question of "What actually is up with civilization? Is it functional? Is it trustworthy? Is much of it a corrupt, immoral mess? Can/should we cooperate with it? If not, what do we do instead?"

Theoretical Underpinnings

Perhaps less obviously, I also think our work on coordination and game theory fits into this cluster. Some of these are approaching the question from a different angle, but ultimately help me build a unified map of "what's up with civilization?" Some concrete concepts I took away last year:

Schelling Problems (as opposed to PD or Staghunt)

Abram's Most Prisoner's Dilemmas are Stag Hunts, most Stag Hunts are Schelling Problems was a crisp conceptual update for me. For the first time I felt like I could look at real-world problems and have a game theory abstraction that worked at multiple levels, where I could see all the way down into the math and it made sense. This dovetailed with Coordination as a Scarce Resource to give me a sense of why coordination problems are hard, but lucrative to solve.
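To make the "iterated PD becomes a Stag Hunt" idea concrete, here's a minimal sketch. The payoff numbers are my own standard illustrative choices, not from Abram's post; the point is just the shape of the resulting meta-game:

```python
# Sketch: iterating a Prisoner's Dilemma turns strategy choice into a Stag Hunt.
# (Payoff values below are illustrative placeholders, not from the post.)

# One-shot PD payoff to the first player for (my_move, their_move).
PD = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opp_history):
    """Cooperate first, then copy the opponent's last move."""
    return opp_history[-1] if opp_history else "C"

def always_defect(opp_history):
    return "D"

def repeated_payoff(strat_a, strat_b, rounds=10):
    """Total payoffs when two strategies play an iterated PD."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strat_a(hist_b)  # each strategy sees the *opponent's* history
        b = strat_b(hist_a)
        score_a += PD[(a, b)]
        score_b += PD[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

strategies = {"TFT": tit_for_tat, "AllD": always_defect}

# Meta-game: pick a strategy for the whole repeated game (row player's payoff).
meta = {(na, nb): repeated_payoff(sa, sb)[0]
        for na, sa in strategies.items()
        for nb, sb in strategies.items()}

# Two pure equilibria, like a Stag Hunt: (TFT, TFT) is the payoff-dominant
# "stag" outcome, (AllD, AllD) is the safe "hare" outcome.
assert meta[("TFT", "TFT")] > meta[("AllD", "TFT")]    # no gain defecting from (TFT, TFT)
assert meta[("AllD", "AllD")] > meta[("TFT", "AllD")]  # no gain deviating from (AllD, AllD)
print(meta)
```

The one-shot game has a single bad equilibrium; the repeated game has two, and the problem shifts from "incentives are against us" to "can we coordinate on the good equilibrium?" — which is the Stag Hunt (and eventually Schelling-problem) framing.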

Simulacrum 3 as Stag Hunt Strategy

Another conceptual update was some clarity on what Simulacrum 3 / Belief-As-Attire is for. We've been using belief-in-belief as a coordination mechanism for getting into better equilibria, and if we want to move past it, we actually need better alternatives.

Perhaps least obviously, there's Critch's Some AI Research Areas And Their Relevance To Existential Safety. Much of Critch's agenda seems to explore "how exactly is civilization going to coordinate in a way that preserves human values, when you have tons of AI running around making decisions and executing plans?". 

(But, be careful with game theory)

My thinking here is also cautioned by Elinor Ostrom's book Governing The Commons, which argues that simplified game theory models are in fact pretty misleading: you need to model the actual situation, with its many nested sub-games, for game theory to get you "the right answer." This seems true. I nonetheless feel like these posts ratcheted forward a clear understanding in me, and I have some hope that eventually the theory can turn into real illumination.

III. Lenses for "Why (In)adequacy matters"

Civilization is huge. "What's up with civilization and why does it sometimes do things that seem crazy?" feels like it ought to be important to a lot of decisionmaking. 

I had previously spent a bunch of time thinking through the lens of "what kinds of orgs do I want to build, and how do I make sure they don't become corrupt?". But there are other lenses. 

Moral Mazes and Civilizational Adequacy as Cause Area

Zvi is notably not an Effective Altruist, but when I translate (Im)moral mazes into the EA framework, one obvious followup thought is:

"Hrmm. If the world is infested with Mazes, maybe this is horrifying and bad and is maybe worthy of consideration of a top EA cause?" 

Two different reasons you might think this:

Crippling Civilization. The maze nature destroys an organization's ability to communicate clearly and accomplish its original goals. Mazes try to turn other orgs they interact with into mazes, so this problem may be afflicting large swaths of humanity. This may have significantly harmed our ability to respond effectively to covid, and to other complex 21st century problems. The "moral mazes" problem is bad because it cripples civilization as a whole.

Bad for the People Involved. The maze nature is harmful to the welfare of individuals who end up trapped within it. People are encouraged to devote everything (their family life, their hobbies, their free time) to the organization. The organization doesn’t even care about its own ostensible goals. Only a teeny fraction of people who make it to “the top” get any kind of reasonable payoff for their sacrifice. Since mazes try to create more mazes, this means a large swath of humanity is afflicted by soul-sucking machinery.

Bad for the People Involved. Or, "Classic GiveWell-style EA"

Point #2 basically amounts to some cluster of "People are suffering; people aren't achieving their potential or living the fulfilling lives they could be." And there's some math about whether the scope of this problem, and the tractability of solving it, are comparable to the suffering and lost potential of people killed by malaria or whatever. 

I don't actually know the answer to the math. There's a question of how pervasive moral mazes are, and how unhealthy they are for the median person caught up in them. Then there's a question of what sort of interventions work and how effective they are. It's hard to do 'vanilla' EA math until at least some people have tried solving the problem, whose progress can be evaluated. 
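For concreteness, here's the rough shape that math would take. Every number below is a placeholder I made up to illustrate the structure of the comparison; none are real estimates from the post or anywhere else:

```python
# Hypothetical back-of-envelope cause comparison, GiveWell-style.
# ALL numbers are illustrative placeholders, not real estimates.

def expected_impact(people_affected, welfare_loss_per_person,
                    tractability, neglectedness):
    """Rough expected 'welfare units' recoverable per unit of marginal effort."""
    scale = people_affected * welfare_loss_per_person
    return scale * tractability * neglectedness

# Placeholder: many people mildly harmed by maze-like workplaces,
# with (for now) essentially unknown tractability.
mazes = expected_impact(
    people_affected=1e8,          # guess: people inside large hierarchies
    welfare_loss_per_person=0.1,  # guess: fraction of wellbeing lost
    tractability=0.001,           # guess: no proven interventions yet
    neglectedness=1.0,            # guess: almost nobody works on this directly
)

# Placeholder: fewer people per year, severe harm, proven interventions.
malaria = expected_impact(
    people_affected=2e8,
    welfare_loss_per_person=0.3,
    tractability=0.05,
    neglectedness=0.3,
)

print(f"mazes: {mazes:.0f}, malaria: {malaria:.0f}")
```

The point isn't the outputs (which the placeholder inputs fully determine), but that the maze case currently has no defensible value to plug into the tractability term, which is exactly why the math is hard to do before anyone has attempted the problem.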

From this perspective, a next step might be "thinking more about the scope of the problem, and what might help." 

Crippled Civilization, or "Maybe humanity's Hamming problem?"

Point #1 seems like a strong contender for "humanity's biggest problem." It might or might not be tractable, but if true, it affects all other problems we might need to coordinate around. 

"Moral mazes" is a potential subproblem of "civilization just doesn't seem to be able to reliably do sensible-things-at-scale." And here is where a lot of the other "Civilizational Adequacy post cluster" come in. There seem to be multiple problems at play, all the problems seem gnarly and hard with a lot of entrenched political interest. 

I'm not sure I trust a lot of the evidence that has been reported to me about "how functional is civilization?". Information about this is heavily politicized, and anecdotes that reach me are selected for being outrageous. Still, I've gotten a lot of bits of info that at least suggest there's a ton of room for improvement here.

Here, some obvious questions are: Does the moral mazes framework suggest solutions to 'civilization not being able to act sensibly at scale'? Do simulacrum levels?

My naive thoughts all go in the same direction as my original query "how can I avoid creating moral mazes, or minimize their damage/degree?". Oliver's suggestion seems potentially still relevant: "look for successful organizations, and parallels between them, rather than spending a lot of time thinking about failure." Only in this case it's something like "Look at the places where civilizations succeeded. Maybe particularly look for places where a civilization's competence sort of wavered and recovered." (I'm wondering if this is a good job for Jason Crawford)

But I'm interested in followup work that explores entirely different avenues. "Look at success" is harder when the topic is macro-history, and there are probably other options. 

Civilizational Adequacy as AI-Strategy Crux

The previous points were mostly things I thought about last year. But this year, as I chewed on the MIRI 2021 Conversations, it struck me that there's a different application of the "is civilization terrible?" question.

It seems to me that whether one believes civilization is horribly inadequate has a pretty big impact on one's AI strategy.

If you think our civilizational apparatus is basically functional, or close to functional, then it seems more reasonable to base your AI macro-strategy on government policy, agreements between companies, or changing the longterm AI research institutional culture. 

If you think civilization is dysfunctional, you're more likely to think you need a pivotal act to permanently change the gameboard, and that all the other stuff won't help.

(I don't mean to imply this is the only major crux, or even the most important one. But it seems like a fairly big one)

In many areas, I don't think it's that bad that the LW / AI Risk community struggles to reach convergence. There are a lot of research ideas, and it's maybe just fine to have lots of people pursuing their own agendas in parallel while thinking each other's research agendas are doomed. The fact that we can't agree is evidence that at least one of us is thinking wrong, but the best group-epistemic strategy might not be to try to enforce convergence. This disagreement seems different, though:

a) I think disagreement about this is a pointer to something that (I think) should be important to people's individual epistemic journeys.

b) The disagreement is particularly annoying because it points in fairly different directions on what one is trying to do with an AI Alignment community, which leads to some plans working at cross-purposes, with people vaguely annoyed or embarrassed by each other.

Unfortunately, this seems like a topic that is some combination of political and aesthetic, and easy to collapse into a "yay/boo civilizational adequacy?" question rather than making specific predictions. When I imagine trying to get other people to engage seriously with the topic I feel some despair. 

I think there's one set of biases that color people's thinking by default (e.g. generally having a hard time distinguishing social reality from reality, going funny in the head with politics, being vaguely optimistic about things, etc.)

...and another set of biases that LessWrong folk tend to have (selected for not playing well with others, being extra annoyed at things that don't make sense according to their strong inside view, etc.)

I'm still confused about how to think about this. I think the most useful next action for me to take is to try to think about this for myself, without worrying about whether anyone else agrees with me or is thinking about it the same way. 

I feel drawn to questions like:

  • Which parts of civilization are the relevant parts for coordinating on AI deployment?
  • How can I get unbiased samples of "what actually happens when the relevant pieces of civilization try to coordinate at scale?"
  • How quickly do arbitrary AI companies adopt best practices? If good alignment technology is developed, but in a different paradigm than whatever DeepMind etc. are using, would DeepMind halt its current projects and switch to the new paradigm?
  • How lumpy is progress? I feel confused about Eliezer and Paul's debates about how smooth the economy actually is in practice.

But maybe the best question is "what is the actual best end-to-end plan that routes through large-scale coordination, vs what is the best end-to-end plan routing through pivotal-act-type interventions?" Having actual details would probably be pretty grounding. 

Mazes as "a problem for institutional Effective Altruism™"

Originally I was interested in Moral Mazes from the perspective of building EA, Longtermism, or other nearby organized efforts to accomplish important things at scale.

I touched on this in the first section. But I wanted to spell out some of the concerns here.

The Middle Manager Hell hypothesis (my name for a subset of Moral Maze theory) is: A problem with middle managers is that it's hard to evaluate their work, which makes it tempting to evaluate them in goodharty ways, or de facto for evaluations to run through some manner of politics and network effects. This gets much worse when there are enough layers of management that the managers interact with each other to build their own internal ecosystem, untethered to object-level work.

We also face this problem with longtermist research. You could cheekily describe longtermism as "believing that the most important things have the worst feedback loops." If longtermist EA is to scale its impact, it needs to deal with many human elements: wanting money/reward for doing good work, a sense of how to progress and gain status, etc. Building an ecosystem that effectively rewards good work is important, but it's hard to tell what "good work" means.

Longtermism-at-scale might require both middle managers and researchers working on topics that are hard to evaluate, which could add up to a doozy of a hard time for healthy-ecosystem management.

Some problems might include:

  • There are many individual organizations, which may scale, and become more maze-y over time.
  • The ecosystem collectively sort of acts like an organization, and it's unclear how this plays out. In the mix of community organizers, managers, organizational grantmakers, and re-grantmakers (i.e. sometimes one org grants money to another org to re-grant), you may end up with the same "management class" that is increasingly untethered from the object-level-work. A worry might be status ending up getting allocated via passing favors around rather than demonstrating skill/good-judgment. And this might lead to a collective maze-atmosphere.
  • A lot of people seem to be following a strategy of "gain power/prestige within mainstream institutions, so they can use that power to do good later." Those mainstream institutions might be mazes, and those people might end up self-modifying into an increasingly maze-like outlook which filters back into EA.

It's worth noting a distinction between "mazes" as a corrupting influence from the mainstream world, vs other forms of corrupting influence. Moral Mazedom is one particular theory. But, in some cases it doesn't seem necessary to explain what's going on. "Some EAs care about prestige and the brand of EA, which makes them hesitant to talk openly about important problems" could be explained by things other than mazedom. 

I think one key question here is "how do you actually do longtermist research?", which is not a small question. This includes (but is not limited to) how to do preparadigmatic research, which has some intrinsic difficulty regardless of how you've tried to organize your meta-community.

If you don't know how to do that, then "how to middle manage longtermist research" is even more confused.

IV. Takeaways

That was a lot. Here's a recap of some pieces that felt important:

  • Getting from "interesting hypothesis" to "vetted knowledge" is expensive, and not necessarily worthwhile. Reflect on what information is actually decision-relevant.
  • Civilizational (In)adequacy seems like it should be a major crux for various high level EA strategies – for EA institutional infrastructure, AI strategy, and as a common cause among causes.
  • Maze Theory suggests a mechanism by which dysfunction is spreading. 

I feel a bit sad that in each section, I only have vague inklings of what further work is actually useful. Each lens for "why (in)adequacy matters" felt like it warranted a whole other post. There's a lot of work left to do here.


Comments

I think Moral Mazes is a misleading meme that itself contributes to the problem and would much prefer splitting off the more specific purposes like Liability Laundering. I don't recall to what extent the original sequence covers that such mazes are not a bug but a feature for upper management.

This statement is kinda opaque and I'd like it if you spelled out your arguments more. (I realize it's not always worth the effort to wade into the full argumentation, but a point of the review is to more fully hash out arguments for posts. Road To Mazedom ranked at #19 during the preliminary voting, so if there's disagreement about it I think it's good to spell it out).

(I don't necessarily disagree with your claim, but as worded it doesn't really convey anything beyond "Romeo thinks it's misleading")

I'm not sure how to say it in another way. Moral mazes are selected to exist by management because they disperse liability for decisions that benefit management while harming employees, customers, and (often) shareholders. Maybe not 100%, some of their instantiation details I'm sure are just spandrels. But they have a purpose and fulfill that purpose.

Am roughly in middle management. Can confirm. Basically I and everyone around me is trying to walk some line between take enough responsibility to get results (the primary thing you're evaluated on) but don't take so much that if something goes south you'll be in trouble. Generally we don't want the pain to fall on ICs ("individual contributor" employees whose scope of responsibility is ultimately limited to their own labor since they need sponsorship from someone else or a process to commit to big decisions) unless they messed up for reasons within their control.

I generally see the important split as who is responsible and who is accountable. Responsible means here something like "who has to do the work" and accountable means something like "who made the decision and thus gets the credit or blame". ICs do well when they do a good job doing whatever they were told to do, even if it's the wrong thing. Management-types do well when the outcomes generate whatever we think is good, usually whatever we believe is driving shareholder value or some proxy of it. ICs get in trouble when they are inefficient, make a lot of mistakes, or otherwise produce low quality work. Management-types are in trouble when they make the wrong call and do something that produces neutral or negative value for something the company is measuring.

Basically I think all the maze stuff is just what happens when middle management manages to wirehead the organization so we're no longer held accountable for mistakes. I haven't actually seen many serious mazes in my life because I've mostly worked for startups of various sizes, and in startups there's enough pressure from the executives on down to hold people accountable for stuff. I think it's only if the executives get on board with duping the board and shareholders so they can wirehead that things fall apart.

I meant to be referring to "I think Moral Mazes is a misleading meme that itself contributes to the problem". Why is it misleading? Why does it contribute to the problem? What evidence or reasoning leads you to believe that?

It's the thing where an extreme mistake theorist comes up with epicycles that are increasingly implausible to explain why the system in question is layered mistakes rather than the much simpler one that it is a conflict. 'moral mazes' implies complexity and connotes the undecidability of morality.

Everyone I've seen be really into moral mazes as a concept takes a strongly conflict-theory approach to systems they believe are mazes. That the presence of the epicycles means there is no working with it and the whole system must be burned.

Now that I think about it, much of the purpose of the label could be to allow one to use conflict theory modalities on a system that won't acknowledge (and may genuinely not see) the conflict. Zvi's Out To Get You posts certainly seem to be that.

(Fwiw, I thought I knew what you meant with your top level comment, but this elaboration wasn't what I expected and is much more interesting).

I'm not sure I trust a lot of the evidence that has been reported to me about "how functional is civilization?". Information about this is heavily politicized, and anecdotes that reach me are selected for being outrageous.

How easy it is to maintain and improve civilization seems to be a fundamental political question. To put it simply, "civilization is super difficult, we should be grateful for the one we have and protect it at all costs" seems like the essence of conservatism; and "civilization is quite easy, we just need to destroy the current power structures and it will automatically rebuild in much better way" seems like the essence of progressivism.

How functional the civilization is, though, is a bit more nuanced. A conservative might say something like "about as functional as we can reasonably expect (which is not perfect, because making an imperfect thing is already quite difficult), minus the things recently introduced by the progressives"; while a progressive might say something like "the things we have introduced recently are great, but there is still a long way to go".

So in an ironic way, both would agree that we are kinda in the middle, but for very different reasons (in the middle of dismantling vs building the civilization). Similarly, both would provide outrageous anecdotes, but of a completely different type; and both would object against anecdotes provided by the other side.

This was on the long side, covering a lot of points. I'm curious to get feedback from people who ended up skimming and bounced off about where they did so.

This is the first time I've ever been specifically encouraged to chime in for skimming, so I'd be crazy not to take you up on it!

Basically, I read this post, and the posts it links, as a bunch of non-sociologists realizing that a study of the sociology and psychology of organizations might be important, but then deciding to reinvent the discipline instead of engaging with it. My main concern is with tractability of understanding the problems of organizations and of finding neglected solutions. This set of posts doesn't particularly address these tractability concerns. I'd also like to see deeper engagement with extant literature and empirical evidence on the subject of how to analyze or improve organizational structure.

In general, this line of thinking seems to point in a sort of waterfall model of organization-building, in which we first develop a well-vetted theory of organizational design, and then use it to replicably build high-functioning organizations.

It seems to me that most organizational success stories I'm aware of involved a small group of smart people getting together in a room with a vision and figuring it out as they went along. 

Curious if you have particular exemplar posts from sociology.

Periodically I see people make the "I'd like to see you engaging with the mainstream literature" comment, but then, well, there's a whole lot of mainstream literature and I'm not sure how to sift through it for the parts that are relevant to me. Do you actually have experience with the literature or just an assumption that there's stuff there? (if you are familiar, I think it'd be great to have an orientation post giving an overview of it)

Moral Mazes is a book by a sociologist, but the topics in this theme just strike me as being basically sociological in nature. The study of social structures.

I’m not a sociologist, so no, I don’t have recommendations. But it’s very hard for me to imagine there’s nothing out there.

I guess my comment may have come off as a criticism or challenge, but that’s not what I intend. It’s just pointing out that sociology is the relevant field, and these posts are being written by folks who don’t have a strong background in those academic fields - although they may have canny personal insights from their own experience. It seems to me like the most tractable next step in terms of vetting is not to try and run new experiments, but to find out what kind of evidence already exists. Or to get some skin in the game and try it for oneself!

Again, that’s not a criticism or challenge - just my own thoughts after having spent a fair amount of time reading the posts you referenced here and reflecting on my experiences.


We also face this problem with longtermist research. You could cheekily describe longtermism as "believing that the most important things have the worst feedback loops."

It seems to me like this mindset is too prevalent in longtermism. The current pandemic is feedback that previous biosafety approaches didn't work, and it would be possible to react to that feedback.
