Sawtooth Problems

Alexander Slugworth

Red Button, Blue Button

On April 24th, 2026, Tim Urban put forth the following poll on Twitter/X:

Everyone in the world has to take a private vote by pressing a red or blue button. If more than 50% of people press the blue button, everyone survives. If less than 50% of people press the blue button, only people who pressed the red button survive. Which button would you press?

I love this dilemma, and I'm exhausted by it. I’ve been thinking about it for two straight weeks, and have spent nearly all that time refining my thoughts by writing this piece. It's consumed me in a way that I've never before experienced with any math problem, and I need to get it out of my head.

Discourse surrounding the Button Dilemma reminds me of polarizingly political topics. In much the same way that political discussions make people go funny in the head, answers to the Button Dilemma tend to elicit vitriol from people of both Red and Blue conviction. Everyone feels their answer is clear, and everyone is confounded by the lack of consensus.

I think this dilemma is pointing to something very important and fundamental about coordination problems. The variety of answers it receives, the arguments, and the emotional fervor feel raw in a way that I've never seen so cleanly extricated from political policy arguments.

There's something very strange happening inside the Button Dilemma's universe, and the more I stare at it, the more I find.

I'd like to share some of my observations.^[1]

What Are We Even Arguing About?

I see a common thread whereby the Button Dilemma is approached as a sort of lateral thinking puzzle. Is everyone awake and lucid? Are blind people forced to pick at random? How do we account for people who don’t have hands and can’t press buttons? Are children expected to choose a button? Well if children are involved, then I’m definitely choosing blue!

I endeavor to omit these sorts of questions from the exercise we're about to walk through.

If you find yourself wondering about questions like these, I encourage you to reinterpret the scenario such that your concern has been nullified. All button-pressers are lucid. Blind people are provided with braille signs that describe each button unambiguously. Children do not participate.

The answer to the Trolley Problem is not “what if the five people are terrorists," and the answer to the Button Dilemma is not “what if some participants are children.”

In exploring most philosophical or mathematical problems, the act of reframing can often help guide us toward improved mental models. When we reframe a problem, we're trying to see it from different angles, and one of the easiest ways to reframe a problem is to consider it in a real world context.

I think it might be the case that when you reframe the Button Dilemma, you fundamentally change the question being asked.

This made me uncomfortable at first. On the Button Dilemma, I feel myself a strong, confident Blue. I want to coordinate, and I'm predisposed toward expecting that others want to coordinate by default. I have a Blue prior. I want everyone to live. Some of my friends might press Blue, but that almost doesn't matter: I just want everyone to be okay, and pressing Blue seems like the only realistic way to achieve a guarantee that everyone will be okay.

Then I caught a glimpse of what it looks like to want to press Red.

If you are Red in the Button Dilemma, I expect this to feel very familiar. If you know in your heart that you are Blue, and will always be a Blue, I invite you to join me in feeling what it's like to be on the Red team.

Abandon the button.

Everyone in the world is teleported into their own private room containing a human-sized blender. If >50% of people enter their blender, everyone survives. If <=50% of people enter their blender, only people who stayed out of their blender survive.

Will you enter your blender?

This feels like an isomorphic re-framing of the Button Dilemma. It asks the same question: will you risk your life, that others may live? Or will you preserve your own safety, and hope your loved ones join you in survival? Stepping into the blender is Blue; staying out is Red.

Philosophically it seems to be the same kind of question, but it feels very different for some of us Blues. There's something obvious about this choice of Red. All of a sudden, Red no longer feels like defection.

Of course you don't get into the blender, right? And your loved ones won't either, right? It would be stupid to get in the blender. We're all coordinating on not-the-blender, right everyone? Right? Why would anyone get in the blender?

In spite of everything, Red starts to feel like a kind of coordination. In the back of your mind, you nervously expect there to be some death, but hopefully not very much. Surely not very much.

You don't want death. You really don't. But you just can't bring yourself to accept that we could possibly be coordinating on stepping into the blender. Stepping into the blender is practically committing suicide.

Then the final outcome is revealed, and you discover that 58% of people stepped into the blender.

You feel a sort of weird astonishment. In one sense you're relieved, because everyone survived! But then the ugly, strange fact hits you with full force: 58% of your peers chose to step into a fucking blender. Worse still, a bunch of them are now asking why you willfully risked their lives by staying out of the blender. They want reassurance that you've learned your lesson, and you'll step into the blender next time. "So long as we all do it, it's safe," they say.

But this does not feel safe. It feels like a threat. You don't want to go into the blender. Why on Earth would you ever do that? The blender is a risk. The blender is death.

Oh my God.

This was the experience that broke me, broke my sense of confidence in my own morality.

With the flip of a switch, I had become Red. I could flip it back, too. In the Button Dilemma, I press the Blue button, and I look over at those who press Red with deep suspicion and fear. In the Blender Dilemma, I stay out of the goddamn blender.

It feels hypocritical in an egregious kind of way. It's like I'm lying to myself, and I can see that I'm lying but I can't stop. It feels like going insane.

Earlier I said "some of us Blues" give different answers for the blender and the button, but I'm not sure how many like-minded Blues feel as bothered by this unresolved hypocrisy as I do.

So, I've set out to resolve it.

My central thesis can be summarized like this: I think I've found an undescribed and extremely useful way to group and characterize a larger category of coordination problem from which the Button Dilemma emerges, and I think it might be important.^[2]

A Fair Way To Look At It

If we want to talk about the general kind of coordination problem that the Button Dilemma seems to be involved with, we need a way to see the world it lives in. What is it really asking us, and are there non-obvious ways to reframe this class of problem while retaining its crux—without turning a button into a blender? Are we actually trying to coordinate, here? Is coordination Blue, and defection Red? Does it always benefit us to coordinate?

Approaching these questions from a neutral standpoint is super difficult, especially when the various possible outcomes tend to get described as e.g. "[Blues/Reds] are taking an unnecessary risk! Every death is the fault of [Reds/Blues]!"

Nevertheless we must try, or we won't ever move beyond the object-level formulation.

In moments like this, it can feel tempting to reach for your favorite decision theory.
Ol' reliable. Never lets you down.

Please, put all of that aside for a moment. I promise I'm not trying to attack anyone's decision theory. I just want to draw attention to some strange parts of the territory you're trying to make decisions within.

Here's my attempt at formalizing the Button Dilemma:^[3]

Given participants and threshold , each participant selects or .
Let be the fraction selecting .
If , everyone survives.
If , exactly those who selected survive.

Let's graph this. We'll examine a model population of :

I call this a sawtooth graph, named for the shape of the contour outlined by the bars. Sawtooth graphs are visualizations of sawtooth problems.

Each vertical bar illustrates a possible outcome. If 10 participants select the Blue Button, then 50% of all participants survive. If 11 select the Blue Button, then all 20 participants survive.

The graph's left side represents every possible Red victory—all outcomes where .

The graph's right side represents every possible Blue victory—all outcomes where .

Notice that two different kinds of strategy yield a survival rate of 100%:

The first case was already colored blue, and comprises the large plateau on the right-hand side of the sawtooth. Here's what it looks like with colored blue as well.

"Wait, this isn't a fair representation," I hear Reds object. "From our perspective, we also want a survival rate of 100%!"

Okay, I agree. This does feel a bit unjust, for that exact reason. Here's a third color scheme, depicting Red's preference for 100% survival:

Well.

This doesn't look quite right either, does it?

Now every outcome is colored red, as though the Reds are responsible for literally every outcome. This is just obviously wrong.

We could keep cycling through these. There's a color scheme that depicts 100% survival in red and <100% in blue, and there's one that assigns responsibility for every single outcome to the Blue team. We could also swap the colors, with blue on the slope and red on the plateau.

Let's not do any of this. All of these graphs feel like unfair representations, and I posit that the unfairness is born from insufficient nuance.

Capturing it will unavoidably introduce some visual noise, but I can't find a better alternative. Let's see it:

We can now visually distinguish between survival rate and survival breakdown, which brings us quite a bit closer, but there's something still missing. In this graph, we see Red's component of every column, but we only show Blue's component for the outcomes Blues claim as coordination.

Here's how Blue contributes to the left side of the threshold:

This graph clearly identifies:

The Blues who succeed in bringing both themselves and the Reds over the threshold
The Reds who survive a Blue coordination failure, and the Blues who failed^[4]

Here's another useful way to represent the data:

This color scheme isolates the survival rate. Red and Blue are mixed into Purple, identifying only the boundaries between life and death. This is a less-noisy, lower-information way to fairly display the possible outcomes of a sawtooth-shaped coordination problem.

I'll generally try to use the noisy red-and-blue color scheme when I think the nuance is important for what I'm trying to communicate, and the purple color scheme when I think it's probably not.

Playing With τ and N

Now that we've settled on a way to visualize sawtooth problems, let's get back to playing with the formalization. Recall:

Given participants and threshold , each participant selects or .
is the fraction selecting .
If , everyone survives.
If , exactly those who selected survive.

Notice that this permits arbitrary values for the threshold τ.

How would you vote if the threshold was 80%?^[5]

These are just Twitter polls, but between Urban's 50% and Aella's 80%, the contrast in outcomes highlights an interesting mechanism underlying this class of coordination problem.

Below the threshold, Red survivors behave as a kind of safety fallback for Blue coordination failure, and there is an inverse relationship between how close the final tally is, and how much room for error the fallback gives us. If Blues just barely fail to pass the threshold of the 80% problem, only 20% of the participants survive.

I think this might be a different way of describing a false crux hiding inside arguments about the Button Dilemma. One's selection of button acts as a nudge toward either end of the graph, and it looks like Reds nudge us toward more safety in the event of Blue failure, and Blues nudge us toward a safer margin for avoiding Blue failure.

Both of these feel like good arguments in favor of either Red or Blue.^[6]

Let's look at one of the simplest sawtooth problems, with N=2.

You and your best friend have to take a private vote by pressing a red or blue button. If >50% of you press the blue button, you both survive. If <=50% of you press the blue button, only people who pressed the red button survive. Which button would you press?

How well do you know your best friend? One of your lives is at stake. Can you predict which button your friend will press? Can they predict which button you will press?

This seems very much like a pure coordination game.^[7] It has two possible outcomes: either both players live, or only one player lives. If each player chooses at random, there is a 50% chance one of them will die.

Notice that there is no defection here, only coordination. Neither party wants the other party to die! This feels like some worse version of Sophie's Choice where her children have to flip a coin to decide whether they both live or one of them dies.

But in this sawtooth problem, you can guarantee your own survival. You can just put heads face-up, and choose to live. They can, too. For those of you who choose Red in this framing, you might feel that it's obvious you just put your coin face up, and it should be obvious to them, too!

But it might not be obvious to them, and perhaps you can save them. They might even be thinking the same thing.

For tacit coordination in the real world, we don't typically choose at random.
We strategize. We try to model each others' minds, and our decisions try to account for our best guess at what we think the other party might choose.

Do you and your best friend know each other well enough to ensure you both survive? ^[8]

Extreme Sawtooth Problems

These next two sawtooth problems have respective τ values of 99% and of 1%.
I've set N=100 to smooth out the shapes and emphasize the dynamic:

These graphs show us the smallest and largest possible safety margins for Blue coordination failure among any value of τ.^[10]

When I think about the world hidden behind the coordination failure in that last graph, I imagine the surviving 99 Reds reacting with collective horror and astonishment. "How could we have failed our fellow in this way? We each chose to live, that all of us may live. What happened?"

Let's push these extremes one step further.

The graphs are visually trivial, but they are telling us something!
Look at the coordination problems they imply. Look! ^[11]

Look at the obvious and avoidable danger we're forced to contend with at τ=100%. For crying out loud, look at its survival ratio when there are 100 Blues: if everyone votes Blue, everyone dies. Come on!

Compare this to the utopian miracle of τ=0%. It's safe under every condition! It describes a sector of reality within which everyone lives, no matter our individual choices.

With these values for τ, the sawtooth has split itself in twain and given us two unique kinds of problem wherein:

Your choice will impact only yourself
Your choice will impact nobody

We've somewhat seen both of these graphs already. They were here all along, hidden inside one of our N=20 representations of the Button Dilemma from earlier:

Axiomatic Expansion of Sawtooth Space

What happens if I do this:^[13]

This graph shows something new, capping the Red survival rate at 50%.

Up until now, we've been looking at sawtooth problems with different values of τ.

But the graph is under our control. We need not be bound by the strictures of a single parameter, which means we can depict whatever we like. I've introduced something new, and will elaborate in a moment. First, let's look at the graph.

In this sawtooth problem, pressing the Red button grants you a 50% chance of survival. Selecting Red now represents a risk.

Does it make sense to take this risk? Maybe so, but now you have to start thinking about expected value. If you choose Red in the Button Dilemma—if you have a Red prior—and if you generally expect the odds of Blue success to be less than 50%, I expect you to remain Red in this sawtooth problem.

If, however, you think the Blues might successfully coordinate, even by a small margin, then the odds have tipped beyond every sane interpretation of this specific sawtooth, and I argue that you should then join the Blue team. Red subjects your life to a coin toss. If the odds of Blue success are >50%, then even pure selfishness would still demand that you choose Blue.

(Edit: the math in this section is wrong! See footnote!)^[14]

There exists an inverted variant:

Things are beginning to look a bit strange.

In this sawtooth problem, the maximum possible survival rate has been pushed downward everywhere above the Threshold, bottoming out at 50%. Here, there is no good reason to vote Blue.

We don't see the same kind of expected-value calculation that we saw with the previous graph. It is obvious from every angle: in this sawtooth, you must coordinate around Red.

These two graphs represent the same magnitude of a new, signed parameter that behaves similarly, but only on either end of the graph. We'll call it solidarity ().

Because we're having fun, let's go wild and give ourselves just one more parameter:

These two graphs introduce a new sort of cliff which behaves much like the threshold τ, but instead of acting as a barrier between the two color choices, it punishes strategies which lean too hard toward either Blue or Red. The sawtooth problem contains an intolerance for too many similar choices at one end of the spectrum vs. the other.

So, let's call this parameter, which is also signed, intolerance (I).

The Map

We now have three parameters to play with, and we're beginning to model sawtooth problems which look increasingly dissimilar to the original Button Dilemma.

This might be a good time to pause for a moment and think about what it is we're actually doing here.

Our Parameters

We've defined three forces that can shape our sawtooths.^[15] Let's review them:

Threshold (τ)
- The location of abrupt regime change, bound to the success or failure of Blue's attempt at coordination.
Solidarity (σ)
- Solidarity imposes an upper limit to the number of survivors on one side of the threshold by scaling survival down to
Intolerance (I)
- Intolerance acts as a sort of secondary threshold. On a graph, this has the result of eliminating entire survival outcomes completely, with the number of impacted bars equal to

Positive values for and impact outcomes below and at the threshold.
Negative values impact outcomes above the threshold.^[16]

As a reminder, τ is the only parameter I extracted directly from the Button Dilemma. I'm just axiomatically declaring these other parameters to exist because they let us explore interesting hypothetical dynamics of coordination. It's not clear to me that every such coordination problem can manifest in reality.

For example, "a Blue majority is the best outcome only below some threshold of intolerance" is a very strange shape of coordination problem that I struggle to match against anything in the real world.

But.

But.

"A Red majority guarantees survival only above some threshold of intolerance" feels like it may reflect real-world coordination problems, and it can be intuitively derived from its sawtooth graph.

We'll use the graph for that problem below, as we orient ourselves within this space.

Regions of the Map

Many of our graphs share certain features in common. Let's identify them.

While the three parameters still feel fresh, and with my hope that this exercise may serve as an intuition pump for something useful, I'll be showing the parameter values for each sawtooth graph from now on.^[17]

This is the map of sawtooth problems with these three specific parameters.^[18]

Two regions permit survival:

The Slope (i.e. Red survival, Blue coordination failure)
The Plateau (i.e. Blue coordination success, Red irrelevance)

Three regions represent three distinct causes of death:

Death by Coordination Failure^[19] (above the Slope)
Death by Solidarity (depicted here with solidarity set at -0.25)
Death by Intolerance (depicted here with intolerance set to +0.25)

Cleaving our map in two is the Threshold. It is the defining region of every sawtooth problem we've looked at.

I argue that it is not a zero-width boundary. It is a region.

And we need to talk about it, because it's by far the most important aspect of nearly every sawtooth problem I've examined.

The Threshold

Let's go back to the very beginning, to one of the first sawtooth graphs we looked at for the Button Dilemma, with N=20.

Threshold_blue_frame_t50__s0.00_I0.00.svg

You may have noticed that although N=20, there are 21 total bars beginning with B=0. Consequently, B=10 sits smack in the center, and constitutes the Threshold for this problem.^[20]

This graph contains four of the six regions we've mapped:

The Slope
The Plateau
Death by Coordination Failure
The Threshold

Whether you are Blue or Red, you might regard B=10 as something that belongs to the Slope. I can't blame anyone who sees it this way. B=10 is just so obviously the lowest point of the Slope, right?

When you're only thinking about outcomes, it makes intuitive sense to place B=10 on the Slope. But this is game theory, which means we must consider strategies.

B=10 is the Threshold of this sawtooth problem.^[20]

The Threshold is the worst possible outcome, and nobody wants it. But Reds and Blues perceive the landscape from different standpoints and their strategies for avoiding the Threshold look quite different.

From the Red perspective:

The Blue button represents an incremental backslide.
Maximum utility is achievable, but unstable. There is no margin for error.
A surviving population cannot host even a single Blue.
Pressing Blue risks an unnecessary and visible death.
Moving from the Slope to the Threshold is a tragic process, but slow and linear.

From the Blue perspective:

The Red button represents a threat to collective survival.
Maximum utility is, by default, a stable target. There is potential margin for error.
A surviving population can host a cohort of Reds.
Pressing Red risks an unnecessary and invisible weakening of the margin for error.
Moving from Plateau to Threshold is a discontinuous plunge into catastrophe.

Notice that both sides are trying to keep away from the Threshold. Nobody wants to go near the thing, and we all feel as though we're being pulled into it from the other side. It can feel like the Threshold is coercing us into crab mentality, but that's not quite what's happening.

In the Button Dilemma, Red and Blue are mutual threats to each other. Pressing Red imposes a risk upon Blue that they feel is unnecessary. Pressing Blue doesn't impose a similar risk upon Reds (the Red take is that Blue is a risk unto itself), but coordination problems in the real world tend to be iterative, and a Blue victory represents a kind of pressure for Reds to press Blue in the future.

"You'll step into the blender next time, won't you? Stepping into the blender is the reason we all made it. We'll all survive next time too, and it will be easier with your help. The blender is safe. The blender is life."

This is not a convincing argument to someone with Red priors. It doesn't read like safety. It reads like a deep social pressure to do something that, not entirely unreasonably, feels like a risk to one's life. Reds can see that the Blues survived, and they might even expect it to happen again. But instead of acting to comfort them, this just makes them wonder: when will Blue coordination fail, and what will it cost me when it does?

In other words, this reads as a threat to Reds. "Risk your life for us all."

In spite of all of my Red POV explanations, I'm still a firm Blue on the Button Dilemma. I haven't even managed to convince myself to switch sides. I just feel like I understand what Reds are feeling.

We each see the other side as threats, and we all kind of have a point.

Reds are risking the lives of Blues, and Blues are pressuring Reds to risk their lives in turn.

These behaviors only draw us into the gravitational pull of the Threshold, which, I remind you, is the worst possible outcome that we are all trying to avoid.

This dynamic frightens me. It's like we need to succeed on some kind of meta-coordination problem, maybe even a meta-sawtooth problem. The worst part is that none of this even seems to be our fault: it's the Threshold, acting to draw us in.

I'll say one nice thing about the Threshold: it feels like a kind of anti-Nash equilibrium. In a multi-turn sawtooth problem, if you find yourself with a group of survivors inside the Threshold, every possible action you can take is an improvement, and the incentive to act is as strong as it can be.

Earlier, we looked at some extreme sawtooth problems, one of which had τ=0:

Notice anything interesting, as it pertains to the Threshold?

There is no Threshold. τ=0, duh.

The Threshold does not exist on this graph. In theory, it would sit one bar to the left of the leftmost bar.

When I look at this silly purple rectangle, my eyes tear up a bit. This rectangle is the problem that's not a problem. This is what a good future might look like with an aligned superintelligence.^[21]

If there is a meta-sawtooth problem, and we can resolve it safely, this purple rectangle might be where we find ourselves.

Let's Get Weird

By now, I hope you're familiar enough with sawtooth problems that we can spend a moment just looking at a handful of select cases.^[22]

Tragedy of the Commons & Regulation

This graph reflects a real-world dynamic.

By introducing a negative solidarity cost (i.e. pushing the Plateau down), and by allowing the Slope's peak to exceed the height of the Plateau, we increase the incentive for people to want to be in certain parts of the Slope, even though this graph's Slope terminates with an intolerance filter beyond which everyone dies.

Does this sound familiar?

The solidarity cost is the imposition of regulation. In the Plateau, everyone is guaranteed to fare a bit better than the ~10 worst possible outcomes within the Slope, but the best possible outcomes only exist within the Slope!

In terms of pure utility maximization, this shape of coordination problem incentivizes more Red votes—provided that the Reds believe themselves to be sufficiently few in number as to not exceed the intolerance filter.

Tragedy of the commons, accompanied by overly-burdensome regulation. Choosing Blue feels like losing out and Red provides an opportunity to seek higher utility. But if too many people choose Red, they go right over the cliff. Death by Intolerance.

In those outcomes, the sawtooth problem was intolerant of population dynamics with too much Red.

This is a graph of a real phenomenon.

We just traveled through sawtooth space, declared some parameters to exist, and subsequently found a graph that led to a specific dynamic which manifests in reality.
A pretty important one, at that.

This is exactly the sort of thing that makes me wonder whether I'm losing my mind.

In for a penny, in for a pound; let's continue.

The Decision Theory Befuddler

I'm genuinely at a loss for what one should even be thinking about in this sawtooth problem. What's the strategy here? What does FDT have to say about coordination problems that impose such harsh conditions of solidarity and intolerance?

This is a kind of sawtooth problem I'm not convinced can really exist, so maybe these questions are moot.^[23]

If Anyone Votes Red, Everyone Dies

If Anyone Votes Blue, Everyone Dies

See those thin bars of life at either side of each graph?

These graphs have a 1-to-many relationship with the parameters that can summon them.
In other words, both graphs can arise from multiple parameter configurations, all of which are isomorphic. But it's fun to think about the underlying mechanism that causes any given configuration to yield its associated graph.

Notice the subtle asymmetry in the configuration between these two sawtooth problems.
The former requires τ=99%, while the latter permits (but does not require) τ=0%.

Under conditions of maximally positive Solidarity, and given some theoretically-attainable, maximally-difficult goal, a 100% Blue victory is necessary.

Under conditions of maximally negative solidarity and a threshold of nil, a 100% Red victory is similarly necessary.

Negative solidarity imposes a cap on the total number of survivors of Blue coordination, and by dropping the threshold to 0%, we've turned all but the leftmost bar into outcomes which accept an arbitrary number of Blues. This is sawtooth-speak for "if you choose Blue, you will kill us all."

Both graphs actually arise from this same underlying mechanism, for the same reason, despite their parameters. The only difference is which button you need to coordinate around in order for everyone to survive.

My choice in either problem feels obvious, but I can't say these graphs don't make me nervous.

What happens if we tick the threshold up just one more notch on the "If Anyone Votes Red" graph?

No Matter What Anyone Does, Everyone Dies

This is the Worst Timeline. Here, a mixture of Red and Blue blood drowns all potential futures in pale, hatched lilac.

This is a rogue interstellar planetoid crashing into Earth six days from now. This is false vacuum decay. This is misaligned superintelligence.

No matter what any of us do, we all die.

As with the previous two graphs, there are multiple ways to bring about this sawtooth problem.

If There Are Fewer Than 16 Blues, Everyone Dies
(Except For One Weird Outcome Where Only 5 Reds Survive)

Okay, these are getting silly. Let's move on.

Weirder Still

Is there a valid kind of problem where the survival rate increases by default with the number of Blues, and then abruptly falls to nil above the Threshold?

Maybe:

This is something we haven't seen yet, because I just changed a fundamental rule.

This graph does not depict anything in the same part of sawtooth-space we've been exploring. I'm not really sure it even is a sawtooth problem anymore, at least in the way I've been describing them. It's weird:

It has no Blue survivors, even in outcomes where they are the majority.
50% is the largest possible population of Reds, despite nil solidarity and intolerance.
Below B=50, the number of survivors is constrained by a deficiency of dead Blues.
Above B=50, the number of survivors is suppressed by an excess of dead Blues.

This is a world where only Reds survive, and their survival is entirely dependent upon the personal sacrifice of willing Blues.

I call this the sacrificial rule. Sawtooth problems which are subject to the sacrificial rule show us the shape of unavoidable sacrifice.

Maybe this is the rule governing coordination problems faced by soldiers in a trench, moments after receiving an enemy grenade. Will one soldier throw himself on the grenade? Or will all perish?

I suspect that this kind of problem is governed by parameters we haven't looked at yet. For instance, the soldier example sacrifices 1 Blue to save Reds. We would need a parameter that can smoothly allow for this.^[24]

I haven't thought nearly enough about sacrificial sawtooths, but I nevertheless wanted to include this problem as a small nudge: if you find what I've written here interesting, there's certainly much more ground to cover. The three parameters I've talked about are only the tip of the iceberg.

But for now, let's stop. We've looked at plenty of graphs. I think it's time to reflect.^[25]

Final Thoughts

My early drafts of this piece largely comprised a handful of loosely connected, meandering philosophical arguments about various reframings of the Button Dilemma.

I no longer care very much about those reframings. They almost feel like distractions. Boring, noxious emissions of the Threshold.

Yeah, yeah, of course I won't get in the blender. Of course I'll press Blue even when it's the only button. Of course I'll press Red if the Threshold is at 99% and the Slope has low solidarity cost. And of course the other side is suicidal/dangerous/stupid/selfish.

That's all noise. The signal is in the sawtooth.^[26]

I've not found any prior works describing sawtooth problems as a specific, well-defined class of coordination problem. I've looked, and have found some partially-overlapping areas of study, but nothing that seems to clearly map to anything like this.^[27]

So, either I've found something genuinely novel, or I've missed some obvious research on essentially this topic, or I've succumbed to AI psychosis. A lottery which manages to be exciting and embarrassing and terrifying.^[28]

I still see myself as fundamentally a Blue—at least in the specific situational framing of the Button Dilemma. But I no longer look across the Threshold at my Red brothers and sisters with suspicion and fear. I fear the Threshold itself. It's among the most frightening concepts I've ever understood. It's a geographic feature of coordination space which puts light-years between us and actively convinces us that the people on the other side are a threat. By convincing us, it becomes true: we are threats to each other. So long as the Threshold lies between us, we may always be threats to each other.

It's like a dark forest that's been here with us the whole time, and we've somehow managed to never see it.

So what will we do? Annihilate each other until one side manages to eliminate the threat? Coordinate? Just press Blue?—no, Red!—no, Blue! Really for real, everyone! All together at once! No, wait, what are you—

Most of us are sincerely trying to coordinate. It's become so clear to me that we really are trying. Looking at sawtooth problems has made me feel hope, and it's made me begin to question just how prevalent Prisoner's Dilemma-style defection really is.

I'm not so arrogant as to think I can disregard defection as being a thing that happens. It's just that I'm starting to look around at the groups I perceive as defecting, and now I wonder whether the real issue is that we're all trapped on opposite sides of the Threshold in an especially nasty collection of sawtooth problems.

If we can finally see the forest, maybe we can blaze a path. The territory we're coordinating upon is borderline inhospitable, full of perilous cliffs, tragedy, and sacrifice. We need to work together if there's any hope of making it through without destroying ourselves.

So if this is a real thing, and if I haven't lost my mind, then I invite you: please spend some time exploring sawtooth space with me.

Maybe we really can all be okay.

^{^}
In the form of many graphs.
^{^}
I am not 100% convinced that this entire thing isn't just the result of some extremely sneaky AI psychosis. I spend a lot of time interacting with LLMs. Hours a day.
I do it for work. The scope of my primary job essentially involves building custom harnesses for LLMs and teaching others how to improve their harness architectures.
I'm also a participant in Anthropic's model jailbreak red team program.
The reason I'm saying all this is: I feel like I'm making some rather grandiose statements in my writing here, despite being an outsider to the entire topic. I don't have much background experience in thinking about math/philosophy/game theory in this way, which has me very suspicious about whether this is all just a load of nonsense.

I've also legitimately been losing sleep over this thing, and I've spent nearly every waking moment for the last couple of weeks just endlessly thinking about it.
My job and my hobbies put me in a lot of extended contact with LLMs. This plus the specific way I interact with them puts me under frequent, high-dose exposure. If I lower my guard, or have a gap in my cognitive security, then I'm at significant risk of developing some form of AI psychosis. I'm also thankfully very cognizant and watchful for it.
So, if the response I receive is "you're off your rocker," I commit to trusting that. I get the impression that this community would give me an honest, accurate assessment.
But I don't think I'm off my rocker, at least not for AI-related reasons. For one thing: I've made a very intentional point of prompting LLMs in such a way as to elicit as few remarks as possible on the specific game theory ideas I'm discussing here. The overwhelming fraction of my prompting has gone into building the graphics for this post, and building tools to help me build more graphics.
Exceptions to this have included:
- Checking my understanding of existing research in game theory
- Researching potentially-involved coordination problems
- Making sure I didn't botch any of the honestly-pretty-straightforward math
- Running a big sweep for typos/grammar/etc. on the final draft, resulting in a handful of minor adjustments to some phrases here and there
- I anticipate asking Claude for a direct opinion on the essay itself the moment I publish this.
  After all, I cannot claim I am not curious.
Claude Opus 4.6 gets credit for naming the "Solidarity" and "Intolerance" parameters I go on to describe. Aside from that, no LLM wrote any of this piece—not even the em dashes.
There is another alternative, of course: everything I've written about could be completely unoriginal and very well-studied. But I've made a very sincere attempt to find significant overlap between my ideas and established literature on coordination problems. And I've come up nearly empty.
So either I've found something novel and interesting, or I'm ignorant/blind, or I've got some form of psychosis. What fun!
^{^}
I had to make a decision between treating this as a function of the number of people who select either Blue or Red.
As far as the math is concerned, both Red and Blue are valid frames of reference. I finally landed on Blue because the Red reference frame produces a graph which implicitly communicates something like "everyone survives unless red voters exceed the threshold." I perceive this to be a very unfair characterization of Red priors, and I suspect most Reds would disagree with it.
For anyone interested, here is what the default Red frame looks like:
- Given participants and threshold , each participant selects or .
- Let be the fraction selecting .
- If , everyone survives.
- If , exactly those who selected survive.
Graphed:
🇺🇸
^{^}
Arguably, we also see the Reds who survive a Red coordination failure. Remember, when , everyone lives because everyone votes Red. If there is a successful form of coordination around Red, then there must also be a failure mode.
More on this in a later footnote.
In short, my thinking is that sawtooth problems do not actually have Blue or Red coordination failure.
^{^}
The common threshold for this dilemma is 50%. In Tim Urban's poll, 57.9% selected Blue. Outcome: 100% of participants survived.
Asked to consider a threshold of 80%, the Blue cohort managed to coordinate 66.5%. Outcome: 33.5% of participants survived.
^{^}
Though I can sense some Reds prickling over how both fully lean into the Blue frame.
^{^}
And like a binary coordination game.
^{^}
TBH I expect a majority of people reading this to answer "yes" and, on average, to be more correct than not. I also expect at least some of those cases to involve two Reds.
^{^}
Because I think it distracts from examining the general space we're exploring, I'm intentionally witholding my personal answer for most of these sawtooth problems.
I do get the sense that broad and well-understood adoption of Functional Decision Theory offers stable Blue coordination even under this extreme condition.
^{^}
Within the set of problems that permit both Blue success and failure, that is.
^{^}
It feels like a stretch to call either of these "coordination problems" but I'm not sure how else to describe them.
^{^}
I'm just referring to the general shapes, to be clear.

The original dilemma does not have any outcome with 0% survival.
^{^}
Putting the macabre aspects of its subject matter aside, I think this graph is aesthetically pleasing. It makes me feel something that I can't describe, and I really enjoy just looking at it. I would hang this on the wall above my sofa.
^{^}
Days after publishing, I love this graph even more than I already did. It's taught me something about my incorrect intuitions.
This is not the tipping point for when a purely-selfishly-motivated Red should switch to Blue. Intuitively, it feels like this should be the tipping point, because voting Red below the Threshold always grants you a 50% chance of survival in this graph, and the Threshold is at 50%. But that's a nonsense explanation that assumes something wrong about how this all works.
If you vote Red, you've got as good odds of ending up in your 50%-survival zone as you do in the 100% survival zone. The average survival rate of choosing Red is therefore 75%.
The average survival rate of choosing Blue is the average of 0% and 100%: 50%.
Voting Red remains the best strategy for truly selfish people down until 0% survival below the Threshold.
This only holds for otherwise-symmetric sawtooth problems with a discontinuous Threshold at 50%.
Lesson: I need to actually understand the math, and I probably still don't.
^{^}
sawteeth?
^{^}
I've got a bit of doubt about the convention I'm trying to establish here, but I like to think of positive values as scaling things in favor of incentivizing outcomes above the threshold, and vice versa for negative values.
^{^}
Here is a link to the tool I used for making these graphs, built by Claude: https://claude.ai/public/artifacts/f4a94806-a2b8-49b5-a2d6-d403ef21c451
^{^}
Not every sawtooth graph contains all of these regions, and not all regions will manifest with the same shape, or on a consistent side of the Threshold.
With nil solidarity and intolerance, the associated death regions vanish.
^{^}
Fun fact: across many drafts of this post, long after I felt I had overcome my biases about "who is failing to coordinate," I was still calling this "Death by Blue Coordination Failure."
Please let the important lesson of my thesis be learned again by both myself and yourself. Sawtooth problems do not have Blue or Red coordination failure. They certainly do have coordination failure, but it is not owed to any one side.
(Yes, I'm aware that I literally say "Blue coordination failure" in my description of the Slope region. As far as I can tell right now, this is an accurate description, but I could be mistaken. My take is that Reds don't typically see themselves as having failed when Blues land in this death region.)
^{^}
For odd values of N, the Threshold sits just to the left of center in symmetric sawtooth problems.
^{^}
It may be obvious at this point, but in putting myself through this exercise, I no longer see e/acc as defection.
I used to see it as the worst defection of all, and I certainly still see it as a dangerous threat to myself, themselves, and everything anyone cares about. But in terms of the people, I just see them on the other side of the Threshold.
I wonder if they might be looking up at me and my friends on the Plateau, thinking we don't want them up here with us and feeling very confused about that.
^{^}
An important reminder: each of the graphs we've seen describes a different shape of sawtooth problem, and our parameters describe that shape. They do not tell us anything about the underlying population dynamics.
Each graph simply displays all possible population ratios (i.e. all possible outcomes), sorted as a function of each ratio's number of constituent Blues.
Non-zero solidarity does not tell us one side is more united than the other.
It tells us the problem imposes an unavoidable cost upon one side's survivors.
Non-zero intolerance does not tell us one side is less tolerant than the other.
It tells us the problem is intolerant of populations that exceed some certain ratio.
If it helps, you may prefer to think of these as the solidarity cost and intolerance filter.
What can we say about the underlying population dynamics?

Well, Tim Urban's poll resulted in ~58% of people selecting Blue, which looks like this:
A single bar represents one population dynamic. In this graph for Tim Urban's poll, the 58% Blue outcome is highlighted—this was the final, measured dynamic of the game posed by his poll.
^{^}
If this sawtooth problem can exist, and you find yourself inside it, you're essentially trying to regress everyone to the mean. i.e. voting against whichever side you expect to be the majority.
^{^}
I definitely need to think about this more, because I'm staring at this and feel like I'm misunderstanding something.
^{^}
There are many more graphs I'd love to share, and sawtooth concepts I'm eager to explore, but they'll have to wait for now.
Here's what I plan to work on next:
- Pinning down the rules that distinguish sawtooth problems from non-sawtooth problems
- Identifying the baseline parameters for sawtooth problems (I suspect the Button Dilemma may be a kind of "most simple" sawtooth problem, and there may be two or three parameters underlying it. τ is among them; σ and I are not.)
- Carrying out a much deeper review of the parameters I've defined (at minimum, I suspect that Solidarity might be oversimplified or missing something; it has some quirks I'd like to address in a future post)
- Determining which kinds of parameters lead to valid, useful models for sawtooth problems that manifest in reality
- Testing some ideas I have for implementing this framework as a tool that can be used to solve sawtooth problems
In terms of the sheer complexity of what we can accomplish when we work together, humans are the best coordinators on Earth. This has yielded undeniable benefits for our species, and it's also produced a never-ending onslaught of increasingly difficult coordination problems.
How do we fairly distribute our limited resources? Which tax structures provide the best balance of resource allocation and individual liberty? How do we stop hurting each other in pursuit of survival?
In my opinion, these kinds of problems have been essentially solved—in the sense that some people have proposed rather clever ideas for solutions. But we are failing to coordinate toward implementing those solutions, which means they haven't really been solved. We keep fighting about them, consequently getting pulled toward the Threshold.
As far as this framework is concerned, I've found two promising ways that we humans may be able to navigate sawtooth space:
- Update our priors and change the button we choose to press, thus influencing the final outcome of any given sawtooth problem
- Modify the shape of the sawtooth problems themselves
I'm especially excited about the second one.
Both of these navigation strategies would introduce a third axis to our sawtooth graphs: time.
^{^}
Toward the beginning of all this, I said "I think it might be the case that when you reframe the Button Dilemma, you fundamentally change the question being asked."
This line was a holdover from an early draft of this piece. I've come to realize that it kind of contradicts some core elements of my thesis, but it also does a really nice job of cuing up the blender analogy. I couldn't settle on an alternative, so the contradiction remains purely as a pedagogical tool.
Situational reframings of the Button Dilemma don't fundamentally change the question being asked, if by "question being asked" you mean "sawtooth problem."
But they clearly do something.
At minimum, I think situational reframings expose our priors about how we expect others to behave, and/or what we expect the marginal utility of our contribution to actually be.
If you have a Blue prior, this might mean you expect there to be enough other Blues for your decision to be acceptably low risk. Or it might mean you expect the outcome to fall somewhere near the Threshold, but maybe it's close enough for your choice to make a difference.
If you have a Red prior, this might mean you expect the Blues to fail, or that the outcome will be far enough from the Threshold for your choice not to matter.
Priors are obviously not parameters you can adjust to change the shape of a sawtooth problem. They're part of how you arrive at your decision about which button to press, which certainly influences the outcome. This might be an important aspect of how we draw up an operational framework for safely navigating sawtooth space.
^{^}
As far as I can tell, the earliest recorded formulation of the Button Dilemma was posed on Reddit by user Deadshot37 on April 20, 2023. I reached out to him and was told that he did not come up with the idea. He could not recall its provenance.

How about the larger class of problem we're looking at?
In the simplest case of N=2, we've seen that sawtooth problems might bear some relation to pure coordination—at least in cases where the sawtooth is symmetric.
Sawtooth problems may also have something to do with step level public goods distribution, and threshold public goods games.

As of this writing, I haven't gotten through all of these papers yet—these are just the ones I haven't been able to trivially eliminate from my search.
The very first one describes how the moral frame of a threshold public goods game may have bearing on whether people press Red or Blue, but this seems outside of the scope of exploring sawtooth problems in their pure form.
- Schuch, Esther, Tum Nhim, and Andries Richter. 2025:
  Coordinating on good and bad outcomes in threshold games – Evidence from an artefactual field experiment in Cambodia
- Zheng, Guozhong, Weiran Cai, Guanxiao Qi, Jiqiang Zhang, and Li Chen. 2025:
  Optimal coordination in Minority Game: A solution from reinforcement learning
- An, Xinmiao, Yali Dong, Xiaomin Wang, and Boyu Zhang. 2023:
  Cooperation and Coordination in Threshold Public Goods Games with Asymmetric Players
- Frey, Seth, and Robert L. Goldstone. 2013:
  Cyclic Game Dynamics Driven by Iterated Reasoning
I've also been poking through the following foundational papers in public goods problems. Unfortunately, they are paywalled:
- Hans-Theo Normann, Holger A. Rau. 2011:
  Step-Level Public Goods: Experimental Evidence
- Charles Bram Cadsby, Elizabeth Maynes. 1999:
  Voluntary provision of threshold public goods with continuous contributions: experimental evidence
- Joep Sonnemans, Arthur Schram, Theo Offerman. 1998:
  Public good provision and public bad prevention: The effect of framing
- Alphons J. C. van de Kragt, John M. Orbell, Robyn M. Dawes. 1983:
  The Minimal Contributing Set as a Solution to Public Goods Problems
^{^}
This being said, here is what I currently expect about sawtooth problems:
- I barely understand them and have either bungled some of the math or drawn conclusions from inadequately-justified assumptions (99% confident, but I think I nevertheless managed to wrangle this topic into a worthwhile and valuable read)
- I've found a new framework for thinking about coordination problems. (85% confident, but this is a fairly vacuous claim)
- I've found a new and genuinely meaningful way to group/classify coordination problems (80% confident, slightly less vacuous)
- I've found a new cluster of coordination problems which contains a lot of well-understood stuff, but also potentially something quite novel (Maybe 60% confident? This is the kind of claim that makes me doubt my sanity a bit)
- I've found something which might point toward an instrumentally useful tool for outright solving sawtooth problems and defeating the Threshold (15%?? I'm inclined to be extremely skeptical of this idea, which makes 15% feel like psychosis. But if I hadn't felt this small, scary bit of hope, I don't know that I would have written any of this.)

[-]Bucky5d91

If we are talking decision theory may I suggest the following way of looking at it.

Let’s say 101 participants with 50% threshold. Your individual decision has the following effect depending on how the other 100 vote.

If 0-49 others vote red then your vote has no effect as majority will be blue however you vote.

If 51-100 others vote red then you voting blue kills 1 extra person (you!) compared to voting red.

If exactly 50 others vote red then you voting red kills 50 extra people (all the blue voters).

So, there are 50 scenarios where you can save 1 life by voting red and 1 scenario where you can save 50 lives by voting blue.

You decision therefore you be based on:

a) whether you value you own life more than others in the game (technically how much you value your life compared to the average blue voter!)

b) how you expect others to vote. If you expect more blue voters it should move you towards voting blue as the 50:50 scenario is more likely than each of the individual 51-100 red scenarios. Likewise, if you expect more reds it should shift you towards red.

So now it looks very like the 2 player version - a coordination game with the possibility of guaranteeing personal survival.

[-]Alexander Slugworth5d10

Thank you! I spent hours just staring at what the Threshold looks like in various sawtooth problems and still managed to miss the frequency-magnitude tradeoff.

If you expect the average choice to land right at the Threshold, it does begin to feel similar to the 2-player version for arbitrary values of N. The only difference is that 100% successful Red coordination ceases to be a target you can aim for.

Really interesting stuff happening around the Threshold.

[-]Dagon5d5-4

I worry about putting too much weight on people's public statements, for a thought experiment that all participants know is purely about signaling. nobody died or will die from these fake buttons. It's just a chance to bloviate about cooperation vs defection in a politically-adjacent format, with extremely predictable results ("blue" political pursuation focuses on "good people take risks to save others" and "red" leaners focus on "if everyone made good decisions for themselves, they wouldn't need so much saving").

If it were real, most humans would push the button that their preferred elites and media influencers tell them to. Both can be justified, as you point out, with clever framing.

[-]Richard_Kennaway3d3-1

Apologies in advance for not addressing your analysis of the whole family of variations, but only speaking about the original.

Staunch Red here, and have been since first learning of the puzzle.

Framing is irrelevant. Isomorphic puzzles have isomorphic answers. If someone's answer is changed by the framing, they are not thinking properly. If they notice that they are swayed by the framing, that is their chance to become stronger. As you are doing by analysing all those variations, but I can't face going through them all, because formalising the problem requires some sort of reflexive decision theory, which is an unsolved problem. There is no way to define a probability distribution of button-presses, because people's decisions can depend on the distribution they anticipate.

It never was.

But since people do appear to be influenced by the framing, here's another. But first, I want to go to a place where you change the problem:

Children do not participate.

No. Let's not exclude the part where children are choosing as well. It's a crucial part of the problem for some others, who justify blue by saying, "but think of the children (and the feeble-minded, etc.), the poor children who will just press blue by chance, think of all the children, you must want them to die if you don't press blue, what sort of monster are you, die scum etc. cont. p.94".

How are the children going to make that choice? Not in a vacuum. Their parents will advise them, or tell them which button to press, or press it for them. That is one of the duties of parents, guiding their children through the hazards of early life.

You want everyone to press blue and in this article you are trying to persuade them to (despite refraining-not-refraining from giving your own attitudes, in footnote 9). Therefore that is what you will be telling your own children or making them do. You will be risking their lives in order to get the virtuous thrill of saving their lives.

Would you urge your children to run into the traffic, so as to pull them back at the last moment? Push them out of a high window, to catch them just as they begin to plummet?

The virtuous thrill is trash. Look at the real outcomes, not the feels. Before: everyone is alive. After: everyone is alive. Nothing positive is achieved in the end, any more than from a chain letter. A chain letter of virtue signalling.

As Insanity Wolf might put it:

HAS CURE FOR DISEASE
SPREADS DISEASE TO CURE

WANTS TO FIGHT OPPRESSION
CREATES OPPRESSION TO FIGHT

WANTS TO SAVE THE CHILDREN
JAMS A BLENDER WITH THEM

[-]Alexander Slugworth3d10

If my post acts as an attempt to persuade others to join the Blue team, or if it carries a subtext of that flavor, then I need to improve my communication skills.

While writing this post, I would regularly notice little phrases, graph choices, etc. which communicated an unnecessary Blue bias. To the best of my ability I've tried to minimize all of that. Alas, eliminating one's bias from one's communication is a continuous process. There will always be more.

I've raised my Blue heart to the sky, that any bias I've overlooked might be made obvious to others. If you've found it, then I'll at least consider that last part a success.

Still, I'd like to emphasize: I do not want to persuade you to change sides. Adjusting the outcome of a sawtooth problem might be but one path toward safe resolution. If there are other paths, I expect them to be much easier to walk - provided we can find them.

I meant it when I described Red as a kind of fallback strategy for the overall population. For most of my life, I've seen a world drowning Blue shades of coordination failure. By looking at it through the framework of sawtooth space, I feel like I now see vast swaths of Red picking up the pieces of Blue hubris.

I think safely resolving a dangerous sawtooth problem essentially involves smoothing it out, such that the Threshold disappears and our choice of color becomes irrelevant.

Would I like it if you voted Blue? Of course I would. I'm a Blue. But I really don't want you to change your vote. It feels like it would be dangerous for me to want you to change your vote. If I succeed in persuading you, and Blues nevertheless fail, then all I've managed to do is drag everyone a little further into the Threshold. Stay Red. The world might need you.

[-]Measure4d30

I'm genuinely at a loss for what one should even be thinking about in this sawtooth problem. What's the strategy here?

Flip a coin.

[-]Alexander Slugworth4d10

I feel like this might be a valid strategy if you've reasoned that the total population breakdown will land right at 50/50.

If you have any expectation to the contrary, then your best strategy is to select the color that you think will receive the fewest votes from your peers.

Edit: Ah, maybe you're saying that the best FDT strategy is to just flip a coin? (assuming you expect your peers to all have similar decision making functions to your own)

[-]oligo4d20

The original version (and many of these, potentially) being so framing-dependent in one's answers is an interesting case of responding to framing being rational: you know most people would be irrationally more unwilling to step into a physical blender than press a button that does the same thing, so in that framing there's a very likely strong red majority hence reason stronger reason to choose red. In this sense "Here's the problem, btw everybody the Schelling point is [red/blue]" would be the most "honest" framing effect.

This may or may not point to broader principles of how framing effects work; I'll have to think on it!

With respect to the Decision Theory Befuddler:

I think the correct game-theoretic answer would be to flip a coin.
This might be analogous to cases of the division of labor, though that's a case where people are able to explicitly coordinate.

I regret naming that graph The Decision Theory Befuddler, because in retrospect I think it's likely that a population of perfectly rational FDT agents would all independently settle on the coin toss strategy.

In a population of irrational agents, I'd assume that you're just trying to vote against whichever side you expect to be the majority.

[-]Celarix5d20

That tweet is the second worst thing Tim Urban ever did to me, lol.

I don't think the blender scenario is quite equivalent unless it's specified that the blender is off until everyone's choice is revealed. And people do tend to be reticent about changing parameters too much.

I do love the charts and parameterization! That "regions" graph is quite good as well, articles like this say something useful about the question rather than just bicker about it ceaselessly.

[-]Bucky5d81

An alternative to the blender framing:

Everyone must press a red or blue button. If you press red you will survive. If you press blue you will survive if, and only if, more than 50% of people press blue. Everyone receives the same instructions.

I suspect that with that framing the red vote would increase significantly.

[-]Static1d*10

Logic aside, I think humans have a natural, evolutionary tendency to choose Blue. Humans live and survive as a group, and losing any group member indirectly threatens the survival of the group. While it is theoretically possible for everyone to survive by collectively voting Red, perfect coordination on that scale is extremely unlikely and costly to sustain. Voting Red almost guarantees that someone will die.

This scenario reminds me of cancer in multicellular organisms.

These behaviors only draw us into the gravitational pull of the Threshold, which, I remind you, is the worst possible outcome that we are all trying to avoid.
This dynamic frightens me. It's like we need to succeed on some kind of meta-coordination problem, maybe even a meta-sawtooth problem. The worst part is that none of this even seems to be our fault: it's the Threshold, acting to draw us in.

I agree with you that when repetition is allowed, the system's tendency to drift from a safe Blue majority toward the threshold is unsettling. It implies that a system can function smoothly for a long time as some sort of "entropy" accumulates silently, right up until it hits a tipping point and things fall apart.

Also, love the visualizations. They are very clear and self-explanatory.

[-]casens4d10

the button problem suffers from not having salient enough analogies to real world problems, and so does your article; before i read it i already reached the conclusions "this button situation has no upsides for anyone either way" and "rewording the problem and the parameters changes people's behaviors in ways that make them seem irractional/unprincipled". i don't really agree that AI racing is a threshold problem: perhaps blue-coordination/safe-ai saves everyone, but red-pushing/AI-racing kills everyone, including the red-pushers. red-pushers/AI-racers might benefit from the AI in the short term, but blue-pushers/safe-ai benefit from the AI after the pause is over.

compare, though, relating newcomb's problem to nuclear mutually assured destruction: if you're in charge of the US arsenal and you're 100% sure that soviet bombs are dropping, do you launch your own nukes against them? similar arguments apply: "the money is already there/the bombs have already dropped, so my choices can't change that" vs. "if you can credibly pre-commit to actions, that changes how people interact with you." some people arguing newcomb's problem probably get lost in the weeds and abstractions that gets fixed with the nuclear analogy

54