Huh. This post was thoroughly not what I expected from the title.
I was expecting something about bridges connecting and walls dividing; maybe lumping and splitting, or something about political factions and alliance-making vs. purity-testing.
(To be clear, this post is also good, and I upvoted it. I just wanted to point at the surprise thing.)
I am not sure that they are similar. As I remarked in the comments on your post, econ-style thinking may be closer to reasoning only about the easy-to-model aspects of a system rather than the whole system. If an American corporation outsources factory work to Asia, that doesn't destroy the entire USA; it is just one step in a race to the bottom, and towards American workers losing their skills.
Additionally, I don't think that wall-thinking and bridge-thinking are as easy to separate as the OP describes. Suppose that mankind can choose between not creating the ASI at all with probability
I think the biggest difference between wall-thinking and bridge-thinking here isn't actually about the size of p(doom), but about the shape of the curve relating effort invested to risk reduced.
If I were to sum it up - bridge thinking assumes you need a lot of effort before you start to meaningfully reduce p(doom), while wall thinking assumes even marginal effort reduces it.
As an example, let's take the following hypothetical belief: "If we had really good interpretability tools, then there would be a lot of low-hanging fruit we could pick with those tools. But without those tools we're operating blindly, and can't make much progress at all." Under this belief, we are currently in a bridge regime - small improvements to current interpretability techniques will yield almost nothing. But if we did develop those good tools, we would then transition to a wall regime - lots of marginal effort now leads to a reduction in p(doom).
Here is a Claude-generated visualisation of what that would look like, demonstrating what the curves look like in each regime in my mind. This works whether the y-axis is p(doom) or any other measure of safety progress.
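Roughly, curves of that shape can be sketched in a few lines of matplotlib (the specific shapes and parameters below are illustrative assumptions, not the original figure):

```python
import numpy as np
import matplotlib.pyplot as plt

effort = np.linspace(0, 10, 200)

# Wall regime: every marginal unit of effort helps, so the curve
# rises from the very first brick (concave, diminishing returns).
wall = 1 - np.exp(-0.4 * effort)

# Bridge regime: almost no payoff until a threshold of effort is
# crossed, after which returns appear rapidly (sigmoid around n = 6).
bridge = 1 / (1 + np.exp(-2.0 * (effort - 6)))

plt.plot(effort, wall, label="wall: marginal effort always helps")
plt.plot(effort, bridge, label="bridge: little payoff before the threshold")
plt.xlabel("effort invested")
plt.ylabel("reduction in p(doom)")
plt.legend()
plt.show()
```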
There are a couple of frames I find useful for understanding why different people talk so differently about AI safety - the wall and the bridge.
A wall is incrementally useful. Every additional brick you add is good, and the more bricks you add, the better. If you are adding a brick to the wall, you are doing something good, regardless of the current state of the wall.
A bridge requires a certain amount of investment. There's not much use for half a bridge. Once the bridge crosses the lake, it can be improved - but until you get a working bridge, you have nothing.
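One way to make the contrast concrete (a rough formalisation of the two pictures above; the functional forms are illustrative assumptions rather than anything precise): let $n$ be the units of effort invested and $V(n)$ the value produced.

$$V_{\text{wall}}(n) \approx c\,n \quad (c > 0), \qquad V_{\text{bridge}}(n) \approx \begin{cases} 0 & n < N \\ f(n) > 0 & n \ge N \end{cases}$$

where $N$ is the investment needed for the bridge to span the lake. Under the wall, the marginal value of effort is positive at every point; under the bridge, it is roughly zero everywhere below the threshold.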
A solid example of wall thinking is the image in this thread by Chris Olah. Any approach built around “eating marginal probability” involves a wall frame. Another example is the theory of change of the standards work I've done for Inspect Evals, which I would summarise as “Other fields like aviation and rocketry have solid safety standards and paradigms. We need to build the same for evaluations - it’s the kind of thing a mature AI safety field needs to have.” This theory doesn’t have a full story of how it helps save the world end-to-end, but under the wall frame it doesn’t have to - it just has to point in the right direction and be broadly helpful.
A good example of bridge thinking is the MIRI approach - openly and outright asking for an international treaty. In my understanding, MIRI are not asking for the most they think they can get away with; they are asking for what they think is necessary to solve the problem, and believe anything less is insufficient. In lieu of p(doom), Eliezer asks “What is the minimum necessary and sufficient policy that you think would prevent extinction?” This is bridge thinking - we need to achieve a certain outcome X, and anything less than X won’t do. Anything that has no chance of achieving X or greater is unhelpful at best and counterproductive at worst. And to figure out what X is needed, you need a solid idea of your high-level goal from the start, and of how a given course of action gets you there.
From the wall perspective, bridge thinkers are ignoring or denigrating important marginal or unglamorous work in favor of swinging for the fences. From the bridge perspective, wall thinkers are doing things that are not helpful and will end up rounding down to zero.
I have found this a very useful way to understand why some people in AI safety propose very different ideas from mine.