The “multiple stage fallacy fallacy” is the fallacious idea that equations like $P(A \wedge B \wedge C \wedge D) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \wedge B) \cdot P(D \mid A \wedge B \wedge C)$ are false, when in fact they are true. :-P
I think Nate here & Eliezer here are pointing to something real, but the problem is not multiple stages per se but rather (1) “treating stages as required when in fact they’re optional” and/or (2) “failing to properly condition on the conditions and as a result giving underconfident numbers”. For example, if A & B & C have all already come true in some possible universe, then that’s a universe where maybe you have learned something important and updated your beliefs, and you need to imagine yourself in that universe before you try to evaluate $P(D \mid A \wedge B \wedge C)$.
Of course, that paragraph is just parroting what Eliezer & Nate wrote, if you read what they wrote. But I think other people on LW have too often skipped over the text and just latched onto the name “multiple stages fallacy” instead of drilling down to the actual mistake.
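To make point (2) concrete, here is a toy simulation with made-up numbers (purely illustrative, not a model of any real AI project): when the stages share a common cause, conditioning on A & B & C having come true should push your probability for D well above its naive unconditional value.

```python
# Toy sketch of failure mode (2): sub-propositions that share a common cause.
# All numbers here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Shared latent factor, e.g. "this is a universe where the team is competent/lucky".
competent = rng.random(n) < 0.5

def stage(p_if_competent=0.9, p_otherwise=0.3):
    """Each stage succeeds with high probability if competent, low otherwise."""
    p = np.where(competent, p_if_competent, p_otherwise)
    return rng.random(n) < p

A, B, C, D = stage(), stage(), stage(), stage()

print(D.mean())             # unconditional P(D): about 0.6
print(D[A & B & C].mean())  # P(D | A & B & C): about 0.88 -- much higher
```

Quoting per-stage numbers that ignore that update is exactly how a decomposition ends up underconfident.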
In the case at hand, I don’t have much opinion in the absence of more details about the AI training approach etc., but here are a couple of general comments.
If an AI development team notices Problem A and fixes it, and then notices Problem B and fixes it, and then notices Problem C and fixes it, then its track record is one of fixing problems reactively rather than preemptively, so we should expect that it’s less likely, not more likely, that this same team will preempt Problem D before it actually occurs.
Conversely, if the team has a track record of preempting every problem before it arises (when the problems are low-stakes), then we can have incrementally more hope that they will also preempt high-stakes problems.
Likewise, if there simply are no low-stakes problems to preempt or respond to, because it’s a kind of system that just automatically by its nature has no problems in the first place, then we can feel generically incrementally better about there not being high-stakes problems.
Those comments are all generic, and readers are now free to argue with each other about how they apply to present and future AI. :)
The neural tangent kernel[1] provides an intuitive story for how neural networks generalize: a gradient update on one datapoint shifts the network's outputs on similar datapoints (similar as measured by the hidden activations of the NN) in a similar way.
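As a concrete sketch of that story (the tiny two-layer network and all numbers below are made up for illustration, not taken from any reference): to first order, one SGD step on datapoint x1 changes the output on datapoint x2 in proportion to the inner product of the two datapoints' parameter gradients, i.e. the empirical NTK.

```python
# Minimal empirical-NTK sketch: a gradient step on x1 moves f(x2) in proportion
# to the gradient inner product k(x1, x2).
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network: f(x) = v . tanh(W x)
d_in, d_hidden = 5, 64
W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_hidden, d_in))
v = rng.normal(scale=1.0 / np.sqrt(d_hidden), size=d_hidden)

def forward(x, W, v):
    return v @ np.tanh(W @ x)

def param_grad(x, W, v):
    """Gradient of f(x) with respect to (W, v), flattened into one vector."""
    h = np.tanh(W @ x)
    dv = h                            # df/dv is just the hidden activations
    dW = np.outer(v * (1 - h**2), x)  # df/dW
    return np.concatenate([dW.ravel(), dv])

x1, x2 = rng.normal(size=d_in), rng.normal(size=d_in)
g1, g2 = param_grad(x1, W, v), param_grad(x2, W, v)

# Empirical NTK: similarity of the two datapoints in "gradient space".
k12 = g1 @ g2

# One SGD step on the squared error at x1 ...
y1_target, lr = 1.0, 1e-3
err = forward(x1, W, v) - y1_target
theta_step = lr * err * g1            # gradient of 0.5 * err**2 w.r.t. params
W_new = W - theta_step[: W.size].reshape(W.shape)
v_new = v - theta_step[W.size:]

# ... and compare the actual change in f(x2) with the kernel prediction.
actual = forward(x2, W_new, v_new) - forward(x2, W, v)
predicted = -lr * err * k12           # first-order (NTK) approximation
print(actual, predicted)              # nearly equal for small lr
```

Two datapoints with a large gradient inner product (which, for the last layer, reduces to similarity of their hidden activations) get moved together; dissimilar ones are barely affected.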
The vast majority of LLM capabilities still arise from mimicking human choices in particular circumstances. This gives you a substantial amount of alignment "for free" (since you don't have to worry that the LLMs will grab excess power in situations where humans wouldn't), but it also limits you to ~human-level capabilities.
"Gradualism" can mean that fundamentally novel methods only make incremental progress on outcomes, but in most people's imagination I think it rather means that people will keep the human-mimicking capabilities generator as the source of progress, mainly focusing on scaling it up instead of on deriving capabilities by other means.
Maybe I should be cautious about invoking this without linking to a comprehensible explanation of what it means, since most resources on it are kind of involved...
In analogy, someone who wanted to argue that it's infeasible to build text-to-image generative models that make art that humans enjoy, could partition their prediction of failure into disjunctive failure modes: the model has to generalize what hands look like; it has to generalize what birds look like. It has to generalize which compositions and color combinations are pleasing—which is arguably an "ought"/steering problem, not an "is"/prediction problem! One-shotting all of those separate problems isn't something that human beings can do in real life, the argument would go. But of course, the problems aren't independent, and text-to-image generators do exist.
Isn't part of the deal here that we didn't one-shot image generation, though?
The first image generators were crazy; we slowly iterated on them. And image generation is "easy" because, unlike superintelligence or even self-driving cars or regular ol' production code, nothing particularly bad happens if a given image is bad.
Doomimir: The possibility of AGI being developed gradually doesn't obviate the problem of the "first critical try": the vast hypermajority of AGIs that seem aligned in the "Before" regime when they're weaker than humans, will still want to kill the humans "After" they're stronger and the misalignment can no longer be "corrected". The speed of the transition between those regimes doesn't matter. The problem still exists and is still fatal whether it takes a day or a decade.
Simplicia: I agree that the risk you describe is real, but I don't understand why you're so sure the risk is high. As we've discussed previously, the surprising fact that deep learning works at all comes down to generalization. In principle, an astronomical number of functions are compatible with the training data, the astronomical supermajority of which do something crazy or useless on non-training inputs. But the network doesn't have a uniform prior on all possible functions compatible with the training data; there's a bias towards simple functions that "generalize well" in some sense.
There's definitely a risk of goal misgeneralization, where we were mistaken about how behavior in the Before regime generalizes to behavior in the After regime. But if we work hard to test and iterate on our AI's behavior in the settings where we can observe and correct it, isn't there hope of it generalizing to behave well once we can no longer correct it? In analogy, it's not inhumanly hard to design and build machines on land that successfully generalize to functioning on an airplane or in a submarine.
Doomimir: Or in space, or inside the sun? Suppose that two dozen things change between Before and After. Even if you anticipate and try to devise solutions for three-quarters of them, not all of your solutions are going to work on the first critical try, and then there are the problems you failed to anticipate. This isn't the kind of thing human beings can pull off in real life.
Simplicia: Sorry, this is probably a stupid question, but isn't that reasoning similar to the multiple stage fallacy that you've derided elsewhere?
That is, in the multiple stage fallacy, someone who wishes to portray a proposition as unlikely can prey on people's reluctance to assign extreme probabilities by spuriously representing the proposition as a conjunction of sub-propositions that all need to be true.
As an illustrative example, suppose that the "correct" probability of some proposition $X$ is 0.9. Someone who wants to argue that $X$ is unlikely represents it as a conjunction of two dozen sub-propositions: $X$ is true if and only if $X_1$ is true, and $X_2$ is true given that $X_1$ is true, and $X_3$ is true given that $X_1$ and $X_2$ are true, and so on up to $X_{24}$.
Someone who assumed the sub-propositions were independent and assigned them each a probability of 0.95 would only assign $X$ a probability of $0.95^{24} \approx 0.29$. Indeed, with the assumption of independence of the sub-propositions $X_i$, one would need to assign each an intuitively "extreme"-looking probability of $0.9^{1/24} \approx 0.996$ in order to assign the correct probability of 0.9. Which should be a clue that the $X_i$ aren't really independent, that that choice of decomposition into sub-propositions was a poor one—with respect to the goal of getting the right answer, as contrasted to the goal of tricking respondents into assigning a low probability to $X$.
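(A quick standalone check of that arithmetic, as a snippet outside the dialogue itself:)

```python
# Verify the two numbers in the illustration above.
n_stages = 24
p_correct = 0.9

print(0.95 ** n_stages)             # ~0.29: what "independent, 0.95 per stage" implies
print(p_correct ** (1 / n_stages))  # ~0.996: per-stage probability needed to recover 0.9
```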
So when you posit that two dozen things change between a detectable/correctable-failures regime and a fatal regime, such that the conjunctive probability of not hitting a fatal misgeneralization is tiny, how do I know you're not committing a multiple stage fallacy? Why is that a "correct", non-misleading decomposition into sub-propositions?
In analogy, someone who wanted to argue that it's infeasible to build text-to-image generative models that make art that humans enjoy, could partition their prediction of failure into disjunctive failure modes: the model has to generalize what hands look like; it has to generalize what birds look like. It has to generalize which compositions and color combinations are pleasing—which is arguably an "ought"/steering problem, not an "is"/prediction problem! One-shotting all of those separate problems isn't something that human beings can do in real life, the argument would go. But of course, the problems aren't independent, and text-to-image generators do exist.
Is there a version of your argument that doesn't depend on the equivalent of, "Suppose there are twenty-four independent things that can go wrong, surely you don't want to bet the world on them each succeeding with probability 0.996"?
Doomimir: You're right: that is a stupid question.
Simplicia: [head down in shame] I know. It's only ... [straightening up] I would like to know the answer, though. [turning to the audience] Do you know?