From Vitalik: Galaxy brain resistance

by Gabriel Alfour
10th Nov 2025
1 min read

This is a linkpost for https://vitalik.eth.limo/general/2025/11/07/galaxybrain.html

I basically fully endorse the full article. I like the concluding bit too:

This brings me to my own contribution to the already-full genre of recommendations for people who want to contribute to AI safety:

  1. Don't work for a company that's making frontier fully-autonomous AI capabilities progress even faster.
  2. Don't live in the San Francisco Bay Area.

Cheers,
Gabe

Comments

Comment by gustaf:

Crossposting Eliezer's comments.

Thread between Eliezer and Vitalik

Eliezer Yudkowsky replying on Twitter:

This applies way beyond mere ethics, though! As a kid I trained myself by trying to rationalize ridiculous factual propositions, and then for whatever argument style or thought process reached the false conclusion, I learned to tell myself: "Don't think that way."

Vitalik Buterin:

Indeed, but I think ethics (in a broad sense) is the domain where the selection pressure to make really powerful galaxy brain arguments is the strongest.

Outside of ethics, perhaps self-control failures? eg. the various "[substance] is actually good for me" stories you often hear. Though you can model these as being analogous, they're just about one sub-agent in your mind trying to trick the others (as opposed to one person trying to trick other people).

Eliezer Yudkowsky:

Harder training domain, not so much because you're more tempted to fool yourself, as because it's not clear-cut which propositions are false. I'd tell a kid to start by training on facts and make sure they're good at that before they try training on ethics.

Vitalik Buterin: Agree!

Eliezer Yudkowsky:

Eg:
Argue: That a saltshaker is Jesus Christ.
Rationalization: Gurer jrer n ybg bs ngbzf va Wrfhf'f obql naq fbzr bs gurz ner ab qbhog va guvf fnygfunxre, tvira Nibtnqeb'f ahzore naq gur ibyhzr bs Rnegu'f fhesnpr.
Lesson: Look at what your thoughts did there; never do that.

Eliezer Yudkowsky used ROT13 to hide an example Rationalization.
After thinking about it yourself, hover over the box below to reveal his example.
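
If you'd rather decode it mechanically: ROT13 rotates each letter 13 places (and is its own inverse), and Python's standard library ships a rot_13 codec. A minimal sketch:

```python
import codecs

# Eliezer's ROT13-hidden rationalization, copied from the tweet above.
ciphertext = (
    "Gurer jrer n ybg bs ngbzf va Wrfhf'f obql naq fbzr bs gurz ner "
    "ab qbhog va guvf fnygfunxre, tvira Nibtnqeb'f ahzore naq gur "
    "ibyhzr bs Rnegu'f fhesnpr."
)

# ROT13 is a self-inverse substitution cipher, so decoding is the same
# operation as encoding.
print(codecs.decode(ciphertext, "rot_13"))
```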

There were a lot of atoms in Jesus's body and some of them are no doubt in this saltshaker, given Avogadro's number and the volume of Earth's surface.
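
The arithmetic the rationalization leans on does roughly check out, which is part of what makes it good training material. Here is a back-of-envelope sketch; the 70 kg body, the ~6 g/mol average atomic mass, and (the load-bearing, galaxy-brained step) uniform mixing through the top kilometre of Earth's surface are illustrative assumptions, not figures from the thread:

```python
import math

# Back-of-envelope check of the saltshaker rationalization.
# All constants below are illustrative assumptions, not from the thread.

AVOGADRO = 6.022e23            # atoms per mole
body_mass_g = 70_000           # assume a ~70 kg body
avg_atomic_mass = 6.0          # g/mol per atom; bodies are mostly water
                               # (H2O: 18 g/mol spread over 3 atoms)

atoms_in_body = body_mass_g / avg_atomic_mass * AVOGADRO   # ~7e27 atoms

earth_radius_m = 6.371e6
mixing_depth_m = 1_000         # assume perfect mixing through the top 1 km
mixing_volume_m3 = 4 * math.pi * earth_radius_m**2 * mixing_depth_m  # ~5e17 m^3

saltshaker_volume_m3 = 100e-6  # assume a ~100 cm^3 saltshaker
atoms_in_shaker = atoms_in_body / mixing_volume_m3 * saltshaker_volume_m3

print(f"atoms from the body in the shaker: {atoms_in_shaker:.0e}")  # ~1e6
```

Each line is locally defensible; the sleight-of-hand is the unstated mixing premise and the leap from "shares a few atoms with Jesus" to "is Jesus". Recognizing that the arithmetic can be sound while the conclusion stays absurd is exactly what the exercise trains.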

Related writing by Eliezer

Skill: Anti-rationalization (1): Prevent your mind from selectively searching for support of only one side of an argument.

Subskill: Notice the process of selectively searching for support, and halt it. Chains into "Ask whether, not why", or into trying to unbind the need to rationalize. Level 1, notice the active process after performing a conscious check whether it's running; Level 2, detect it perceptually and automatically; Level 3, never run this process.

Exercise material: List of suggested conclusions to rationalize. I'm not quite sure what the prerequisites here would be; one of my own childhood epiphanies was finding that I could argue reasons why there ought to be a teapot in the asteroid belt, and thinking to myself, "Well, I better not do that, then." I suspect the primary desideratum would be more that the topic offers plenty of opportunity to come up with clever rationalizations, rather than that the topic offers motivation to come up with clever rationalizations.

Exercise A: Pick a conclusion from the list. Come up with a clever argument why it is true. Notice what it feels like to do this. Then don't do it.

(This is distinct from noticing the feeling of being tempted to rationalize, a separate subskill.)

Exercise B: In the middle of coming up with clever arguments why something is true, stop and chain into the Litany of Tarski, or some other remedy. Variant: Say out loud "Ew!" or "Oops!" during the stop part.

Exercise C: For a conclusion on the list, argue that it is true. Then "Ask whether, not why" - try to figure out whether it is true. Notice the difference between these two processes. Requires a question whose answer is less than immediately obvious, but which someone can find info about by searching their memory and/or the Internet.

https://www.lesswrong.com/posts/f4CZNEHirweN3XEjs/teachable-rationality-skills?commentId=F837zHgkrTA2rqusn

I learned at a very young age not to be impressed by the sleight-of-hand that assembles a clever-sounding argument for why there might be a teapot in the asteroid belt, even or especially when it was my own brain generating the argument; that was where my journey began, with the search for a kind of thinking that couldn't argue impressively for any conclusion.

https://www.reddit.com/r/rational/comments/dk4hh8/comment/f4po393

When I was a kid I trained myself by coming up with the best arguments I could for propositions I knew to be false, then contemplating the arguments until they no longer seemed persuasive. I picked ridiculous targets.

https://x.com/ESYudkowsky/status/1900621719225987188

See Also: Against Devil's Advocacy
