Hyperstition as the Natural Enemy of Rationality

alseph

If the box contains a diamond,
I desire to believe that the box contains a diamond;
If the box does not contain a diamond right now, but will contain a diamond if I believe there is a diamond,
Uh...

Holding unfounded beliefs might sometimes, because of the causal power of belief, produce better outcomes than being rational. This post was inspired by a couple of cases where this phenomenon seems hand-waved away in the Sequences.

"Diseased Thinking"

In this essay, Scott suggests that a consequentialist model handles the question of whether to moralize issues like obesity better than a definitional argument over whether obesity is a "disease". If moralizing benefits the person, you moralize; otherwise you let them resort to medical interventions guilt-free.

But there's this annoying feature of morality where most people feel like it has to be absolute to be worth acting on.^[1] You can't just say "we should only guilt people if it would benefit them". The person is either guilty or not guilty; you can't pragmatically decide whether they're guilty or not. The consequentialist frame debuffs the power of moral pressure.

Some individuals, who would have gotten their act together if everyone bought into the old-fashioned guilt and willpower model, will now take a medical way out, making them subject to side effects from the medication or procedure. On net, this could outweigh the benefit of lifting guilt from those for whom willpower is not the deciding factor. The consequentialist framework might actually produce worse equilibria than the traditional one.

What I'm getting at is that optimal solutions might involve people believing something unfounded, and rationality will never converge to this solution. This creates an inherent tension where conservatives reach better equilibria because they can believe in things like God or that a marriage is a divinely sanctioned mutual partnership.

Imagine a society where every member believes they will be punished eternally if they intentionally harm anyone else. If they all genuinely believe this and act according to it, nobody will ever hurt each other. How can you enter such a state? You can't do it by reasoning about the virtues of such a scenario, because that desirable state hinges on everyone actually believing in the god; belief-in-belief isn't sufficient.^[2]

This puts rationalists in an awkward situation, where we can explain why the conservatives are happy, perhaps more accurately than they can, but can never achieve the same results. We become, in some sense, a barrier to that outcome being implemented throughout all of society.

"Why Our Kind Can't Cooperate"

Yudkowsky writes:

"Let's say we have two groups of soldiers. In group 1, the privates are ignorant of tactics and strategy; only the sergeants know anything about tactics and only the officers know anything about strategy. In group 2, everyone at all levels knows all about tactics and strategy.
Should we expect group 1 to defeat group 2, because group 1 will follow orders, while everyone in group 2 comes up with better ideas than whatever orders they were given?
In this case I have to question how much group 2 really understands about military theory, because it is an elementary proposition that an uncoordinated mob gets slaughtered.
Doing worse with more knowledge means you are doing something very wrong. You should always be able to at least implement the same strategy you would use if you are ignorant, and preferably do better. You definitely should not do worse. If you find yourself regretting your "rationality" then you should reconsider what is rational."

But the "too many cooks" aphorism exists for a reason. The above only applies if all the agents are unembedded and can take the optimal strategy with no disadvantage. In reality, though, they are embedded, and precious effort/energy/time is lost if the soldiers independently compute the strategy and debate the best one, or even if the privates quietly obey their superiors' orders but question them in their hearts. People are useful for things other than making good decisions. It’s easier to have some people be mute limbs that blindly trust what the head tells them, without needing to understand, agree with, or even hear its arguments.

Self-Confidence

Imagine that the more you believe in yourself, the more successful you will be in your career. Therefore, to optimize for success, you want to believe in yourself as much as possible.

Let's say that if you believe you can become a millionaire, you'll become a millionaire, but if you believe you can become a trillionaire, you'll end up with a net worth of a hundred billion dollars. In the latter case, your belief is far less accurate, but the result is more desirable.

AI

It seems like some influential people in AI believe that being optimistic about AI will lead to a better outcome, whereas being pessimistic will lead to a worse outcome, on the basis of things like alignment pretraining. This naturally leads to accusations that rationalists are making things worse. But holding optimistic beliefs because they might have positive effects isn't compatible with rationality.

Conclusion

This feels like Newcomb's Problem. But Yudkowsky writes of Newcomb's Problem that

Alleged rationalists should not find themselves envying the mere decisions of alleged nonrationalists, because your decision can be whatever you like.

The hyperstition version doesn't look so good:

Alleged rationalists should not find themselves envying the mere beliefs of alleged nonrationalists, because your belief can be whatever you like.

Alas, my belief cannot be whatever I like, and I might be condemned to belief envy, at least until rationality is vindicated. And I must accept that it may never be.

^[1] Anecdotally. I have had many frustrating conversations about this.
^[2] In practice, people do manage to enter these equilibria. How? Some people, say the members of a religion, are happy. Their religion tells them to convert outsiders. They find an outsider, invite them to eat and have fun with them, and show off how happy they are. The outsider is intrigued. The insiders' appeals to tradition and spiritual claims start resonating with the outsider, and they really want to become part of the group. Eventually some sort of psychic transition happens and they make an emotional proclamation of faith. Now they're part of the happy equilibrium. This can only happen if the emotional, social, and intuitive appeal far outweighs any skepticism.

Comments

I think you are conflating hyperstitions (self-fulfilling beliefs, like the value of a currency or your diamond example) with false beliefs that have benefits (your society that believes in eternal punishment). While the latter is incompatible with epistemic rationality, the former doesn't have to be.

Hyperstitions are a powerful form of social technology. Rationalist hyperstitions seem like a valuable potential area of study.

The trillionaire example is both, while the millionaire example is only a hyperstition, but they seem equally at odds with epistemic rationality.

Hyperstitions are just special cases of epistemically irrational but instrumentally useful beliefs (like believing in God in Pascal's Wager, which is not a hyperstition), which would make them instrumentally rational if beliefs were actions we could pick according to their expected utility. But beliefs aren't actions: we can't decide what we believe, and doxastic voluntarism is false.

Hyperstitions are a powerful form of social technology.

Just like you have one-boxers and two-boxers in Newcomb's dilemma, you will have people who support hyperstition as a technology, and people who oppose it "on principle". To use the technology effectively, you may need to separate the former from the latter, and only believe (at least in version 1.0 of the belief) that the former will act accordingly. (With some luck, you may get a version 2.0 saying that when the latter see the success of version 1.0, many of them will join, too.)

Similarly, the benefits of actions mandated by religion first happen to the actual believers, and only later spread to the society around them.

You can't do it by reasoning about the virtues of such a scenario, because that desirable state hinges on everyone actually believing in the god

there's a sleight of hand here. presumably the state is desirable because of the result (nobody ever intentionally harms anyone else), and not the method (fear of god).

while the particular method may be the domain of faith, i see no reason why the resulting harmonious society cannot be reached by clear-thinking means.^[1]

^[1] well, hyperrationality is a -stition as well. but maybe we can allow this "dry" hyperstition, even if we find narratively interesting metaphysics inaccessible.

Perhaps we should adopt that word "faith" as a shorthand for "hyperrationality", and similarly "grace" as a shorthand for "when hyperrationality works; the positive outcome of successful hyperrational coordination."

In this usage, faith isn't "believing what you know ain't so"; it's acting as if a positive belief is true, in a way that coordinates to cause it to be true. It is cooperating with others on making it true. Grace is the resulting condition of the belief actually being true; not by miracle, but because enough people were in fact faithful.

As in:

Alice faithfully brought a dish to the community potluck; and by grace there was plenty of food for everyone. (Alice acts according to the belief that "we're all bringing food", and since pretty much everyone else does too, there is indeed enough food.)
If there's a COVID outbreak, then I will test myself before going to the party, out of faith that others similarly situated to me may do likewise; so that by grace nobody will get exposed at the party. (If I faithlessly skip doing the test, then I should expect others to skip it too; no grace for us.)
By grace we were saved from nuclear war throughout the Cold War; and so we remember the faith of Saint Stanislav Petrov, whose belief "we are not having a nuclear war" led to us not having a nuclear war.

What I’m getting at is that optimal solutions might involve people believing something unfounded, and rationality will never converge to this solution.

Unfortunately, it's impossible to have a system where you have people believe unfounded things when it benefits them, but not otherwise. If people set up the unfounded beliefs themselves, they are incentivized to set up unfounded beliefs that are in their own interest. If you let the unfounded beliefs evolve by memetic fitness, you'll get unfounded beliefs that are memetically fit for reasons other than being good for people.

This naturally leads to accusations that rationalists are making things worse.

This seems like a misunderstanding of either the claims made in alignment pretraining or how labs would adopt this technique in practice. We actually don't recommend filtering any negative information about AI systems from pretraining. Rather, we find upsampling positive techniques is much more effective at changing alignment propensities. Bad data can also lead to good models, and it's reasonable to think that having a concept of misalignment is beneficial for post-training.

Also, alignment midtraining techniques like model spec midtraining don't use filtering at all, and I don't know of any evidence that labs should be, or currently are, filtering LW-style data.

I agree that there possibly are situations where belief influences outcomes in such a way that factually incorrect beliefs can produce better outcomes. For the most part I think these situations are contrived and rare, but I accept that they do exist.

However, in real life, we don't encounter a single choice, but many choices one after another. In the long run, it's not about how good the initial decision-making is, but about what we learn. Something goes well, we ask what we did right. Something goes wrong, we ask what we could have done better. Learning well is the best long-term strategy.

If I have somehow adopted wrong beliefs on purpose, how can I ever assess whether this is working? I presumably no longer know that those beliefs are wrong.

optimal solutions might involve people believing something unfounded, and rationality will never converge to this solution

The fact that we are here discussing this is proof by demonstration that rationality can notice the problem.

The (supposed) fact that we cannot immediately implement the obvious solution is a flaw in the structure of the human mind, which is a problem rationality can solve, with sufficient effort, should we decide to do so.

The assertion that there is no other solution available using our current minds does not (AFAICT) follow, and relies on facts not currently in evidence.