I think you are conflating hyperstitions (self-fulfilling beliefs, like the value of a currency or your diamond example) with false beliefs that have benefits (your society that believes in eternal punishment). While the latter is incompatible with epistemic rationality, the former doesn't have to be.
Hyperstitions are a powerful form of social technology. Rationalist hyperstitions seem like a valuable potential area of study.
The trillionaire example is both, while the millionaire example is only a hyperstition, but they seem equally at odds with epistemic rationality.
Hyperstitions are just special cases of epistemically irrational but instrumentally useful beliefs (like believing in God in Pascal's Wager, which is not a hyperstition), which would make them instrumentally rational -- if beliefs were actions which we could pick according to their expected utility. But beliefs aren't actions: we can't decide what we believe, because doxastic voluntarism is false.
You can't do it by reasoning about the virtues of such a scenario, because that desirable state hinges on everyone actually believing in the god
there's a sleight-of-hand, here. presumably the state is desirable because of the result (nobody ever intentionally harms anyone else), and not the method (fear of god).
while the particular method may be the domain of faith, i see no reason why the resulting harmonious society can not be reached by clear-thinking means.[1]
well, hyperrationality is a -stition as well. but maybe we can allow this "dry" hyperstition, even if we find narratively interesting metaphysics inaccessible.
Perhaps we should adopt that word "faith" as a shorthand for "hyperrationality", and similarly "grace" as a shorthand for "when hyperrationality works; the positive outcome of successful hyperrational coordination."
In this usage, faith isn't "believing what you know ain't so"; it's acting as if a positive belief is true, in a way that coordinates to cause it to be true. It is cooperating with others on making it true. Grace is the resulting condition of the belief actually being true; not by miracle, but because enough people were in fact faithful.
As in:
This naturally leads to accusations that rationalists are making things worse.
This seems like a misunderstanding of either the claims made in alignment pretraining or how labs would adopt this technique in practice. We actually don't recommend filtering any negative information about AI systems from pretraining. Rather, we find upsampling positive techniques is much more effective at changing alignment propensities. Bad data can also lead to good models, and it's reasonable to think that having a concept of misalignment is beneficial for post-training.
Also, alignment midtraining techniques like model spec midtraining don't use filtering at all, and I don't know of any evidence that labs should filter, or are currently filtering, LW-style data.
optimal solutions might involve people believing something unfounded, and rationality will never converge to this solution
The fact that we are here discussing this is proof by demonstration that rationality can notice the problem.
The (supposed) fact that we cannot immediately implement the obvious solution is a flaw in the structure of the human mind, which is a problem rationality can solve, with sufficient effort, should we decide to do so.
The assertion that there is no other solution available using our current minds does not (AFAICT) follow, and relies on facts not currently in evidence.
What I’m getting at is that optimal solutions might involve people believing something unfounded, and rationality will never converge to this solution.
Unfortunately, it's impossible to have a system where you have people believe unfounded things when it benefits them, but not otherwise. If people set up the unfounded beliefs themselves, they are incentivized to set up unfounded beliefs that are in their own interest. If you let the unfounded beliefs evolve by memetic fitness, you'll get unfounded beliefs that are memetically fit for reasons other than being good for people.
Holding unfounded beliefs might sometimes, because of the causal power of belief, produce better outcomes than being rational. This post was inspired by a couple cases where this phenomenon seems hand-waved away in the Sequences.
"Diseased Thinking"
In this essay, Scott suggests that a consequentialist model deals with the question of whether to moralize issues like obesity better than a definitional argument over whether it is a "disease" or not. If it benefits the person, you moralize; otherwise you let them resort to medical interventions guilt-free.
But there's this annoying feature of morality where most people feel like it has to be absolute to be worth acting on.[1] You can't just say "we should only guilt people if it would benefit them". The person is either guilty or not guilty; you can't pragmatically decide whether they're guilty or not. The consequentialist frame debuffs the power of moral pressure.
Some individuals, who would have gotten their act together if everyone bought into the old-fashioned guilt and willpower model, will now take a medical way out, making them subject to side effects from the medication or procedure. On net, this could outweigh the benefit of lifting guilt from those for whom willpower is not the deciding factor. The consequentialist framework might actually produce worse equilibria than the traditional one.
What I'm getting at is that optimal solutions might involve people believing something unfounded, and rationality will never converge to this solution. This creates an inherent tension where conservatives reach better equilibria because they can believe in things like God or that a marriage is a divinely sanctioned mutual partnership.
Imagine a society where every member believes they will be punished eternally if they intentionally harm anyone else. If they all genuinely believe this and act according to it, nobody will ever hurt each other. How can you enter such a state? You can't do it by reasoning about the virtues of such a scenario, because that desirable state hinges on everyone actually believing in the god; belief-in-belief isn't sufficient.[2]
This puts rationalists in an awkward situation, where we can explain why the conservatives are happy, perhaps more accurately than they can, but can never achieve the same results. We become, in some sense, a barrier to that outcome being implemented throughout all of society.
"Why Our Kind Can't Cooperate"
Yudkowsky writes:
But the "too many cooks" aphorism exists for a reason. The above only applies if all the agents are unembedded and can take the optimal strategy with no disadvantage. Instead, in reality, they are embedded, and precious effort/energy/time is lost if the soldiers independently compute the strategy and debate about the best one, or even if the privates quietly obey their superiors' orders but question them in their hearts. People are useful for things other than making good decisions. It’s easier to have some people be mute limbs that blindly trust what the head tells them without needing to understand, agree with, or even hear its arguments.
Self-Confidence
Imagine that the more you believe in yourself, the more successful you will be in your career. Therefore, to optimize for success, you want to believe in yourself as much as possible.
Let's say that if you believe you can become a millionaire, you'll become a millionaire, but if you believe you can become a trillionaire, you'll end up with a net worth of a hundred billion dollars. In the latter case, your belief is far less accurate, but the result is more desirable.
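To make the trade-off concrete, here is a minimal sketch that uses only the two cases from the example above. The lookup table and the function names are illustrative assumptions, not a model of how belief actually drives outcomes.

```python
# Toy illustration of the millionaire/trillionaire example.
# The belief -> outcome table is hypothetical and covers only the two stated cases.
MILLION = 10**6
BILLION = 10**9
TRILLION = 10**12

# Believed net worth -> realized net worth (numbers taken from the example).
OUTCOMES = {
    MILLION: MILLION,          # the belief comes true exactly
    TRILLION: 100 * BILLION,   # the belief is false, but the outcome is far better
}

def belief_error(believed: int) -> int:
    """Absolute gap between what was believed and what actually happened."""
    return abs(believed - OUTCOMES[believed])

def best_for_accuracy() -> int:
    """Belief an agent optimizing purely for accuracy would pick."""
    return min(OUTCOMES, key=belief_error)

def best_for_outcome() -> int:
    """Belief an agent optimizing purely for realized wealth would pick."""
    return max(OUTCOMES, key=lambda believed: OUTCOMES[believed])

if __name__ == "__main__":
    for believed in OUTCOMES:
        print(believed, OUTCOMES[believed], belief_error(believed))
```

Under these assumed numbers, the accuracy-optimal belief and the outcome-optimal belief come apart, which is the tension the example is pointing at.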
AI
It seems like some influential people in AI believe that being optimistic about AI will lead to a better outcome, whereas being pessimistic will lead to a worse outcome, on the basis of things like alignment pretraining. This naturally leads to accusations that rationalists are making things worse. But holding optimistic beliefs because it might have positive effects isn't compatible with rationality.
Conclusion
This feels like Newcomb's problem. But Yudkowsky writes about Newcomb's Problem that
The hyperstition version doesn't look so good:
Alas, my belief cannot be whatever I like, and I might be condemned to belief envy--at least until rationality is vindicated. And I must accept that it may never be.
Anecdotally, I have had many frustrating conversations about this.
In practice, people do manage to enter these equilibria. How? Some people, say the members of a religion, are happy. Their religion tells them to convert outsiders. They find an outsider, invite them to eat and have fun with them, and show off how happy they are. The outsider is intrigued. The insiders' appeals to tradition and spiritual claims start resonating with the outsider, and they really want to become part of the group. Eventually some sort of psychic transition happens and they make an emotional proclamation of faith. Now they're part of the happy equilibrium. This can only happen if the emotional, social, and intuitive appeal far outweighs any skepticism.