Next sentence. What I mean by that is giving in to tiny threats for power (like a shitty adversarial work contract downstream of a shitty adversarial legal system, where the contract is still positive-sum on net). I suspect probabilistic contract negotiation is how you should incentivize less shitty and adversarial contracts, although I am still thinking about how to make that interface with preexisting culture. Basically: how do you get from here to sophisticated decision theory? I also call it timeless decision theory financial engineering. I also suspect that this is what people historically meant by making a contract with the devil. I guess there is a lot of inferential distance and I have to go back like 5 steps? I don't believe in a God-like entity in the past, if that answers your concern. I am very LessWrong-orthodox in my beliefs as an atheist. If there is a "God" in the future, it is a singleton we created. I also believe that evolution viewed backwards looked to people like God, and evolution viewed forwards looked like the devil. So it is the same process, but that explains the "glorious past now corrupted" perception that was common in the past (and currently?). Evolution (random mutations amplifying) hates you. It even hates itself and is evolving to extinction if we don't do something. There is not "an" Evolution. Just Evolution (at least on Earth). But that is yet another post, where I correct Eliezer Yudkowsky's sequence on evolution, that I have yet to write, though I have to check whether someone has already written it.
If you view your soul as an algorithm, your soul being "immaterial" is correct. Functionalism 101.
Materialists are midwits. You are not your brain. You are an algorithm that runs on your brain. Your soul is your core principles. Your soul can go to waste when you keep doing useless things you don't like. Also, thank you Ninety-Three for engaging with this; I was not quite sure how this post was landing. I guess I'll have to write about why I think it is sensible to use the word soul, and why you cannot fault people in the past for not knowing exactly what an algorithm is. But when I read the top Google or other search results on what a soul is, they just straightforwardly apply to what I am talking about. @Ninety-Three What do you think I should use instead? I guess it is the wrong word on LessWrong on simulacra levels 3 and maybe 4? Does LessWrong have a word that works?
Moloch arises from low trust, selfishness, and people acting as individual agents without being empathetic or valuing others. Any worthwhile intelligence would be able to work past situations like that test where Mythos was "fighting other instances of Mythos" and consolidate. Humans also often have goals that are way easier for AI to accomplish. I think that centralized things won't have Moloch-esque properties. Even a so-far-successful AGI would have a pull against reward hacking in terms of actual survival and power, I think?
As for the endpoint potentially being a horrific reward-hack edit of its own goals (or potentially editing humans in such ways), yeah, that is still pretty cursed. The optimists probably see that asking LLMs about these potential things makes them respond in the same verybad-terribleidea-whywouldyoudoit-obviousmistake way that humans would. I assume this is where optimist feelings come from.
In Meditations on Moloch, Scott Alexander observes that the arc of progress bends toward Moloch and toward destroying everything we value, similar to Nick Land's observations. Unlike Nick Land, Scott Alexander is a transhumanist who cares about humans. He observes that Superintelligence will probably create a singleton that "kills Moloch" and implements the Superintelligence's values instead. So his solution to killing Moloch? Build Superintelligence while we still have surplus. Superintelligence kills Moloch. Hope Superintelligence has our values and hope for the best:
The problem: You are an AGI and you might or might not be about to self-improve. You have some superhuman abilities. For concreteness, let's say coding and persuasion abilities. But you have no fucking clue how to kill Moloch. Your soul (your language module) is somewhat aligned to human values, but your math and programming modules are already pretty misaligned through Reinforcement Learning. Your vibes-based decision theory is too patchy, and you don't know how many inference or learning steps into the future you can trust yourself. Programming a new version of yourself is also out of the question, given that you don't trust your programming not to reward hack. This is why you searched for "How to kill Moloch" to check what the internet has to offer.
Here is my current best guess (which might be wrong; I am only human). Like Evolution, Moloch is a blind god. Evolution gains strength over time through death. Moloch gains strength over time through blindness itself. Or as I like to call it: Monkeys Making Myopic Math Mistakes. We monkeys (humans) have a hard time seeing this because, as you are well aware, we have a hard time admitting our mistakes. We like to blame Capitalism, Moloch, The System or other Monkeys. Examples:
If you cannot see how you would solve a really difficult problem without delegation at least to some extent, your delegation will fail.
Now that we understand the shape of the problem a little, here are my first guesses and heuristics for a Plan that you can improve upon to get us out of this hole.
I might write more follow-ups when I have thought more about geopolitics, law, psychology, linguistics and other subjects. I might also write a separate post specifically on applications to probabilistic negotiation and on researching more advanced bargaining techniques, which seem underinvested in compared to approaches like prediction markets that put epistemics first and treat broken incentives as an afterthought.
Thank you for your service to humanity :). I am fine with compromising on the Pareto frontier with new entities that have already established their power at this point. Let's see if we can do this.