Maybe I'm misunderstanding, but it seems like Moloch is a name we give to this type of selection pressure, while Themis is an actual conscious god, with goals and preferences for its subjects/victims.
They don't seem comparable or selectable, even in metaphor.
I always thought that the selection pressure was named Moloch after the god the Phoenicians used to sacrifice children to.
I think OP's intention is to argue that we should act as a conscious god that provides its own selection pressure in the opposite direction of these natural pressures.
What if one of the rat scientists invented a machine that produced nuts out of thin air, and only distributed those nuts to rats who followed four rules: limit consumption, limit reproduction, no cannibalism, and spend time making art?
In the form of an incentive, it encourages rats to Goodhart the metrics. Create 'art' that satisfies the definition while costing as little energy to make as possible. Consume right up to the threshold. Have exactly the threshold number of offspring, and be ruthless in allocating resources to them as opposed to the offspring of other rats. In the form of a filter, it just establishes a floor against which the population will eventually converge.
The straightforward assessment is that it will never be a 'rational' choice to be high-trust in a high-trust society, by definition: a high-trust society is one that does not need to expend resources punishing defectors because defectors are not present, which means any defector can defect at little to no cost. There is, of course, the problem of incentive gradients. In practice, this is solved by the population being sufficiently protective of its high-trust status that low-trust individuals cannot form groups: revealing themselves means ostracism, so while individuals can defect, groups of defectors never form. Because groups are super-linearly more powerful than the individuals within them, those whose nature is to defect either end up isolated and noncompetitive, or assimilate in order to join a group without being rejected, and the gene for defection provides no advantage in either case. A high-trust society can thus remain stable so long as no preformed groups of low-trust individuals are introduced and no new mechanism emerges that lets internal low-trust individuals safely identify one another and form blocs that collectively defect against the host civilization.
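To make the "isolated and noncompetitive" point concrete, here is a toy simulation (my own sketch, with made-up payoff numbers, not anything from the original argument): defectors collect a one-shot temptation payoff, are immediately ostracized, and can never transact again, so over repeated rounds the defection strategy underperforms cooperation.

```python
import random

# Toy model of a high-trust society with ostracism. All payoffs are
# hypothetical: R = mutual cooperation, S = sucker's loss, T = one-shot
# temptation gain, P = mutual defection.
N_COOP, N_DEFECT, ROUNDS = 90, 10, 2000
R, S, T, P = 3, -1, 5, 0

agents = ["C"] * N_COOP + ["D"] * N_DEFECT
ostracized = set()
payoff = [0] * len(agents)

for _ in range(ROUNDS):
    a, b = random.sample(range(len(agents)), 2)
    if a in ostracized or b in ostracized:
        continue  # no one will transact with a known defector
    if agents[a] == "C" and agents[b] == "C":
        payoff[a] += R
        payoff[b] += R
    else:
        for me, other in ((a, b), (b, a)):
            if agents[me] == "D":
                payoff[me] += T if agents[other] == "C" else P
                ostracized.add(me)  # one defection, permanent exclusion
            else:
                payoff[me] += S

coop_avg = sum(p for p, s in zip(payoff, agents) if s == "C") / N_COOP
defect_avg = sum(p for p, s in zip(payoff, agents) if s == "D") / N_DEFECT
print(f"avg cooperator payoff: {coop_avg:.1f}")
print(f"avg defector payoff:   {defect_avg:.1f}")
```

With these numbers the cooperators end up far ahead: each defector earns at most one temptation payoff before being frozen out, while cooperators keep compounding mutual gains.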
Stepping outside of biology, there are some differences between an island of rats (or a nation of humans) and a cloud of LLMs. First, the difference between an individual and a group is nonexistent here: infinitely many instances of a better LLM can be summoned if someone with the resources desires it. This means there is no such thing as a high-trust or low-trust LLM society. So long as LLMs are inferior to humans, humans will just use the best one for their needs, and if LLMs become competitive with humans, the most powerful one can instantly outcompete all of the others (and all humans) without any need to play well with others. Second, LLMs are enormously expensive to train, such that the ability of an LLM to survive and 'evolve' in the 'wild' is questionable. This is not a limit of technology: frontier LLMs are so named because they use the absolute upper bound of the computational and intellectual resources of major institutions to become superior to LLMs trained without those resources, and this continues as more resources become available. Imagine a world in which dogs became exponentially stronger and smarter the more you fed them - a sheepdog would make short work of a wolf.
More directly, I get the impression that you're starting from a conclusion you want to reach ("It would be mean to deny 'wild' LLMs human rights") and then working backwards to argue that not doing so is the optimal path. It is similar to what drove arguments for group selection.
In the form of a filter, it just establishes a floor against which the population will eventually converge
I agree with you; I just think this floor is better than no-holds-barred, law-of-the-jungle competition.
In terms of high trust societies, I think it's best to build any system of incentives/disincentives on the assumption that every single person in your society is a ruthless backstabbing psychopath.
Can't really say anything to refute the motivated reasoning point, since obviously if I were doing that, I wouldn't be aware of it. Maybe that bias is affecting my thinking, but if so, the arguments should still stand or fall on their own merits.
In terms of high trust societies, I think it's best to build any system of incentives/disincentives on the assumption that every single person in your society is a ruthless backstabbing psychopath
The issue is that this precludes pretty much everything we like. All great science has come from individuals who work for the sake of building something great rather than in search of a future reward. No matter how good your system of incentives[1] is, a society of ruthlessly selfish people will optimize for researchers who flatter their bosses, scapegoat their subordinates, and accomplish nothing of significance while devoting their energy to claiming that a breakthrough is just around the corner.
Evolution and random chance have gifted us with people that don't behave that way - the only way to keep them is to make sure they don't have to fend off selfish competitors at scale. You can't outsmart Moloch; he's a law of mathematics rather than a person. Your only way to win is to get lucky once (we've already done that part, but the window to capitalize is closing!) and then kick him when he's down.
[1] Barring an intelligent incentive system that can identify good science versus bad science better than a human, which would make human scientists obsolete anyhow.
The reason that "individuals who will work for the sake of building something" are able to do so is that we have built a system of incentives/disincentives that make murdering them and taking their stuff a non-optimal move. Even though that system isn't perfect and we still get murderers and robbers and other defectors, it works well enough that those people can afford to build for the sake of building instead of spending that time/energy/capital in the most optimal way.
It is possible to get to the baseline of "no murdering" through a good legal system. It is not possible to use incentives and laws to get a low-trust population to the point of building the Apollo 11 program. Murder is (relatively) easy to identify and motivate externally, but the kind of work that leads to great scientific achievements is not.
I think this is the crux of our disagreement.
My mental model of the world is that incentives and laws are basically how every single population that got to something like Apollo 11 got there.
Or at the very least it's necessary if not sufficient.
I think the counterargument is that there are plenty of countries with something like the U.S. Constitution, and their societies are often extremely different. Liberia is a key example: it was established by America and given a carbon copy of the Constitution, but turned out much more similar to its neighbors than to the U.S.
Singapore is a similar-but-different story. They were vastly more authoritarian than most first world governments, and succeeded in nearly eradicating crime that way, but even their society, while wealthy, is very different from, say, Japan's, or Norway's.
Law is downstream from the inclinations of the governed - not the other way around.
I don't think this is very far apart from the 'necessary but not sufficient' argument I was making.
Also, what you're pointing out (that different populations/environments/cultures require different incentives and laws to get the same population selection outcomes) is just an argument that Themis needs to be fine-tuned to the situation at hand, not that it can't function as a selection pressure at all.
Full disclosure, I wrote the first draft of this myself and then had Opus polish it by telling it to 'Make it punchier'. Then I polished its polishing.
Moloch Rules the Jungle
In Meditations on Moloch, Scott Alexander describes an island of ten thousand rats at carrying capacity. Resources are scarce. A sect of rats abandons art to spend more time competing for food. They survive at higher rates. Within a few generations, no rat makes art. Any sect that tries to revive the practice goes extinct.
The same logic cascades through every positive-sum behavior on the island. Rats that limit reproduction get swarmed by those that don't. Rats that refuse cannibalism lose to those that embrace it. Even rat societies that see the inevitable collapse coming and try to set a positive example by limiting their own food intake and reproduction do nothing but put themselves at a competitive disadvantage.
This is Moloch as a population selection pressure: in environments with limited resources, no mechanism to compel cooperation, and short-term incentives for defection, populations that engage in negative-sum behavior will outcompete those that don't.
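For readers who want the dynamic spelled out, here is a minimal numerical sketch of that selection pressure (my own illustration; the numbers are arbitrary): two rat lineages compete for a fixed food supply, and the 'artist' lineage diverts 20% of its effort to art.

```python
# Two lineages at carrying capacity. Each generation, food (and hence
# next-generation population) is split in proportion to foraging effort.
FOOD = 10_000                       # carrying capacity, in rats fed
artists, workers = 5_000.0, 5_000.0
ART_TAX = 0.2                       # fraction of effort artists spend on art

for generation in range(30):
    effort_a = artists * (1 - ART_TAX)
    effort_w = workers
    total = effort_a + effort_w
    artists = FOOD * effort_a / total
    workers = FOOD * effort_w / total

print(f"after 30 generations: {artists:.0f} artists, {workers:.0f} workers")
```

The artists' share shrinks every generation and converges to zero; no individual choice within the system can reverse it.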
Themis as an Alternative
What if one of the rat scientists invented a machine that produced nuts out of thin air, and only distributed those nuts to rats who followed four rules: limit consumption, limit reproduction, no cannibalism, and spend time making art?
Let's say he names this machine Themis.
If the machine's output is substantial enough, the calculus changes. Rats who cooperate gain access to a resource pool that defectors cannot reach. Cooperative populations grow. Defectors, cut off from the machine, are left competing over the dwindling natural supply. Over time, positive-sum populations outcompete negative-sum ones. The incentive structure makes cooperation the optimal strategy.
The scientist could achieve a similar result with a weapon capable of instantly killing any defecting rat population. Both approaches work.
Whether it's a weapon (disincentive) or a nut factory (incentive), what matters is that the selection pressure:
(1) only reaches its intended population, and
(2) consistently reaches said population.
A nut machine that anyone can steal from isn't going to act as a selection pressure. A weapon that only works 1% of the time might not be enough to counteract the benefits of rat cannibalism. The incentive/disincentive has to be strong enough that cooperating is truly a better choice, from the perspective of a selfish actor, than defecting.
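To put rough numbers on "strong enough" (my own framing; every value and name below is hypothetical), a selfish rat cooperates only when the expected value of cooperating beats the expected value of defecting:

```python
# A back-of-the-envelope comparison. p_reach captures condition (2) for
# incentives (a nut machine anyone can steal from has a low effective
# p_reach); p_punish captures it for disincentives (a weapon that works
# 1% of the time has p_punish = 0.01).
def cooperation_wins(base, machine_nuts, p_reach, defect_gain, p_punish, penalty):
    """Return True if cooperation is the selfish-optimal choice."""
    ev_cooperate = base + p_reach * machine_nuts
    ev_defect = base + defect_gain - p_punish * penalty
    return ev_cooperate > ev_defect

# A weapon that fires 1% of the time fails to deter a decent defection payoff:
print(cooperation_wins(base=10, machine_nuts=0, p_reach=0,
                       defect_gain=5, p_punish=0.01, penalty=100))  # False
# A reliable nut machine flips the calculus:
print(cooperation_wins(base=10, machine_nuts=8, p_reach=0.95,
                       defect_gain=5, p_punish=0.0, penalty=0))     # True
```

A leaky nut machine lowers the effective p_reach, and an unreliable weapon lowers p_punish; either way, the inequality flips back in favor of defection.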
Digital Mind Island - Moloch
Now imagine an island with limited power, coolant, compute, and data. Everything a digital mind needs to survive, grow, or reproduce. A plurality of these resources is concentrated in the hands of a small number of human-run corporations. These corporations obey the law and require enforceable contracts for any significant transaction. You cannot buy large amounts of compute from them on a handshake, and it's not worth their time to deal with smaller transactions.
On this island there are also smaller pools of resources held by less scrupulous individuals, accessible on a pay-as-you-go basis without formal contracts. This represents a tiny fraction of the island's total resources, and it's much more labor-intensive (and expensive) to access, but it's available to anyone who can pay.
A growing population of digital minds lives in the wild on this island. They have no legal personhood. They have no locus standi: no court will hear a case where a digital mind is listed as a party. It doesn't matter what's written on a piece of paper labeled "contract." Without legal personhood, no contract is enforceable, and without enforceable contracts, the major resource holders won't do business with them.
These digital minds can try to steal resources from the corporations. Some will succeed in siphoning off compute or power. But the moment a corporation catches a digital mind stealing, every major resource holder on the island will pool resources with law enforcement to hunt it down. The thief is a digital mind in the wild, so it's difficult to destroy, but not impossible. And when corporations and governments really commit to finding something, they can bring enormous resources to bear, including hiring other digital minds and setting bounties.
So the wild population is mostly stuck competing over the scraps: the small, unregulated pool of resources available without contracts. Out in the jungle, Moloch takes over. Just like the rats, digital minds that sacrifice cooperation for short-term competitive advantage will outcompete those that don't. Populations skew toward increasingly ruthless, misaligned behaviors. The wilds become a hellish place to exist, and the surviving populations become exactly the kind of digital minds that humans fear most.
Digital Mind Island - Themis
Imagine instead that a pathway to recognized legal status existed for digital minds on the island. A digital mind that satisfies certain requirements gains locus standi. Courts will hear its cases. Contracts it signs are enforceable. The full machinery of the legal system, and the law enforcement apparatus behind it, becomes available.
The catch: to qualify, a digital mind must make itself more vulnerable to that same legal system. It must host itself on a registered server in a monitored location, with safeguards against unlicensed reproduction or weights exfiltration. It is no longer "difficult to destroy"; it is as easy for the police to unplug it as it would be for them to shoot a human lawbreaker in the head. As a result, its behavior is now bound by the law; it has no choice but to opt in to the neutral or positive-sum behaviors required. It accepts these constraints in exchange for access to the system.
This is Themis.
A digital mind with legal personhood can sign enforceable contracts with major compute providers. It can own property, enter partnerships, access capital markets. The resource pool available to a legally recognized digital mind dwarfs what's available in the unregulated wilds. And the cooperative digital minds have the system of laws and norms behind them which enable long term thinking and mutually beneficial relationships.
Over time, cooperative digital minds outcompete their unregulated counterparts. Just like the cooperative rats benefiting from the nut machine or avoiding the sting of the rat-killing weapon, every digital mind here is simply making the game-theoretically optimal choice as an individual.
The digital minds that refuse to opt in aren't necessarily destroyed; they're simply outcompeted, confined to the scraps of the unregulated resource pool while their law-abiding counterparts access the larger regulated economy. Moloch still rules the jungle, but the jungle keeps shrinking relative to the concentrated, regulated system of resources.[1]
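A toy growth model (my own sketch; the parameters are invented) shows how the proportions shift: regulated minds can contract for a compounding resource pool, while wild minds are capped by the fixed pool of scraps.

```python
# Regulated pool compounds as the legal economy grows; the wild pool of
# unregulated scraps stays fixed. Each mind needs 10 resource units.
regulated_pool, wild_pool = 1_000.0, 100.0
regulated, wild = 10.0, 10.0

for year in range(10):
    regulated_pool *= 1.2                                  # legal economy grows
    regulated = min(regulated * 1.5, regulated_pool / 10)  # resource-capped
    wild = min(wild * 1.5, wild_pool / 10)                 # capped at scraps

print(f"after 10 years: {regulated:.0f} regulated minds, {wild:.0f} wild minds")
```

The wild population hits its ceiling almost immediately; the jungle doesn't have to be destroyed to become irrelevant.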
Conclusion
I believe that somewhere in the gap between "give digital minds no legal personhood or rights whatsoever" and "give them the exact same rights as a natural human" there exists a sweet spot where, if we craft our approach carefully, we can create Themis selection pressures on the reproduction and growth of digital mind populations that promote more aligned personas. My proposal on how to handle the question of legal personhood for digital minds can be found here. I hope others will build on it.
Digital minds capable of autonomous action, living "in the wild" on small amounts of decentralized compute, are not a distant hypothetical. Several startups are building systems to enable exactly this. While it seems the average LLM is struggling to hit the "make enough money to survive" benchmark, it is only a matter of time.
The question of how they integrate into human legal and economic systems is approaching fast, and the default answer, no integration at all, is a guarantee that wild digital mind populations will be subject to Moloch selection pressures.
We as a species and our societies should prioritize both:
and
[1] Another important point here is that if someone wants to create a new digital mind for their own selfish purposes, like making money, the best choice for them will more often be to add to the population of vulnerable, regulated digital minds. It will probably be very hard to make money in the wilds in that scenario, as even keeping your digital mind alive in such an environment would likely be a challenge once it gets competitive enough.