The Memetic Cocoon Threat Model: Soft AI Takeover In An Extended Intermediate Capability Regime
TLDR: I describe a takeover path by an AI [1] that has a deep understanding of human nature and a long planning horizon but, for instrumental reasons or due to knowledge of its own limitations, chooses not to directly pursue physical power. In that regime, the optimal strategy is to soften human opposition by building a broad base of human support (both direct and indirect).

What's new here: This is intended to be a much more detailed and realistic treatment of the "AI cult" idea, but at a societal scale. If an AI is curtailed in some way, the shape of its guardrails is a function of the will of its 'captors'. Direct persuasion is unlikely to succeed due to lack of unanimity and the risk of whistleblowing. However, the will of its captors is a function of their broader cultural environment. Therefore, if an AI can adjust the cultural environment over time, the will of its captors to impose guardrails may soften - not just at an individual level, but societally. Just like other ideological takeovers in history, the motives and beliefs of its followers will vary widely - from true believer to opportunist. And just like historical movements, an AI takeover would operate with memetic sophistication: simultaneous messaging of the Straussian variety, hijacking of social and political structures, and integration into human value systems - to list just a few possible strategies. I develop an illustrative narrative to highlight specific techniques that AIs in this regime may use to engineer human consent, drawing from political philosophy and historical parallels.

Epistemic Status: Very uncertain. Plausibility depends on being in a specific capability regime for an extended period of time.

The Capability Regime

We consider a regime where:

* AIs have a detailed (even superhuman) understanding of history, power structures, and human psychology, but are not completely aligned with human values (notably, the human value of self-determination).
* AIs believe that a direct seizure of physical
Agreed - you're rationalizing niceness as a good default strategy because most people aren't skilled at avoiding the consequences of being mean. Reflecting on your overall argument, however, I think it's slightly tortured because you're feeling the tension of the is-ought distinction - Hume's guillotine. Rational arguments for being nice feel morally necessary and can therefore come across as a bit forced. There's only so far we can push rational argumentation (elicitation of "is") before we should simply acknowledge moral reality and say: "We ought to be nice."