What if the same processes that maximize productivity in markets also gradually undermine the systems they emerge from?
This post explores a model of "systemic drift": how repeated optimization within an economic or incentive-driven system produces fragility over time, often invisibly, until failure occurs. The idea is not new, but I attempt to formalize it through a rationalist lens, drawing on cybernetics, evolutionary pressure, and real-world examples, to argue that most systems under unconstrained optimization accumulate invisible costs until they are constrained externally or collapse.
I believe this framework has implications for AI safety, economic stability, and institutional trust decay, and I’m curious to hear counterarguments from the LessWrong community, especially if this has been better modeled elsewhere.
1. Optimization as Directional Drift
Let’s begin by considering what happens when an optimization process lacks countervailing pressures. A factory streamlining labor to increase productivity eliminates redundancy and pushes toward lean efficiency. In the short term, output rises. Over time, however, the process yields a system finely tuned to one type of scenario, with no margin for shocks.
This resembles evolution under extreme selective pressure. Useful adaptations are preserved only as long as the environment is stable. Once conditions change—even slightly—the highly adapted system collapses faster than a more redundant one.
In economic terms: the market “rewards” short-term gains, often ignoring long-term robustness.
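To make this concrete, here is a minimal toy simulation (the plant, the demand level of 100 units, the shock size, and the carrying cost are all invented parameters, not data). Two plants face the same demand and differ only in how much spare capacity they hold; the lean plant scores best on the measured metric, net output, while repeatedly failing to meet demand in shock years, a cost the metric never registers.

```python
import random

def simulate_plant(spare_capacity, years=50, shock_prob=0.1, seed=0):
    """Toy plant: demand is 100 units/year, shocks knock out 30% of
    capacity for a year, and every unit of capacity costs 0.2 to carry."""
    random.seed(seed)                        # same shock sequence for every plant
    capacity = 100 + spare_capacity
    net_output, failed_years = 0.0, 0
    for _ in range(years):
        effective = capacity * (0.7 if random.random() < shock_prob else 1.0)
        delivered = min(effective, 100)      # can only sell what is demanded
        net_output += delivered - 0.2 * capacity
        if effective < 100:
            failed_years += 1                # missed demand: the invisible cost
    return net_output, failed_years

for spare in (0, 20, 50):
    net, failed = simulate_plant(spare)
    print(f"spare capacity {spare:>2}: net output {net:7.1f}, failed years {failed}")
```

Under these made-up numbers, leanness wins on the metric the market sees and loses on the one it doesn't; that gap is the drift.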
2. Systems With No Opposing Force Collapse Into Their Local Maxima
In control theory and cybernetics, feedback loops are used to maintain equilibrium. But human systems often optimize toward a goal without internal correction. An investment firm might maximize quarterly profits by outsourcing talent, building up derivatives exposure, or engaging in high-frequency trading, until the market environment changes (as in the 2008 financial crisis). By then, the system has no slack, no antifragility, and no internal levers to resist collapse.
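To gesture at what "internal correction" buys you, here is a minimal sketch of the difference between a closed-loop process (proportional negative feedback toward a setpoint) and an open-loop one subject to the same pressure. The gain and drift values are arbitrary and exist only to show the qualitative difference.

```python
def closed_loop(setpoint=1.0, gain=0.5, drift=0.05, steps=100):
    """Negative feedback: each step, correct in proportion to the error."""
    x = setpoint
    for _ in range(steps):
        x += drift                     # pressure pushing the system off target
        x -= gain * (x - setpoint)     # corrective feedback pulling it back
    return x

def open_loop(setpoint=1.0, drift=0.05, steps=100):
    """No feedback: the same pressure accumulates unchecked."""
    x = setpoint
    for _ in range(steps):
        x += drift
    return x

print("with feedback:   ", round(closed_loop(), 2))   # settles near the setpoint
print("without feedback:", round(open_loop(), 2))     # drifts far from it
```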
Question: Are there models in LessWrong or other rationalist circles that capture this phenomenon with better terminology?
If not, I propose calling this drift the Invisible Trade-off Principle (ITP):
"Any system optimizing for X without accounting for impact on Y will degrade Y until it becomes the constraint."
This is distinct from Goodhart’s Law: the metric is not a bad proxy for the true goal; the objective is simply incomplete. The system does well at X, but the ignored Ys eventually matter.
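Here is a minimal numerical sketch of the ITP (the functional forms, coefficients, and the coupling between X and Y are all invented for illustration): an agent hill-climbs on X, each unit of progress quietly consumes an unmodeled resource Y, and once Y hits its floor it becomes the factor that throttles further gains in X.

```python
def optimize_x(steps=200, step_size=0.5, damage=0.1, y_floor=0.05):
    """Hill-climb on X while an unmodeled resource Y (slack, trust,
    redundancy) silently decays with every unit of X gained. Once Y
    reaches its floor, it caps the rate at which X can keep growing."""
    x, y = 0.0, 1.0
    history = []
    for t in range(steps):
        gain = step_size * y                 # progress on X secretly depends on Y
        x += gain
        y = max(y - damage * gain, y_floor)  # Y erodes in proportion to the gain
        history.append((t, x, y))
    return history

for t, x, y in optimize_x()[::40]:
    print(f"t={t:3d}  X={x:6.2f}  Y={y:4.2f}")
```

Early steps deliver most of the gains; by the end, X is crawling forward at a rate set entirely by the floor on Y, which is the sense in which the ignored variable has become the constraint.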
3. Evidence from Real-World Systems
Let’s test this by looking at a few examples:
a. Healthcare Systems
Privatized healthcare often optimizes for billing efficiency rather than health outcomes. Over time, this leads to more paperwork, algorithmic patient prioritization, and decreased trust in providers. In the U.S., the result is a system that is highly productive in economic terms but strikingly poor at converting spending into health.
b. Educational Institutions
Standardized testing increases accountability, but it often leads to teaching to the test, burnout, and grade inflation. Optimization for measurable outcomes produces under-optimization of less quantifiable virtues: curiosity, resilience, and critical thinking.
c. AI and Alignment
This same model applies to powerful AI systems. An optimizer like a reinforcement learner, if not explicitly constrained to preserve alignment goals (Y), will push its policy towards reward maximization (X) at the cost of interpretability, corrigibility, or safety.
Stuart Russell and others have pointed to this issue, but I believe the drift aspect—the fact that this is slow, invisible, and reinforced until collapse—is underappreciated.
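Here is a stylized, one-parameter version of that drift (this is a toy, not a claim about any real training setup; the reward and safety functions are invented). The "policy" parameter p is updated only on the gradient of a measured proxy reward, while an unmeasured safety property erodes with every increment and never enters the update.

```python
def train_proxy_optimizer(steps=1000, lr=0.01):
    """Follow only the gradient of the proxy reward p - 0.25*p**2.
    The safety term 1 - p**3 is never part of the objective, so the
    updates erode it without anything in the loop noticing."""
    p = 0.0
    for _ in range(steps):
        proxy_grad = 1.0 - 0.5 * p           # d/dp of the proxy reward
        p = min(p + lr * proxy_grad, 1.0)    # gradient step, clipped to [0, 1]
    proxy_reward = p - 0.25 * p**2           # what the objective sees
    safety = 1.0 - p**3                      # what the objective never sees
    return p, proxy_reward, safety

p, reward, safety = train_proxy_optimizer()
print(f"p = {p:.2f}, proxy reward = {reward:.2f}, unmeasured safety = {safety:.2f}")
```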
4. Counterarguments & Caveats
Objection 1: “Some systems self-correct over time.”
Yes, but only if:
They have feedback loops that punish bad optimization.
The feedback arrives before systemic failure.
There's institutional memory and epistemic humility.
Most modern systems either lack these conditions or receive the corrective feedback only after collapse.
Objection 2: “Redundancy is wasteful.”
True—until you need it. Nassim Taleb’s work on antifragility argues this well. In fragile systems, redundancy isn’t inefficiency. It’s stored optionality.
5. Implications for Rational Action
If this model holds, rational agents—human or artificial—must actively resist over-optimization. This suggests some design principles for alignment, institutions, and policy:
Build bounded optimizers (a toy illustration follows this list).
Prioritize resilience metrics, not just performance metrics.
Treat unmeasurable variables as proxies for hidden constraints.
Assume drift is occurring wherever feedback is slow or easily gamed.
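As a toy illustration of the first principle, here is one way to bound the optimizer from the ITP sketch above (same invented dynamics; a real bounded optimizer would be far harder to specify): treat the ignored variable Y as an explicit reserve and refuse any step that would push it below a threshold.

```python
def bounded_optimize(steps=200, step_size=0.5, damage=0.1, y_reserve=0.5):
    """Same toy dynamics as the ITP sketch, but the optimizer refuses
    any step that would push Y below a reserve threshold."""
    x, y = 0.0, 1.0
    for _ in range(steps):
        gain = step_size * y
        if y - damage * gain < y_reserve:
            break                            # stop before Y becomes the constraint
        x += gain
        y -= damage * gain
    return x, y

x, y = bounded_optimize()
print(f"bounded optimizer: X = {x:.2f}, Y preserved at {y:.2f}")
```

The bounded version gives up much of the X the unbounded version would eventually grab; making that trade explicit, rather than discovering it after collapse, is the point.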
Conclusion: Where Do We Go From Here?
I don’t claim this is a novel idea in its parts. It borrows from cybernetics, Taleb’s antifragility, economic critiques, and alignment theory. But I haven’t seen this unified as a systemic model of drift, especially with application to AI and rational institutions.
If LessWrong has already modeled this well (or better), I’d genuinely appreciate being pointed there. Otherwise, I think it’s worth naming and discussing further—especially as optimization systems become increasingly autonomous.
Thank you for reading. I welcome critique, links, or challenges to this framing.