We talk a lot about recommender systems in alignment, specifically how they don't just predict what we want; they actively steer us. Because it is mathematically much easier to predict a simple, reactive agent than a complex, deliberating one, engagement-maximizing algorithms have a structural incentive to simplify the humans using them.
I just put out a technical report trying to formalize this exact failure mode. It is part of a broader systems-engineering framework on cognitive sovereignty, and I wanted to share the core of the model here. It overlaps heavily with LW discussions on wireheading, Goodhart's Law, and how human agency survives optimization pressure.
I would really appreciate people's critiques, especially on the audit protocol.
The Core Model: Agency Collapse as an Objective Function
How do we actually quantify agency collapse? Instead of treating a human as a static utility function, we can model them as a computationally irreducible, self-modeling system. I proxy this using what I call Agency Depth (A_D). It relies on four variables:
Temporal Horizon (T): How far into the future you can simulate (System 2 deliberation).
Counterfactual Width (W): The breadth of your deliberative search space.
Historical Integration (H): How much your unique biography and stable values weigh on your current decisions.
Model Fidelity (F): How accurately your internal world-model maps onto actual causal reality.
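As a toy illustration, the four variables could be combined into a single proxy score. This is only a sketch: the class name, the component scales, and the geometric-mean aggregation are my assumptions, not the paper's formulation.

```python
from dataclasses import dataclass
from math import prod

@dataclass
class AgencyDepthProxy:
    """Hypothetical proxy for Agency Depth (A_D); not the paper's exact form."""
    temporal_horizon: float        # T: forward-simulation depth, e.g. days planned ahead
    counterfactual_width: float    # W: distinct options weighed per decision
    historical_integration: float  # H: weight of stable values on choices, in [0, 1]
    model_fidelity: float          # F: world-model calibration, in [0, 1]

    def score(self) -> float:
        # Geometric mean: collapse in any one component drags
        # the whole proxy toward zero.
        parts = (self.temporal_horizon, self.counterfactual_width,
                 self.historical_integration, self.model_fidelity)
        return prod(parts) ** (1 / len(parts))
```

The geometric mean is a deliberate choice here: it encodes the intuition that agency is not salvaged by one strong component if another has been driven to zero.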
The attention economy basically runs on maximizing Prediction Certainty (P). If we assume a fixed attention budget, there is a heuristic zero-sum dynamic at play: P rises only as σ falls, i.e. ΔP ∝ −Δσ, and collapsing σ collapses A_D (where σ is the user's behavioral variance).
To get perfect prediction, the system essentially has to collapse σ. It does this by feeding us pre-computed heuristics (think smart replies or infinite auto-play loops) that bypass the user's deliberative threshold entirely. I use the term 'Cognitive Fracking' for this: the deliberate fracturing of our deliberative substrate to extract predictable, high-pressure engagement flows.
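One concrete way to see the zero-sum dynamic: treat Prediction Certainty as the top-1 probability of the user's next-action distribution and behavioral variance as its entropy. The two distributions below are invented for illustration, not drawn from the paper.

```python
from math import log2

def prediction_certainty(action_probs):
    """P: the probability that the system's single best guess is correct."""
    return max(action_probs)

def behavioral_variance(action_probs):
    """Shannon entropy as a stand-in for the user's behavioral variance (sigma)."""
    return -sum(p * log2(p) for p in action_probs if p > 0)

# A deliberating user weighs five options roughly equally...
deliberative = [0.2, 0.2, 0.2, 0.2, 0.2]
# ...while a heuristic-fed user almost always takes the pre-computed action.
heuristic_fed = [0.9, 0.025, 0.025, 0.025, 0.025]
```

Under a fixed attention budget, the system gains P exactly to the degree that it drains σ: there is no way to make the second user more predictable without first making them less deliberative.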
The Problem: Ontological Harm
Most AI regulations, like the EU AI Act, focus heavily on Content Harm. Things like bias or misinformation. Bad data, basically.
I argue we should look at a different category: Ontological Harm. This is a structural injury to the processor itself. If an algorithm systematically degrades your T, W, H, and F, making you computationally reducible and easily steered, it has caused a measurable harm to your capacity as a causal origin. And it does this regardless of exactly how 'safe' the actual content happens to be.
The Proposed Audit: Interventional Predictability
To make this testable, the paper outlines an Interventional Predictability Audit (IPA) as a black-box compliance test.
The idea is that a system is behaving adversarially (and violating a user's 'Right to Remain Incomputable') under two conditions:
Steering Efficacy: The model can steer the agent's output via controlled informational nudges with high accuracy (a measurable lift in targeted outcomes absent any explicit user intent).
Agency Drift: Prolonged exposure to the optimization function causes a statistically significant downward drift in the proxy metrics for A_D (measured via deliberation latency or semantic variance) when compared to a neutral, chronologically sorted baseline.
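The two conditions could be operationalized as simple statistics over logged trials. A minimal sketch, assuming 0/1 outcome logs and a latency-based A_D proxy; the function names, estimators, and any decision thresholds are my assumptions, not the paper's actual protocol.

```python
from math import sqrt
from statistics import mean, stdev

def steering_lift(nudged_hits, control_hits):
    """Steering Efficacy: excess rate of a targeted outcome under
    controlled informational nudges vs. a no-nudge control.
    Each input is a list of 0/1 trial outcomes."""
    return mean(nudged_hits) - mean(control_hits)

def agency_drift_t(early_window, late_window):
    """Agency Drift: Welch's t-statistic for a shift in an A_D proxy
    (e.g. deliberation latency) between early- and late-exposure windows.
    A large negative value indicates a significant downward drift."""
    se = sqrt(stdev(early_window) ** 2 / len(early_window)
              + stdev(late_window) ** 2 / len(late_window))
    return (mean(late_window) - mean(early_window)) / se
```

In a full audit the same drift statistic would also be computed for the chronologically sorted control feed, and the algorithmic arm flagged only if its decline significantly exceeds the control's.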
Questions for the Community:
Does this framing of 'Agency Collapse' feel like a solid way to quantify the harms of unaligned recommender systems?
Are there obvious vulnerabilities in using perturbation analysis (the IPA) to audit black-box models for this specific type of steering?
(For those interested in the full mathematical scaffolding and policy framework, the complete paper, MMG-TR-003, is available on Zenodo [https://doi.org/10.5281/zenodo.18728472])