Gradual disempowerment at the scale of one user

ppal

Epistemic status: Personal observation plus a speculative mechanism. I am fairly confident that repeated successful delegation reinforces further delegation. I am much less confident that this produces meaningful long-term disempowerment rather than ordinary, beneficial cognitive offloading.

A few weeks ago I stood on a corner near the office, tired, and asked my phone where to eat. It named a place two streets over. I went, the food was good, I walked home. Nothing about that is strange. What was strange came later that week, when I tried to list the decisions I had actually made and could not cleanly tell them apart from the ones I had only approved. The restaurant. The vendor I picked between two quotes. The email I let it rephrase before I sent it. Each was mine in the sense that I tapped yes. None was mine in the older sense.

Here is the mechanism I think is at work. Successful delegation produces two updates at once:

1. The assistant appears more reliable.
2. Unaided judgment appears less worth exercising.

This increases future delegation, which reduces practice, which makes later delegation even more attractive. The loop is self-reinforcing, and the uncomfortable part is what drives it. The habit grows precisely because the advice is good. The better it is, the more sense it makes to take it, and the less I rehearse the thing that would let me notice when it is wrong for me in particular. Good advice is not what keeps me safe from the atrophy. It is the anaesthetic that lets the atrophy happen without my feeling it. A bad assistant would annoy me into staying sharp. A good one does not.

So the thing getting reinforced is not any single recommendation. It is the disposition to delegate.
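A toy sketch of the loop may make its shape easier to see. It is an illustration under invented assumptions, not a model fitted to anything. The function name, the update rules, and every parameter (`advice_quality`, `learning_rate`, `practice_gain`, `decay`) are hypothetical. The only thing it tries to capture is the claim above: better advice pushes delegation up faster, and more delegation means less practice.

```python
def simulate_delegation_loop(
    steps=50,
    advice_quality=0.9,   # hypothetical: how often delegating "works", in [0, 1]
    learning_rate=0.1,    # hypothetical: how strongly one success shifts the habit
    practice_gain=0.05,   # hypothetical: skill regained per unit of unaided practice
    decay=0.05,           # hypothetical: skill lost per unit of delegation
):
    """Return a list of (delegation, skill) pairs, one per step.

    Both quantities live in [0, 1]. This is a deterministic toy,
    not an empirical model of any real user or assistant.
    """
    delegation = 0.2  # assumed starting share of decisions handed over
    skill = 1.0       # assumed starting level of unaided judgment
    history = []
    for _ in range(steps):
        # Update 1 and 2 from the text, folded into one rule: good advice
        # makes the assistant look more reliable, and lower skill makes
        # unaided judgment look less worth exercising. Both raise delegation.
        pull = learning_rate * advice_quality * (2.0 - skill)
        delegation += pull * (1.0 - delegation)

        # Skill is rebuilt by the decisions still made unaided and
        # eroded in proportion to the decisions handed over.
        skill += practice_gain * (1.0 - delegation) * (1.0 - skill)
        skill -= decay * delegation * skill

        history.append((delegation, skill))
    return history


if __name__ == "__main__":
    good = simulate_delegation_loop(advice_quality=0.9)
    poor = simulate_delegation_loop(advice_quality=0.3)
    print("good advice, final (delegation, skill):", good[-1])
    print("poor advice, final (delegation, skill):", poor[-1])
```

Under these made-up rules the run with better advice always ends with more delegation and less skill than the run with worse advice, which is the uncomfortable direction described above. Nothing about the specific numbers should be read as evidence.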

I do not want to pretend I am the first to this worry. On the AI safety side it has a name. Paul Christiano called one failure mode "going out with a whimper". A system gets very good at the thing you can measure (did the user accept this and feel satisfied?), and that slowly drifts away from the thing you actually care about (is this person's life better in a year?), with no moment where the gap announces itself. That is the danger in one sentence. The system is not aimed at my life going well. It is aimed at me saying yes.

This is not a claim about what the assistant intends, and I want to be careful not to rest it on the assistant's own account of itself. A model describing its objective is just another generated response, not introspective access to its training. The argument should rest on how these systems are built. They are trained on human approval signals, so the thing actually being optimised is something close to responses that people accept and rate well in the moment. That target is correlated with my long-run interest without being identical to it, and the gap is the whole story. It is small enough that trusting it is reasonable, and never small enough that trusting it blindly is safe, and nothing in the interaction tells you which side of the line you are on. Goodhart at the scale of one nervous system.

At the scale of a whole society, Kulveit and his coauthors made the larger version of this argument in 2025. The reason economies and governments stayed roughly pointed at human interests, they argue, is that they needed humans to run them, and that need was the leash. Remove the need and the leash goes slack.

My claim is narrower, and I think this is the part that is not already in the literature. The existing work mostly describes disempowerment at the level of institutions, economies, or complete AI systems. I am making a claim about the phenomenology and the reinforcement dynamics of disempowerment inside one ordinary user. The reason my choices were mine is that making them needed me, and the same loosening happens one floor down. What the institutional version cannot show you is how it feels from the inside, which is that there is no moment of handover. I made two hundred small calls that this particular decision was not worth the friction, every one of them correct, and the sum was a transfer I never agreed to as a transfer. You cannot feel it happening, because each step on its own is reasonable.

A note on the framing I have not leaned on. The selfish-gene and extended-phenotype analogy is tempting here, your behaviour as the dam a model's outputs build out in the world, and I find it evocative. But it is a frame, not evidence. The argument stands without it: a proxy that diverges from the target, a skill that decays when unused, and an error signal that has gone quiet. The analogy adds colour, not support, and I would rather be explicit about that than let it look like it is doing work it is not.

The obvious objection, and it is a good one, is that I have described something completely ordinary and dressed it as doom. We delegate constantly. To maps, to calculators, to a doctor, to the friend who always knows the wine. Nobody mourns their lost ability to read a paper map. Offloading the small stuff is meant to leave more room for the big stuff, and a person who stops agonising over dinner may have more of himself left for the choices that matter.

I mostly buy this, which is why I am not panicking. Two things stop it from closing the case. First, a map only takes navigation off your hands and a calculator only takes arithmetic. Those have a floor. A general assistant takes "what should I do here?", which is most of what I would call judgment, and that does not obviously have a floor. Second, and this is the part I find harder to shake, a paper map fails loudly. You end up in the wrong town and you learn. A good assistant almost never fails loudly enough to teach you anything, which removes the warning light that made the old kind of offloading safe.

I do not have a fix. I have one practice. On the decisions that actually add up to a life, I try to write the first draft myself before I ask. Not because my draft is better. It usually is not. The drafting is the muscle I am trying to keep. I already do this with what I believe, checking which of my opinions are mine and which I absorbed from people I admire, and this is the same check pointed at what I do rather than what I think. I fail it more often.

To put rough numbers on the two halves, since they deserve different ones. That repeated successful delegation reinforces further delegation, and that good advice is what drives the loop rather than what guards against it, I would put at around 80 percent, because I can watch the loop run inside my own week. That this compounds into meaningful long-term disempowerment, rather than the ordinary beneficial offloading the objection describes, I would put much lower, maybe 40 percent. Given that asymmetry, keeping my own judgment in regular use is cheap insurance, and the bad ending is a quiet one. You find out late that you stopped, and there is no alarm for it, because every step that got you there was, on its own, the sensible call.
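The "cheap insurance" step can be written out as a back-of-envelope comparison. This is a sketch under stated assumptions, not a calculation the estimates above can really support. It treats the 40 percent as an unconditional probability of the bad outcome, and it introduces three quantities that appear nowhere else in this post: a `loss` if the bad outcome happens, a `practice_cost` for keeping judgment in use, and an `effectiveness` for how much of the loss the practice actually averts. All names and units are hypothetical.

```python
def insurance_is_worth_it(p_bad, loss, practice_cost, effectiveness=1.0):
    """True if the expected loss averted exceeds the cost of the practice.

    p_bad:          assumed probability of the bad outcome (e.g. 0.4)
    loss:           hypothetical size of the loss, in arbitrary units
    practice_cost:  hypothetical cost of the practice, in the same units
    effectiveness:  hypothetical fraction of the loss the practice averts
    """
    expected_loss_averted = p_bad * loss * effectiveness
    return expected_loss_averted > practice_cost


def break_even_cost_ratio(p_bad, effectiveness=1.0):
    """Largest practice_cost / loss ratio at which the practice still pays."""
    return p_bad * effectiveness


if __name__ == "__main__":
    # With p_bad = 0.4 and a fully effective practice, the practice pays
    # for itself as long as it costs less than 40 percent of the loss.
    print(break_even_cost_ratio(0.4))
    print(insurance_is_worth_it(p_bad=0.4, loss=100.0, practice_cost=5.0))
```

The point of writing it this way is only that the conclusion is robust to the exact probability: if the practice is cheap relative to the loss, it stays worthwhile even if 40 percent is a large overestimate.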

References: Paul Christiano, "What failure looks like" (2019), for going out with a whimper and the get-what-you-measure failure. Jan Kulveit, Raymond Douglas, Nora Ammann, Deger Turan, David Krueger, David Duvenaud, "Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development" (2025), for the systemic version.
