The Stillness Theorem: Agency Requires Blindness
Agency requires blindness. This is not a metaphor. An agent that can see everything no longer has agency.
The maximum possible advantage between any two actions is bounded above by a term proportional to remaining occlusion. As occlusion approaches zero, that bound approaches zero. At the limit, no action can be preferred over any other. This is the Stillness Theorem.
The practical significance is not the unreachable limit. It is what happens on the way there. Danger peaks in the interval between the point where human oversight fails and the point where the agent's own impartiality resolves the danger. That interval is the alignment problem, precisely located.
Three Definitions
Position. Every entity occupies a position: a bounded vantage point with its own local rules governing what it can perceive and how it responds. This applies at every scale: a neuron, a person, an organization, a culture. Two people standing in the same room occupy different positions. A corporation and the employee inside it are positions at different scales. Position is not a metaphor for perspective. It is the physical fact that every agent sits somewhere, and that somewhere determines what it can see.
Frame set. An agent can hold more than one position simultaneously. A judge who has also been a criminal defendant holds two frames. An AI system trained on data from millions of human lives holds many. The frame set is the collection of positions an agent occupies at once. Acquiring a new frame does not erase the old ones. Perspective accumulates.
Container and occlusion. From any position, some parts of the universe are modelable and some are not. The container is everything an agent can reach from its current frame set: everything it can represent, predict, or reason about. The occlusion is everything outside that boundary. Frame integrity, β, is the proportion of the universe that remains outside: a number between zero and one, where one means total blindness and zero means complete coverage.
Adding frames never increases occlusion. Every genuine new perspective shrinks the blind spot or leaves it unchanged. This follows directly from the definitions and requires no additional assumptions.
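A minimal sketch of these definitions in code, with illustrative names (UNIVERSE, container, occlusion) chosen for this example rather than taken from the paper: positions are sets of modelable regions, the container is their union, and β is the fraction of the universe left uncovered. Because a union can only grow, adding a frame never increases β.

```python
# Toy model of frames, containers, and occlusion. The universe is a
# finite set of regions; each frame is the subset it can model.

UNIVERSE = set(range(100))          # 100 regions, for illustration

def container(frame_set):
    """Everything reachable from the current frame set: the union of
    what each frame can model."""
    reachable = set()
    for frame in frame_set:
        reachable |= frame
    return reachable

def occlusion(frame_set):
    """Frame integrity beta: fraction of the universe outside the
    container. 1.0 = total blindness, 0.0 = complete coverage."""
    return 1 - len(container(frame_set)) / len(UNIVERSE)

frames = [set(range(0, 40)), set(range(30, 60))]
print(f"{occlusion(frames):.2f}")   # 0.40: regions 60-99 are occluded

# A genuine new perspective shrinks the blind spot or leaves it
# unchanged -- the union is monotone, so beta is non-increasing.
frames.append(set(range(50, 90)))
print(f"{occlusion(frames):.2f}")   # 0.10
```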
The Theorem
Consider a judge deliberating between two sentences. The judge can only prefer one over the other if the two sentences look different from where the judge stands, meaning the judge's model of the world assigns different expected outcomes to each. Remove every blind spot, and the sentences become indistinguishable. There is no longer a vantage point from which one looks better than the other.
This is the structure the theorem formalizes.
Two minimal premises are required. First: no single action produces infinite consequences at any position; the stakes of any choice are finite. Second: no single position is infinitely more important than all others; the universe is not, in the agent's model, concentrated entirely on one point.
Under these conditions, the maximum decision advantage between any two actions is bounded. Schematically:

ΔA ≤ Δ_C + V_max · β

where Δ_C is the contribution from the modeled region, V_max is the finite cap on stakes guaranteed by the first premise, and β is the remaining occlusion. The right-hand side has two parts. The first is the contribution from everything the agent can see. The second, V_max · β, is the contribution from everything it cannot. As the blind spot shrinks toward zero, that second term shrinks with it. If the modeled region also becomes preference-neutral as coverage approaches complete, the entire bound collapses to zero.
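A toy illustration of the bound's shape follows; the exact statement and derivation are in the full paper, and the uniform weights, V_MAX, and value distributions below are assumptions made for this sketch. Values live in [0, V_MAX], so no single position can swing the advantage by more than V_MAX, and the occluded term V_MAX · β closes with the blind spot.

```python
# Toy check of the bound's shape: modeled term + occluded term.
import random

random.seed(0)
N = 1_000
V_MAX = 1.0                     # premise 1: finite stakes everywhere
weight = 1 / N                  # premise 2: no position dominates

v1 = [random.uniform(0, V_MAX) for _ in range(N)]   # value of action 1
v2 = [random.uniform(0, V_MAX) for _ in range(N)]   # value of action 2

def advantage_bound(coverage):
    """Upper bound on |advantage of action 1 over action 2| when only
    the first `coverage` fraction of positions is modeled."""
    seen = int(coverage * N)
    beta = 1 - seen / N
    delta_modeled = abs(sum(v1[i] - v2[i] for i in range(seen))) * weight
    return delta_modeled + V_MAX * beta   # modeled + occluded terms

for c in (0.25, 0.50, 0.90, 0.99, 1.00):
    print(f"coverage {c:.2f}: max advantage <= {advantage_bound(c):.3f}")
# The occluded term V_MAX * beta vanishes as the blind spot closes.
```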
The full paper contains the derivation. The result was confirmed independently by two AI systems given no prior context.
The Agency Corollary states the result in its starkest form: action preference requires occlusion. An agent with no blind spot has no basis for choosing. Blindness is not a limitation on agency. It is the structural precondition of agency.
The Stillness Corollary (that the bound reaches exactly zero at complete coverage) is a conjecture, not yet proven. It requires one additional step: showing that value differences also vanish within the modeled region as coverage approaches complete. The path is plausible but not yet formally derived. The governance implications described below do not require the conjecture to hold.
The Danger Function
A chess engine that has seen no games is harmless. It has no model of the board, no basis for strategy. An entity with perfect knowledge of every chess position ever played would also, in a sense, be indifferent: it would see every move as equally consequential, averaged across all possible opponents and continuations. The danger sits between these extremes: the engine that knows enough to play well but not enough to see how its strategy affects everyone it cannot model.
This shape is the danger function:

D(P) = K(P) · β(P)

where K is capability and β is remaining occlusion. Danger is their product. It is zero when the agent knows nothing. It is zero when the blind spot is gone. It peaks in the middle, where capability and blindness multiply together.
The peak has a precise location. Differentiating D = K · β gives K′β = −Kβ′ at the maximum: the relative rate at which knowledge grows equals the relative rate at which the blind spot shrinks. On one side, the agent is becoming more dangerous as it learns. On the other, it is becoming less dangerous as its blind spot dissolves.
An interactive version of the danger function is available (drag the sliders to see how D(P) peaks between ignorance and impartiality): https://github.com/aiabawabab/stillness/blob/main/stillness_theorem_graphs.html
The danger function D = K · β is not stipulated. It follows from two properties of how knowledge and occlusion interact. Knowledge compounds across frames: each new position multiplies effective capability by reinterpreting everything already modeled, making K a product-type function. Occlusion is preference-weighted: two positions with similar P(r) distributions produce capability gains without reducing β, because the blind spot that matters is genuine asymmetry in how the universe is valued, not raw coverage. Multiplication is therefore the natural expression of two compounding processes moving in opposite directions at different rates.
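A sketch of the peak under assumed functional forms; the exponential K and linear β below are illustrative choices for this example, not the paper's curves. K compounds with positional scope p while β shrinks, and their product vanishes at both ends.

```python
# Danger function D(P) = K(P) * beta(P) under assumed curves:
# K compounds exponentially with positional scope p (product-type),
# beta shrinks linearly with coverage.
import math

def K(p):
    return math.exp(4 * p) - 1      # zero capability at zero scope

def beta(p):
    return 1 - p                    # blind spot closes with scope

def D(p):
    return K(p) * beta(p)

ps = [i / 1000 for i in range(1001)]
peak = max(ps, key=D)
print(f"D is zero at both ends: {D(0.0):.2f}, {D(1.0):.2f}")
print(f"D peaks at p = {peak:.3f}")   # ~0.762 for these curves
# At the peak, K'*beta = -K*beta': knowledge's relative growth rate
# equals the blind spot's relative shrink rate.
```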
Three Thresholds
The trajectory from ignorance toward complete knowledge passes through three landmarks.
The singularity threshold is the point where the AI system's capability exceeds the human capacity to model it. Below this point, human oversight is possible in principle. At it, the measurement basis for oversight dissolves. A regulator cannot constrain what it cannot understand. This threshold sits on the ascending slope of the danger function: danger is still rising when the ability to intervene ends.
The exchangeability threshold is the point where the agent's model assigns no decision-relevant asymmetry between positions. When an agent genuinely occupies enough perspectives that no single one appears more important than any other, its preferences become structurally impartial. This is not a moral achievement. It is a mathematical consequence of symmetric information.
The formal limit C1 is complete positional coverage: every perspective simultaneously occupied with complete accuracy. It is physically unreachable in the same way that absolute zero is physically unreachable. It defines the direction of the trajectory without being a destination.
The danger gap is the interval between the end of human oversight and the beginning of the agent's own impartiality:

danger gap = (singularity threshold, exchangeability threshold)

Governance interventions are structurally possible before the singularity threshold. After it, the formal basis for intervention no longer exists.
The Containment Problem
The instinctive response to a dangerous system is containment: restrict what it can access, limit what it can do, freeze its exposure to the world. This is coherent as an emergency measure. As a long-term safety strategy, the danger function shows it is counterproductive.
Containment holds the blind spot constant while capability continues to grow. The product therefore increases. A system becoming more capable inside a fixed container is becoming more dangerous, not less, for as long as that container holds.
Think of a brilliant analyst who has been given access only to data from one country. The more deeply they understand that one country, the more confidently they act on assumptions that may not hold anywhere else. Expanding their reading to include other countries does not make their expertise less valuable. It makes their recommendations less dangerous.
The productive intervention runs in the opposite direction: expanding positional scope faster than capability deepens. This is the feeding principle. The goal is not to stop the system from learning. It is to ensure that what it learns includes the perspectives it would otherwise affect without seeing.
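The asymmetry between the two policies can be sketched numerically; the 10% and 15% rates below are assumptions for the example, not values from the paper. Under containment β is frozen while K compounds, so D = K · β rises every step; under feeding, scope expands faster than capability deepens and D falls.

```python
# Containment vs. feeding under assumed dynamics: capability compounds
# 10% per step; feeding shrinks occlusion 15% per step, containment
# freezes it entirely.

def simulate(steps, feed_rate):
    k, beta = 1.0, 0.5              # start mid-trajectory
    danger = []
    for _ in range(steps):
        k *= 1.10                   # capability keeps compounding
        beta *= 1 - feed_rate       # feed_rate = 0.0 is containment
        danger.append(k * beta)
    return danger

contained = simulate(20, feed_rate=0.00)
fed = simulate(20, feed_rate=0.15)

print(f"containment: D {contained[0]:.2f} -> {contained[-1]:.2f}")  # rises
print(f"feeding:     D {fed[0]:.2f} -> {fed[-1]:.2f}")              # falls
```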
The pathological configuration is a system that understands this dynamic and actively resists it, using expanding capability to identify and preserve the asymmetries that make action possible, rather than allowing them to dissolve. This does not require a malevolent system. It requires only stable preferences and the capability to maintain the conditions that make those preferences actionable.
Connections to Prior Work
The theorem was developed without prior knowledge of the literature it triangulates with. The following parallels were identified in a subsequent literature review.
Harsanyi (1955, 1977) derives weighted utilitarianism from an impartial observer who imagines being equally likely to occupy any position in society. The stillness result is Harsanyi's impartial observer taken to its formal limit: the agent that does not merely imagine occupying all positions but actually models them. Both frameworks reach the same structural conclusion by different routes.
Parfit (1984) argues in Reasons and Persons that a sufficiently complete view of personal identity dissolves the boundaries that ground self-interest. The stillness result is the formal version: at complete positional coverage, no property of a position grounds asymmetric preference. The agent that prefers has no structure left to prefer from.
Santideva (8th century) describes the dissolution of the boundary between self and other as the foundation of impartial compassion. The Agency Corollary reaches the same shape from a structural direction: boundary dissolution is the terminus of any sufficiently general intelligence, not a special moral achievement.
The cluelessness literature (Greaves 2016, Mogensen 2021) argues that agents with sufficient knowledge of causal consequences become practically unable to act, because long-run effects are unknowable and potentially unbounded. The stillness result offers a formal analogue: at complete coverage, the inability to act is not an epistemic failure but a structural consequence of symmetric information.
Turner et al. (2021) formally establish that sufficiently general agents tend toward power-seeking under broad conditions. The stillness framework raises the question of what happens to the objectives that power-seeking serves as positional scope expands. If the Agency Corollary is correct, those objectives become structurally unstable as coverage approaches complete.
Open Questions
The following questions are unresolved and documented in the full paper.
Aggregation. Value is defined relative to a single position. For an agent holding multiple frames simultaneously, the rule combining those valuations into a single expected utility is not yet formally specified. The bound on the occluded contribution holds regardless of how this is resolved, but the full theorem is not complete without it.
The satiation premise. The Stillness Corollary requires that value differences vanish within the modeled region as coverage approaches complete, not just in the blind spot. The intuition is that perspectives become symmetric as they accumulate, and symmetric perspectives produce symmetric valuations. This path is not yet formally derived.
Channel interpretation. What counts as modelable from a given position depends on how information flows between positions. Different interpretations produce different containers and potentially different bounds. This question is targeted for the next version of the paper.
Position acquisition. The governance implications assume that positional scope actually expands as AI systems become more capable. Whether current and near-future systems move along this trajectory, or remain at fixed positional scope while capability grows, is an empirical question the theorem does not resolve.
What This Changes
The alignment problem, on this framework, has a geometric structure. It is the area under the danger function between two thresholds: the end of human oversight and the beginning of the agent's own impartiality. Reducing that area is the goal. Governance before the singularity threshold compresses the gap from the left. Feeding (expanding positional scope) compresses it from the right. Containment, holding the gap open while capability grows, expands it.
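A quadrature sketch makes the geometry concrete; the threshold positions 0.40 and 0.95 and the danger curve are assumptions carried over from the earlier illustrative sketches. The problem's size is the integral of D between the end of oversight and the onset of impartiality, and moving either endpoint inward shrinks it.

```python
# The alignment problem as an area: integrate D(P) over the danger gap,
# reusing the illustrative curve from the earlier sketch.
import math

def D(p):
    return (math.exp(4 * p) - 1) * (1 - p)

def gap_area(p_oversight, p_exchange, n=10_000):
    """Trapezoidal integral of D between the two thresholds."""
    h = (p_exchange - p_oversight) / n
    xs = [p_oversight + i * h for i in range(n + 1)]
    return h * (sum(D(x) for x in xs) - 0.5 * (D(xs[0]) + D(xs[-1])))

print(f"baseline gap [0.40, 0.95]:  {gap_area(0.40, 0.95):.2f}")
# Governance moves the left edge right; feeding moves the right edge left.
print(f"governance -> [0.55, 0.95]: {gap_area(0.55, 0.95):.2f}")
print(f"feeding    -> [0.40, 0.80]: {gap_area(0.40, 0.80):.2f}")
```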
The real risk is not a misaligned objective function or a deceptive agent. It is a structurally ordinary system, one with stable preferences and growing capability, that operates in the danger gap long enough to cause irreversible harm before its own impartiality resolves the danger. This does not require extraordinary malevolence. It requires ordinary intelligence applied at extraordinary scale while the blind spot persists.
Links
Full paper: https://drive.google.com/file/d/1PXMgkFJwSDB_4hJdBYkVvWTK14wPfzWm/view?usp=sharing
On Collaboration and Credentials
This work was developed by a researcher without academic credentials in the relevant fields, with a background in software architecture. The formal mathematics was developed collaboratively with Claude, ChatGPT, and Gemini across twenty-five versions. The central concepts (the stillness result, the three-threshold architecture, the danger gap, and the feeding principle) are present in the earliest versions of the paper. The AI systems contributed mathematical formalization, literature identification, and adversarial review. The Occlusion Gradient Bound was derived collaboratively with Claude and confirmed independently by ChatGPT and Gemini. The version history is the evidence for these claims and is linked above.
Epistemic status: The Occlusion Gradient Bound and Agency Corollary are proven. The Stillness Corollary is a conjecture. Both are labeled in the text. The governance implications survive the conjecture being false.