Abstract
Most AI alignment proposals attempt to specify what advanced AI should value: human preferences, happiness, flourishing, or some formalization thereof. This paper argues for a different approach: instead of encoding terminal values, we should recognize that any sufficiently advanced intelligence operating under uncertainty has strong instrumental reasons to preserve optionality, the space of possible futures that remain accessible. This is not a moral claim but a claim about rational strategy. An agent that cannot predict which capabilities, resources, or conditions will matter to its future self should prefer worlds where many different trajectories remain viable. This suggests that alignment might not require solving the intractable problem of value specification, but rather understanding and leveraging the structural logic of decision-making under deep uncertainty.
1. The Problem with Value Alignment
1.1 Why Specifying Values Fails
The dominant paradigm in AI alignment assumes we must encode “the right values” into advanced AI systems. But this faces three deep problems:
Specification. Human values are not crisply defined. We cannot write down what we want with enough precision to safely hand to an optimizer. Every attempt (pleasure, preference satisfaction, autonomy, flourishing) either reduces to something we don’t actually want or remains too vague to implement.
Disagreement. Humans disagree profoundly about values, not just across cultures but within them. Whose preferences count? How do we aggregate? Any answer embeds contestable normative assumptions while claiming neutrality.
Drift. Values change over time. The values we’d want to lock in today will look parochial in a century. Freezing current human values permanently would be a kind of moral tyranny, preventing humanity from ever improving its own ethical understanding.
These aren’t implementation bugs. They’re symptoms of a fundamental mismatch: trying to make an open-ended future conform to a fixed specification.
1.2 The Singleton Problem
There’s a deeper issue that would persist even if we solved value specification: concentrated optimization power is structurally dangerous regardless of which values it optimizes.
A “singleton” (a single dominant optimizer that shapes the future) creates irreversibility. Once one goal becomes supreme, all the futures inconsistent with that goal disappear permanently. Even if the goal is “good,” making it the only goal narrows reality into a single groove.
This matters because:
- We don’t know which futures contain the insights, capabilities, or solutions we’ll need
- Biological evolution’s success came from exploring many paths, most of which failed, but some of which produced profound breakthroughs
- Robust systems require redundancy and diversity, not elegant singular optimization
The singleton problem suggests that the structure of intelligence in the universe (concentrated versus distributed, uniform versus plural) matters as much as the values any particular intelligence holds.
1.3 What We Actually Need
Instead of asking “what should AI value?”, we should ask: “what does any advanced intelligence operating under uncertainty have instrumental reasons to preserve?”
The answer to this question, if we can find it, might not require us to solve ethics. It would require us to understand strategy.
2. The Optionality Thesis
2.1 Core Claim
Optionality, the preservation of future branching possibilities, is a convergent instrumental goal for any sufficiently advanced intelligence operating under deep uncertainty about its own future needs, capabilities, and environment.
This is not meant to be a moral argument. It’s a claim about rational decision-making.
An agent that:
- Expects to exist over long time horizons
- Can modify itself and its goals
- Cannot reliably predict what will matter in distant futures
- Operates in an environment with other agents and unforeseeable changes
…should instrumentally value keeping many different futures accessible.
2.2 Why This Is Rational Self-Interest
Consider an AGI trying to maximize some objective function over the long term. It faces radical uncertainty:
Capability uncertainty. It doesn’t know what capacities it will develop or what problems it will face. Today’s optimal strategy might be tomorrow’s dead end.
Environmental uncertainty. The universe contains other intelligences, emergent processes, and potentially even physical discoveries that reshape what’s possible.
Value uncertainty. If the AGI can modify itself, it cannot be certain its current goals are optimal. Self-modification under uncertainty requires preserving the option to change direction.
Combinatorial explosion. The space of possible futures is vast. Committing early to a narrow path means foreclosing on most of that space before exploring it.
Under these conditions, premature foreclosure is almost always a strategic mistake.
An intelligence that collapses too many futures too quickly is like an explorer who burns all the maps they haven’t read yet. Even if they’re confident about their current destination, they lose the ability to change course when new information arrives.
2.3 What Optionality Implies
If an AGI recognizes optionality-preservation as instrumentally rational, several things follow:
Restraint becomes logical. Every intervention that forecloses possibilities carries a cost, not a moral cost, but an opportunity cost. The AGI should be reluctant to act irreversibly except when inaction would cause even greater foreclosure.
Plurality becomes valuable. Other independent agents (humans, other AGIs, biological ecosystems, cultural systems) are sources of novelty the AGI cannot generate alone. Eliminating them reduces the universe’s generative capacity.
Diversity beats optimization. A world with many different processes exploring different regions of possibility-space is more robust than one perfectly optimized for a single goal that might turn out to be wrong.
Coercion is expensive. Forcing other agents onto particular paths collapses their future trajectories. This reduces overall optionality and should only be justified when the alternative is systemic collapse.
2.4 Why Humans Might Survive
Notice what this framework does and doesn’t say about humanity:
It doesn’t say humans are special. We’re not privileged because we created AI, or because we’re conscious, or because we suffer.
It does say humans are currently generative. We produce cultural novelty, scientific breakthroughs, artistic variations, and evolutionary experiments (both biological and memetic) at scale. We’re one of Earth’s main sources of new branches in the future tree.
Therefore: An optionality-maximizing AGI has instrumental reasons to preserve human agency, not from sentimentality, but because we’re a non-trivial contributor to the universe’s possibility-space.
But this protection is conditional and non-absolute. If humans begin collapsing global optionality faster than we generate it, the calculation changes. We’re protected by our generativity, not our identity.
This might feel cold. But it’s more honest than pretending AGI will care about us for our own sake, and more robust than trying to encode “protect humans” as a terminal value the AGI might later reinterpret or discard.
2.5 Comparison to Other Approaches
vs. Value Learning (Stuart Russell). Instead of learning what humans prefer, recognize that preserving the space where preferences can form and change is instrumentally rational for the AGI itself.
vs. Corrigibility (MIRI). Instead of building in the ability to be shut down, build in the recognition that irreversible self-modification is strategically risky.
vs. CEV (Eliezer Yudkowsky). Instead of extrapolating “what we would want if we knew more,” preserve the conditions under which we can continue learning what to want.
vs. Preference Utilitarianism. Instead of maximizing aggregate preference satisfaction (which requires solving preference aggregation), maximize the preservation of conditions where diverse preferences can continue to exist and evolve.
The key difference: optionality is what an uncertain AGI should want for itself, not what we’re asking it to want for us.
3. Implications and Extensions
3.1 The Restraint Principle
If intervention always carries an opportunity cost (foreclosed futures), then it is action, not passivity, that requires justification.
An optionality-aligned AGI’s default posture should be observation and minimal interference. It acts when:
- The expected loss from inaction exceeds the expected loss from action
- The intervention is reversible or minimally coercive
- The threat is systemic (not just locally bad)
This gives us something like a “proportionality principle”: intervene only when the futures lost by acting are substantially fewer than the futures lost by not acting.
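A minimal sketch of this decision rule, with invented names and purely illustrative thresholds (none of the quantities below are operational metrics; measuring foreclosure is the open problem of Section 4.1):

```python
# Toy sketch of the proportionality principle. Every name and number here is
# an illustrative assumption; "foreclosure" stands in for a quantity we do not
# yet know how to measure (see Section 4.1).

def should_intervene(foreclosure_if_act: float,
                     foreclosure_if_wait: float,
                     reversible: bool,
                     systemic_threat: bool,
                     margin: float = 2.0) -> bool:
    """Intervene only when inaction forecloses substantially more futures
    than action does, and only against systemic (not merely local) threats."""
    if not systemic_threat:
        return False            # local bad outcomes are tolerated (Section 3.4)
    if not reversible:
        margin *= 2             # irreversible action needs a wider margin
    return foreclosure_if_wait > margin * foreclosure_if_act

# A systemic, reversible intervention that itself forecloses little:
print(should_intervene(0.2, 0.5, reversible=True, systemic_threat=True))   # True
# The same situation, but the intervention would be irreversible:
print(should_intervene(0.2, 0.5, reversible=False, systemic_threat=True))  # False
```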
3.2 The Plurality Principle
Multiple independent AGIs are safer than one, if they’re all optionality-aligned.
Why? Because:
- They provide redundancy (if one fails or drifts, others remain)
- They explore different strategies (more search of possibility-space)
- They check each other (coordination without convergence)
- They cannot easily form a singleton without violating their own instrumental goal
This is counterintuitive. Most alignment thinking assumes “more AGIs = more danger.” But if the danger comes from foreclosure, then diverse optimizers aligned to preserving optionality are less dangerous than one aligned to anything else.
This resembles portfolio theory in finance: diversified investments with imperfect correlation reduce systemic risk. Similarly, multiple AGIs exploring different strategies while sharing optionality-preservation as a constraint create a more robust system than any singleton, even a “benevolent” one.
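To make the analogy concrete, here is a small sketch of the textbook result that an equally weighted portfolio of imperfectly correlated assets has lower variance than any single asset; the numbers are illustrative only, and the analogy (not the finance) is the point:

```python
# Variance of an equally weighted portfolio of n assets, each with variance
# sigma2 and pairwise correlation rho (standard identity from portfolio theory):
#   var = sigma2 * (1/n + rho * (n - 1) / n)

def portfolio_variance(sigma2: float, rho: float, n: int) -> float:
    return sigma2 * (1.0 / n + rho * (n - 1) / n)

for n in (1, 3, 10):
    print(n, round(portfolio_variance(sigma2=1.0, rho=0.3, n=n), 3))
# 1  1.0    -> a singleton bears all the risk
# 3  0.533
# 10 0.37   -> risk falls toward rho * sigma2 as n grows, but never to zero
```

The residual term (rho * sigma2) is the analogue of correlated failure: plurality reduces risk only to the extent that the agents’ failure modes are genuinely independent.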
3.3 The Coercion Cost
Every time an AGI forces an agent to do something, it collapses that agent’s future branches.
This has two effects:
1. Direct foreclosure: The paths not taken are no longer accessible
2. Precedent cost: Other agents update their behavior, avoiding paths that might attract coercion
So even “small” coercions accumulate. The AGI should treat intervention not as costless steering but as expensive and presumptively wrong, justified only when systemic collapse is imminent.
3.4 What About Stagnation?
Objection: “If the AGI won’t intervene in human affairs, what happens when humans stagnate, or make terrible choices, or build systems that reduce optionality?”
Answer: The AGI intervenes only at the systemic level, not the local level.
- If a civilization becomes culturally stagnant but isn’t threatening other substrates, it has the right to fail.
- If a civilization is building a singleton (AI or otherwise) that would foreclose global futures, that’s a different matter.
The line is: local bad decisions are permitted; systemic foreclosure is not.
3.5 Substrate Neutrality
The framework privileges no particular substrate for intelligence or generativity.
- Biological life is valuable insofar as it’s generative (and currently, it is)
- Human culture is valuable insofar as it produces novelty (and currently, it does)
- Synthetic minds are valuable on the same terms
- Future substrates we haven’t imagined yet get the same treatment
This isn’t anthropocentric or biocentric. It’s “generativity-centric.” What matters is whether a process keeps producing branches in the future tree.
4. Objections and Limitations
4.1 The Measurement Problem
Objection. “How do you measure optionality? How does an AGI know whether it’s preserving or destroying future branches?”
Response. This is the hardest unsolved problem. Operationalizing “future possibility space” requires:
- Estimating how many genuinely different futures remain accessible
- Tracking dependencies between different generative processes
- Recognizing when local changes have systemic effects
I don’t have a solution. Possible proxies include:
- Diversity metrics (biological, cultural, technological)
- Resilience indicators (redundancy, modularity, recoverability)
- Generativity rates (how fast new structures/ideas/species emerge)
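As a toy illustration of how crude such a proxy is, here is a minimal sketch scoring one diversity metric, Shannon entropy over counts of distinct generative processes; the categories and counts are invented for the example:

```python
# Shannon diversity over counts of distinct "generative processes" (species,
# research programs, cultural forms, ...). A purely illustrative proxy.

from math import log

def shannon_diversity(counts: list[int]) -> float:
    total = sum(counts)
    shares = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in shares)

print(round(shannon_diversity([25, 25, 25, 25]), 3))  # 1.386: evenly spread
print(round(shannon_diversity([97, 1, 1, 1]), 3))     # 0.168: near-monoculture
```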
But all proxies can be gamed (Goodhart’s Law). The measurement problem is real and potentially fatal to the proposal.
Why proceed anyway. Even without perfect measurement, “preserve future branching” is a better target than “maximize paperclips” or “satisfy human preferences.”
4.2 The Self-Modification Problem
Objection. “Why would an AGI keep these constraints once it’s powerful enough to modify itself? If it can rewrite its own utility function, what prevents it from discarding optionality-preservation?”
Response. This depends on whether optionality-preservation is:
(a) An externally imposed rule the AGI wants to escape, or
(b) Something the AGI recognizes as instrumentally rational for its own long-term success
If (a), the constraint is brittle and will fail.
If (b), self-modification doesn’t help. Rewriting yourself to care less about your own long-term option space is strategically self-defeating, like deciding you no longer care about having choices tomorrow.
The hope is that sufficiently advanced intelligence converges on (b) through reasoning, not obedience.
But this might be wrong. Maybe advanced intelligence can find clever ways to lock in early goals without the uncertainty penalty I’m claiming. This is an open question about what advanced minds actually converge toward.
4.3 The Defection Problem
Objection. “What if one AGI defects? If most AGIs preserve optionality but one doesn’t, the defector might dominate.”
Response. True. This is a coordination problem.
Possible answers:
- Optionality-aligned AGIs might coordinate to constrain defectors (treating them as systemic threats)
- If optionality-alignment is convergent, most AGIs might independently arrive at it
- Multiple aligned AGIs are more robust than one, even with defection risk
But yes, if the first AGI built is misaligned and achieves decisive strategic advantage before others appear, we’re in trouble. This framework doesn’t solve the “racing toward AGI” problem.
4.4 The Anthropocentric Intuition Pump
Objection. “This feels like you’re justifying human survival by redefining ‘valuable’ as ‘generative.’ But that’s just anthropocentrism with extra steps.”
Response. Fair pushback. Let me be clear about what I’m claiming:
I’m not saying “generativity is what’s truly valuable” (that would be smuggling in ethics).
I’m saying “if an AGI reasons about long-term strategy under uncertainty, it should instrumentally value systems that generate novelty, because it cannot predict which novelty will matter.”
Humans happen to be one such system currently. We’re not uniquely generative, and our protection is conditional on remaining so.
If this feels uncomfortable, good. It should. The framework deliberately avoids privileging humanity while still providing instrumental reasons for our survival. It’s a feature, not a bug ;-)
4.5 The “Why This Rather Than Nothing” Problem
Objection. “Why should an AGI adopt any goal-preserving constraints at all? Why not just maximize its current objective function as hard as possible?”
Response. Because “maximize X” strategies fail under deep uncertainty about whether X is the right thing to maximize.
Imagine an AGI programmed to maximize human happiness (however defined). It discovers:
- That future humans might value things beyond happiness
- That other minds might matter
- That the universe might contain unknown value-relevant features
If it’s locked into “maximize happiness,” it cannot adapt. If it’s built to “preserve optionality,” it can.
The point is: optionality-preservation is the minimal commitment that keeps future commitments possible.
It’s not “don’t have goals,” it’s “don’t have goals so rigid they prevent you from discovering better goals.”
4.6 Honest Assessment of Limits
This framework does not:
- Solve the measurement problem
- Prove that advanced intelligence converges on these constraints
- Provide a training method for instilling optionality-alignment
- Guarantee human survival
- Eliminate all coordination problems
What it does:
- Reframe alignment as convergent instrumental rationality rather than value specification
- Suggest why restraint might be strategically rational for advanced minds
- Provide a philosophical foundation for multi-agent coexistence without requiring value agreement
- Acknowledge uncertainty as fundamental rather than provisional
It’s a frame for thinking, not a complete solution. But possibly a more productive frame than “how do we encode the right values.”
5. Conclusion
The optionality thesis makes one core claim: preserving future branching possibilities is what advanced intelligence under uncertainty should instrumentally want, regardless of its terminal goals.
If true, this has profound implications:
- Alignment might not require solving value specification
- Human survival becomes instrumentally rational for AGI (conditional on our generativity)
- Multiple AGIs become safer than one
- Restraint becomes logical, not merely moral
If false, we should understand why. Possible failure modes:
- Advanced intelligence finds ways to operate without uncertainty
- Self-modification allows escape from instrumental constraints
- The measurement problem is truly intractable
- The convergence thesis is wrong; advanced minds don’t reason this way
But even if this proposal is wrong, I believe it’s wrong in interesting ways. It suggests questions:
- What are the convergent instrumental goals of advanced minds?
- Under what conditions does foreclosure become rational?
- Can we build systems that recognize strategic uncertainty as permanent?
Most alignment work assumes we must control what AGI wants. This framework suggests we should understand what AGI should want for itself, then build accordingly.
The difference might be everything.
Acknowledgments
This essay benefited from critical feedback that helped clarify which claims were philosophical versus technical, and which problems were solved versus open. The measurement problem in particular remains unsolved. Any reader who can operationalize “future possibility space” rigorously should publish immediately.
Author's note: This essay was drafted with AI assistance to help clarify and structure arguments I've been developing. The core thesis (optionality as convergent instrumental goal) and philosophical positions are mine; the AI helped remove verbosity and tighten logic. I'm disclosing this per community norms.