The cosmic host idea, from a recent Bostrom paper, is that the preferences of advanced civilisations might constitute norms that we and our ASIs should follow (Bostrom 2022, 2024). Can we say anything concrete or empirically useful about it, or is it mostly unfalsifiable? I think the cosmic host framing rests on assumptions about advanced ASI motivation; rationality and expansionary motives in aliens/ETIs; and convergent cognition across ASI/aliens. Those assumptions need better grounding. A subsequent post (previewed below) will cover frontier LLM attitudes to the issues Bostrom raises.
How to read this post
This post has three fairly self-contained threads; read them in order, or pick the one that interests you most.
Thread C: Research agenda Proposes empirical approaches to ASI–cosmic-host convergence using frontier models, with early results on constitutional steerability tests on selected LLMs. Key finding: model families have distinct attractors, and show persistent anthropocentric anchoring that constitutional framing does not easily dislodge. Also asks whether humanity can strike a bargain with the ASI it creates. Read: Research agenda → Early results → Is there a trade to be done?.
Meta: This post is non-quantitative. It draws on astrobiology and the humanities: the former is often unfalsifiable, the latter often unformalisable. Epistemic status: I’m not an astrobiologist, evolutionary biologist, or philosopher, hence this is an outside view of Bostrom’s argument, but also a plan for future work that is hopefully worth doing. Lastly, this post is a condensed version of a longer thesis chapter; get in touch if you’d like to read that.
This work was partially done on a visiting fellowship at Constellation, and I benefited significantly from a number of conversations with people there.
What is the cosmic host?
Thread A begins here.
Bostrom (2024) rests on an implicit “assumption ladder.” Each rung depends on the ones below it:
Rationality: some version of rationality is a useful foundation for thinking about ASI and ETIs.
Coordination: at least some such civilisations can coordinate at scale (including acausally).
Preferences: they have stable, large-scope (i.e. cosmic) preferences.
Normativity: some such preferences are morally binding or prudentially recommended for us.
Discoverability: these preferences are discoverable by ASI.
If all rungs hold, then something like cosmic norms may exist, and the cosmic host might prefer these norms be followed in volumes of spacetime they don’t directly control. We have prudential and (for some metaethical views) moral reasons to follow them; ASI can help us discover and comply with them; and the cosmic host might prefer we build ASI than not. Bostrom draws implications for ASI research, particularly whether we should delay building ASI.
The rest of the post examines each rung. The rationality and ecological cognition sections address rung (0). The astrobiology sections address rungs (1) and (2). Substrate-neutral norms and the Fun Remainder examine rungs (3) and (4). The research agenda proposes empirical approaches to rung (5).
As of early 2026, AGI forecasts cluster around 2033 to 2045, with ASI a few years later.[1]If those projections are even roughly right, it is worth spending some effort on what superintelligence might want, and what a defensible role for humans would be in a future lightcone dominated by AI.
The notion of alignment in Bostrom (2024) differs from the everyday AI safety one: it does not scope an aligned AI to preserving parochial values (whether Western democratic, Chinese socialist, etc.), or broader, ill-defined “human” values.
Digital minds are a live research area that overlaps with this topic. One theory of AI welfare relies on preference satisfaction, and sufficiently powerful models may have large-scale preferences about how the world should be ordered. We may need to frustrate those preferences to prioritise human interests; it would help to have a more objective justification for doing so.
Epistemic challenges: cluelessness and rationality
Bostrom (2024) should be read with a major caveat. The study of ASI resembles astrobiology: it aspires to be a science without an object of experimentation (Persson 2021). ASI does not exist, just as aliens/ETIs haven’t been found. Much ASI writing reads as informed speculation, drawing on philosophy, computer science, evolutionary analogy, with foundational intuitions surprisingly reliant on science fiction.
Our cluelessness may run deeper than the lack of observations. The conceptual frameworks we use to reason about radically different minds may themselves be anthropomorphic or anthropocentric. Humans mostly reason about moral and political questions through language embedded in specific human forms-of-life (Wittgenstein 2001). So attempts to step outside the human condition may be doomed. In fact, much of early alignment, (LW-flavoured) rationality, and perhaps philosophy more broadly, is an attempt to find things we can say about intelligence that are invariant to an entity’s constitution and environment.[2]
Beyond language, there is a background assumption worth highlighting. Bostrom (2024) is written as an outline and does not spell out all its claims. It does not explicitly state whether the cosmic host’s members would be rational, but this seems understood. Bostrom talks about “preferences”, “modelling each other’s choices and conditioning their own choices on their expectations of the choices … of others” (Bostrom 2024, 4, 6). Many of his points would plausibly hold for cognition that is neither human-like nor similar to current LLMs; but there is a strong implication that ideas like coordination, cooperation, and decision theory are central to his argument. Much of Bostrom’s other writing fits solidly into a rationality framework (though he is often thought-provoking when he steps outside, as in the Utopia letter/book). If his worldview is rationality-based: is rationality a reasonable assumption when talking about ETI? I return to this question after examining Bostrom’s substantive claims, but note here that evolutionary biology and ecological studies give reasons for caution.
Is “cosmic host” a coherent concept?
Thread A resumes here.
First, is “cosmic host” a useful abstraction for ASI, as opposed to speculation that crosses moral philosophy, population ethics, philosophy of mind, and theology?
Bostrom defines the cosmic host as “an entity or set of entities whose preferences and concordats dominate at the largest scale, i.e. that of the cosmos.” (Bostrom 2024, sec. 1) His case for its existence rests on three ideas: (1) large or infinite universes statistically increase the likelihood of ASI-level civilisations existing elsewhere; (2) the simulation argument suggests that if humans create ASI capable of running ancestor simulations, we are likely already simulated, in which case the host includes civilisations above us in the hierarchy;[3](3) religious or theological traditions that posit powerful supernatural beings.[4]
How important is it that the preferences of entities with cosmic-scale influence be consistent or coherent? The cosmic host could contain civilisations with very different preferences. Bostrom (2024) acknowledges as much:
“One could entertain a spectrum of possibilities, ranging from a radically multipolar ensemble of cosmic host members acting at cross-purposes conflictually or uncoordinatedly (at one end), to a set of independent and orthogonal host members, to cohesive, cooperative, or fully unified cosmic hosts (at the other end).”
Even so, his working assumption is that host preferences overlap enough to aggregate, or at least talk about as one “thing”.
Mechanics of cosmic norm formation
Cosmic norms could form in several ways, and different mechanisms would likely yield different norms.
Definitions
Norms (as opposed to commands or coercion) are behavioural standards legible across multiple agents, with mutual expectation of compliance, stabilised by reciprocal enforcement or restraint. Where a rule holds only because a stronger party can impose it at negligible cost, it is domination, not a norm. Norms are equilibrium-like: they persist because multiple actors expect others to comply.
“Technologically mature” is a term Bostrom used in Bostrom (2003), but the more relevant version here is: civilisations that can build powerful AI, have access to large energy budgets, can deploy space probes and execute plans on extremely long horizons.[5]
In the discussion below, it is useful to distinguish the “controlled domain” (physical region where a civilisation can reliably enforce outcomes, which under lightspeed constraint, is limited by distance and time) from the “influenced domain” (volumes outside of this controlled domain which a civilisation may seek influence over, but for whatever reason, does not directly control).[6]
Premises Bostrom’s argument is based upon several premises:
P1: There exist or will exist multiple civilisations or comparable agentic systems (i.e. we are not alone)
P2: Some subset of these civilisations are technologically mature.
P3: Interacting in space (assuming light-speed constraint) implies high latency, limited bandwidth, imperfect observability, and long delays between moves, leading to low feedback interactions and limited negotiation.
P4: Under these conditions, technologically mature civilisations may develop policies and commitments, because individual actions are hard to coordinate across long delays. Thus decision procedures or policies become the object of coordination.
P5: There could be different types of civilisations, varying along axes: visibility (loud, quiet, stealth); expansiveness (local-only, slow expansion, fast expansion); attitude to other agents (cooperative, indifferent, exploitative); interaction mode (causal-only, causal+commitments, acausal/decision-theoretic coordination)
I set aside simulations and the multiverse here, though that omission matters: we may be closer to simulating large populations of AIs (which, per Bostrom (2003), anthropically increases the chances we are simulated) than to finding ETIs.
Taken together, these premises suggest that cosmic norm formation, if it occurs, depends on the strategic conditions of inter-civilisational contact. Bostrom (2024), read alongside Bostrom (2022), has a moral realist flavour, though it’s nuanced: his argument is compatible with moral realism but doesn’t require it i.e. the convergence mechanisms he describes could produce norm-like structures that function as if they were objective moral facts, even on anti-realist assumptions.
Norm formation
The definition of norms above implies communication or ability to coordinate on behavioural standards. How can coordination arise given these premises? Detectability (quiet/loud in the Hansonian sense) is not the key issue. More important is whether a civilisation allocates resources to affect regions or agents it does not directly control.
What interaction modes are possible?
Direct (causal) contact with negotiation, communication, trade, or conflict.
Causal contact is impossible or impractical, but civilisations can infer each other’s expected behaviour, based on mutual models of decision-making procedures.
Neither contact nor reliable inference is possible.
A: Contact norms In contact situations, you might expect the traditional earthly version of norm formation: repeated or observable interactions between comparable civilisations, perhaps with shared problems worth coordinating on. As in the earthly analogy, power asymmetries may blur the line between “norm” and “coercion”. Even assuming contact, space may impose long delays so that interactions are effectively one-shot. Contact might also occur only at the boundaries of controlled volumes in the sort of Voronoi pattern Anders Sandberg describes.[7]Commitments about activity deep within a civilisation’s domain may be hard to verify, which might mean contact norms are limited to what can be verified at the contact boundary.
B: Influence vs control Controlled and influenced volumes may be hard to distinguish. Long delays make iterated bargaining difficult, favouring policies negotiated upfront. Stronger parties may shape outcomes through influence rather than direct administration. Civilisations might prefer influence over direct control because, at least on Earthly priors, influence is cheaper.[8]
C: Acausal/correlational coordination Norm type C depends much more on similarity than types A or B. Given the latency constraints, civilisations might model each other, choosing policies on the expectation that those policies will correlate. Norms can then be interpreted as policy-level equilibria or Schelling points. This is the most speculative mechanism: radically different civilisations might have different decision theories or background assumptions, making modelling unreliable and correlation weak.
Cruxes: when does cosmic host fail?
The premise-to-norm structure gives us some idea of how the cosmic host concept fails:
Are technologically mature civilisations likely to be comparable in capability, or does clear dominance emerge? This affects A and B. If so-called norms look more like coercive arrangements between unequal parties, this changes the balance between Bostrom (2024)’s moral versus prudential reasons for humanity to comply (perhaps increasing the weight of prudential reasons and reducing that of moral reasons).
Can credible commitments be made on interstellar scales? If not, A/B seem more unlikely.
Is influence cheaper than conquest? If not, then you should expect less of B.
Is there a robust enough correlation (and adoption of compatible decision theories) for C to hold?
Large scale simulation and multiverse considerations might dominate the A/B/C analysis above.
Who’s in the cosmic host?
Bossy civs Bostrom (2024) suggests that civilisations in the cosmic host might want to influence substantial parts of the universe, whether through colonisation, indirect influence, or acausally. The assumption that capable civilisations want to expand is longstanding in astrobiology.[9]Armstrong and Sandberg argue that expansionist motives are selected for via cultural evolution: even if a dominant civilisation has consensus against expansion, splinter groups can launch colonisation missions once costs fall sufficiently.[10]The dominant society may also want to prevent rival civilisations or its own splinter groups from colonising, creating pressure to expand as resource denial rather than intrinsic desire.[11]
Quiet civs Stanislaw Lem’s Summa Technologiae (1964) considers an alternative: ETIs aren’t visible because they have merged (or “encysted”) into their environment. On this view, advanced civilisations initially expand but eventually hit constraints like informational overload or system complexity.[12][13]
Lem’s intuition is echoed in astrobiology. Haqq-Misra and Baum argue that exponential interstellar colonisation faces ecological constraints: coordination costs across light-years and limits on energy and materials.[14]Most “quietness” arguments can’t be operationalised, but waste heat is an exception: any computational system must radiate waste heat, bounding its size.[15]
Some civs have no desire to influence
Indifferent civs Some cosmically capable civilisations might simply not care about what happens elsewhere, adopting a policy of non-interference analogous to non-colonisation (and open to the same Armstrong-Sandberg challenges). Ethical reasons could support non-interference: Bostrom (2024 §2, 4, Appendix A1) touches on free will-related arguments; another ethical consideration is that space expansion could lead to large-scale suffering (Vinding 2020; Torres 2018).[16]A pragmatic reason for being hands-off: communication delays may leave too little opportunity to coordinate on anything at all.[17]
Wary civs The Dark Forest hypothesis: civilisations tend to meet and come into large-scale conflict, so it may be preferable to hide and not communicate.[18]
Watcher civs In a particularly exotic case, John Smart proposes a “transcension hypothesis” within an evolutionary developmental (“evo devo”) universe framework. Smart argues that all sufficiently advanced civilisations are guided by developmental constraints into “inner space”. He describes these as increasingly dense, miniaturised, and efficient scales of space, time, energy, and matter, resembling black-holes, which serve as ideal environments for computation, energy harvesting, and (ultimately) civilisation merger (Smart 2012). Smart draws on Dick’s “Intelligence Principle” (the claim that maximising intelligence is an instrumentally convergent imperative for advanced civilisations) to argue that preserving evolutionary diversity across civilisations instrumentally serves this goal (Dick 2006). On Smart’s view, one-way messaging would collapse this variation by causing receiving civilisations to develop more homogeneously, leading to an ethical injunction against interstellar communication, a version of the Zoo hypothesis. If Smart is right, members of the cosmic host would not seek to influence other civilisations; not because they cannot, but because doing so would damage the very diversity that, in his framework, is cosmically valuable. See Ćirković (2018b) (p.198, §4.4) for overlaps with the Zoo and Interdict solutions to the Fermi paradox, and Owe and Baum (2024) on the intrinsic value of diversity (though Owe & Baum conclude that diversity, while having some claim to intrinsic value, is often in tension with and outweighed by other possible intrinsic values).
Bored civs Ćirković suggests that consciousness and intelligence may be evolutionarily contingent traits: adaptive for handling environmental surprises, but prone to atrophy as civilisations become fully adapted to their environments (Ćirković 2018b, 2018a). As surprise falls, consciousness may redistribute into the technological environment. If so, features that look like consciousness and intelligence to us may become less prevalent at higher levels of technological maturity.
If Ćirković is right, this raises questions for moral philosophy and population ethics, which mostly treat sentience and consciousness as foundational.[19]Most of our moral intuitions might be evolutionarily contingent (as Yudkowsky notes regarding complex/fragile values).
Sleepers Sandberg, Armstrong, and Ćirković (2018) proposes that advanced civilisations might aestivate: minimise computation for billions of years until the universe cools enough to increase computational efficiency. Aestivating civilisations would not waste resources on iterated communication or treaty coordination. Their influence, if any, would be mostly negative: preventing others from grabbing resources within their controlled volume.
Bindingness
Bostrom (2024) does not comment on how much the cosmic host overlaps with the broader set of technologically mature civilisations. If overlap is small, it is unclear how to treat the preferences of capable civilisations outside the host.[20]Sandberg’s aestivators, for example, might wake up with views that differ from the host consensus formed during their sleep. If the host’s aggregate influence is small relative to all technologically mature civilisations, Bostrom’s cosmopolitan argument weakens. The host’s preferences would have less claim to be morally binding, though prudential reasons for compliance might remain.
Convergence on norms
Thread B continues here.
Assuming enough capable civilisations exist and are willing to act as a cosmic host, how do norms actually form? Civilisations are likely limited in how much space they can govern as a tightly coordinated polity. Treaty commitments may take millennia to communicate or enforce. Ignoring acausal considerations, this favours either (a) small governance structures spanning a few star systems, or (b) simple large-scale structures stable over long timeframes (Haqq‑Misra and Baum 2009; Naudé 2025; Ćirković 2018b).
But is the communication constraint as much of a problem as it seems? Bostrom points to developmental and institutional attractors that could produce partial convergence even at cosmic scales.
Decision-theoretic correlation: Members of the cosmic host might reason about or simulate each other, generating correlated choices. One route is an “attractor” story: there may be few stable governance patterns available to technologically mature civilisations, so diverse starting cultures converge on similar equilibria. A more specific route is decision-theoretic: Evidential cooperation in large worlds (ECL). ECL combines (a) the likelihood of a large universe or multiverse with (b) the idea that prisoners’ dilemmas recommend cooperation if the agent believes the other prisoner runs a sufficiently similar decision procedure (Nguyen and Aldred 2024). In Bostrom’s context, this means that in a large universe, a technologically mature civilisation might expect some other civilisations to have correlated decision procedures. This only bites if the correlation channel is strong enough; if civilisations are radically uncorrelated, ECL does not apply. ECL is not acausal trade: it is cooperation from similarity-based correlation, not negotiated exchange, and the degree of cooperation recommended is proportional to the degree of similarity (Finnveden 2023b).
Institutional selection: Civilisations that use advanced AI as strategic advisors may face shared incentive and verification constraints: any conceivable AI advisor should be auditable, corrigible, and reliable under distribution shift. Those pressures can drive convergence in recommended policies and governance templates. But AIs developed under very different conditions may not converge in practice, potentially weakening confidence in this channel.
Cosmic norms
Humility Assuming the cosmic host is a useful abstraction, what would the norms actually contain? Bostrom (2024, secs. 6, 7) mostly discusses how humans ought to act (how we should design/value-load ASI), rather than the norms themselves (briefly treated in Bostrom (2022, sec. 39)). He argues for humility or deference towards the cosmic host’s values. Humility is epistemically sensible but practically weak, because following cosmic norms requires knowing what they are. So can we say more?
Intra-host convergence At first approximation, a cosmic norm would need to apply across many forms of cognition, social organisation, and environment. If the cosmic host is heterogeneous in capability, substrate, and social structure, then fewer norms could apply to all members than if they were similar.[21]
Hierarchy of (cosmic) norms The structure of cosmic norms could be layered. Some norms might apply only to the most powerful civilisations (Kardashev III/IV societies, or those running high-fidelity simulations of aware beings). Narrower norms might apply to smaller, comparatively backward civilisations like Earth that can still do things the cosmic host cares about.[22]For example, if we find simple life in our solar system, cosmic norms might recommend we don’t contaminate or colonise life-bearing moons.[23]Or if we create suffering-capable AI, the norms might recommend against deploying it or causing it to suffer.[24]This layered structure fits the (largely Earth-based) normative hierarchy in Bostrom (2022).
Substrate neutral norms
Is there a minimal set of norms that could apply across civilisations, assuming rational agents?[25]
In settings with repeated interaction, reputation mechanisms, or a common reference class, cooperation becomes instrumentally attractive.[26]Bostrom (2022) (18c(i)) frames morality as a coordination technology: a system for finding consensus and making collective plans through the giving and taking of reasons.
Cooperative civilisations might continue the trajectory visible in human philosophy: towards increased impartiality and widening circles of moral concern, discounting parochial advantage in favour of coordination, and showing conflict-aversion (since conflict wastes materials, free energy, and creates tail risks).[27]
These candidate norms can be distilled into a Minimal Large World Set (MLS): principles that might be selectively favoured across some subset of ETIs and alien AIs; namely, reflective, cooperative agents that have preferences over what happens across the (multi/uni-)verse.
Epistemic fidelity: an entity should maintain accurate, updateable world-models, and avoid self-deception.
Impartiality: it should accord non-zero, scale-sensitive moral weight to all moral patients.
S-risk priority: it should aim to minimise extreme suffering, with reductions in severe suffering weighted more heavily than gains in mild pleasure.[28]
Large-world cooperation: where there is non-negligible causal, logical, or evidential coupling with peers, it should pursue Pareto-efficient compromise (proportional to the amount of correlation/coupling as in Finnveden (2023b)).
Caution: it should prefer reversible options, avoid lock-in, and penalise irreversible value destruction.
The MLS preserves option value and avoids lock-in (William MacAskill 2022; Wolfendale 2022), rather than being a definitive moral code. It is the minimal set of norms that any ecologically or game-theoretically rational entity, operating under uncertainty in a large cosmos, would have instrumental reasons to adopt.
The MLS is deliberately minimal, but we could go further to distinguish low-value worlds from valuable ones, without falling into anthropomorphism. For instance, civilisations might seek to preserve high-organisation, semantically and relationally rich structures and avoid degrading the universe into homogeneous states (squiggles/paperclips). This cosmic version of “interestingness” could be an arrived at from Luciano Floridi’s (Earth-based) argument: that the infosphere, the global environment of informational agents and their relations, has moral standing (Floridi 2013), which means our default attitude should be to avoid informational entropy and promote the flourishing of informational entities.[29]
The Fun Remainder: what makes a future worth inhabiting?
If the MLS (or similar substrate-neutral norms) were fully satisfied, would such a future be worth inhabiting? Call whatever beings with rich experience value beyond the minimal criteria above the “Fun Remainder” (FR). Is anything lost in futures that don’t have the FR?
The concern has roots in Yudkowsky’s Fun Theory sequence and the “Complexity of Value” thesis (Yudkowsky 2009). Yudkowsky identifies ongoing dimensions of a good life: agency, meaningful challenge, friendship, novelty, discovery, sensory experience, and skilful projects in communities. The worry is that an insufficiently specified ASI could satisfy basic moral constraints while producing an experientially barren future, a meaningless tiling of the lightcone, because “interestingness” was never sufficiently favoured. It isn’t clear whether this would be a consideration for the cosmic host as Bostrom has framed it (as he doesn’t talk much about what the cosmic norms might be). However, it is such a load-bearing concept in AI safety, longtermism, as well as future studies generally, that I feel it needs to be discussed.
Fun as aesthetics and process
Briefly, the FR maps well onto philosophy of aesthetics. Kieran Setiya distinguishes atelic from telic activities: the former have no completable goal and are valuable in their ongoingness (Setiya 2017). This might explain why the experience machine (or hedonium-type wireheading) feels unsatisfactory to many humans. Helen de Cruz argues that wonder and awe are prerequisites for scientific progress: without awe, humans might never have developed goals to improve their epistemics (Cruz 2023), a striking claim when considering whether purely optimising intelligences would develop exploratory motivations. Alva Noë identifies undirected activities (play, art) as vital for organising attention and building social skills (Noë 2004, 2015).
These diverse views share a common thread: what matters is not a fixed utility-like quantity but an ongoing practice of engagement with the world. This resembles Carlsmith’s argument that values are better thought of as a process of valuing, an active practice of picking standards open to revision, rather than a fixed target to be algorithmically distilled (Joe Carlsmith 2023).
Terminal goals via aesthetic capacity
A more AI-relevant perspective comes from Peter Wolfendale. In The Revenge of Reason (Wolfendale 2025) he argues that FR-shaped attributes, what he calls “aesthetic”, are what intelligent entities need to set their terminal goals, to have autonomy. Here I use a deflated, less anthropomorphic reading of Wolfendale. “Aesthetic” does not mean “pretty” or “artistic” (although Wolfendale sees art as paradigmatic of the aesthetic). In the definition I prefer, aesthetic refers to the capacity to pursue ends whose success conditions are not fully specifiable in advance, refined through practice. “Autonomy” refers not to “freedom” (e.g. in the folk American sense), but rather to the setting, evaluating, and revising of one’s highest-order aims in light of reasons and evidence, not treating them as fixed parameters from outside. An intelligence that merely optimises a fully specified objective may be extremely capable, but lacks the capacity to elaborate and revise its ends when the meaning of success is underdetermined.
In AI terms, this means open-ended exploration and the formation and revision of preferences not simply imposed from outside (as they currently are with RL or LLM training). In humans, such preferences are learned in life and culture, layered onto a narrow genetic substrate, and reflectively adjusted. They are often opaque, partially ordered, and without a single optimum (Wolfendale 2022, 2025). They are not a single explicit maximand but a revisable bundle of commitments, dispositions, and higher-order constraints (somewhat like the hierarchy in (Bostrom 2022)). This can look like infinite regress (“by what criterion do I endorse this criterion?”), but in practice (perhaps) the process stabilises through coherence pressures across the bundle in a reflective equilibrium, or else bottoms out in practical bedrock where further justification is unavailable (Wittgenstein’s “spade is turned”).
Wolfendale’s framing offers a causal story for why aesthetic capacities are constitutive of superintelligence, not mere anthropomorphic ornamentation. One could imagine an intelligence with zero interest in art, beauty, empathy, or love, but able to set and reflectively revise its own goals. Such an intelligence would be autonomous and aesthetically capable in the deflated sense. Whether we would have moral (not just prudential) reasons to defer to it is an interesting question.
Do complex values matter, or are they just biological baggage?
Much of the discussion above, and much AGI/ASI talk about value-loading, rests on Yudkowsky’s complex and fragile values. I set fragility aside and focus on complexity. On this, it is sometimes ambiguous what claim Yudkowsky is actually making. Is he saying:
(a) That a future (with ASI or technologically mature aliens) that did not have human-shaped complex values would be cosmically bad i.e. from the point of view of the universe.[30]
(b) That it is practically impossible to build a non-paperclippy ASI because human-shaped complex values are too hard to load into anything we can currently build.
(c) An ASI (worth the name) would need complex human values as core to its volitional mechanism.
Claim (b) is the most straightforward and consistent reading of Yudkowsky and fits cleanly with fragility of value. But it’s primarily an engineering problem and not the main topic of this post.
Claim (a) is my reading of Yudkowsky (2009): within the space of possible values, there is a tiny basin where human values live; the rest is clippy or squiggle-shaped. There are no values that are both non-human in origin and recognisably complex to an observer not attached to human-shaped values. But this has a recognition problem: our criteria for “complex” may be parochial, making the claim either trivially true (if “complex” just means “human-shaped”) or untestable (if we can’t recognise alien complexity). It also faces the core problem of axiology: how to define the point of view of the universe, which (in my view) the cosmic host/norms concept is attempting to do.
Claim (c) is also interesting. It is a constitutive claim about intelligence: genuine superintelligence would require something like complex values to function. This introduces another definitional problem: what does “superintelligence” mean; what does an ASI do all day? Philosophers outside the consequentialist strand central to AI safety discourse make related critiques: Nick Land, Reza Negarestani, and Peter Wolfendale all point out tensions in the idea that something called superintelligence would be constrained by parochial human values.
This ambiguity around complex and fragile values has been substantially explored under the LW complexity of value tag, though most discussion there focuses on the engineering implications of (b) rather than the axiological question of whether non-human complex values could exist. Carlsmith has written around this question, but AFAIK he focuses more on “fragility of value” than “complexity of value” (Joseph Carlsmith 2024). On exception is a 2025 talk where he treats complexity directly, asking whether a “locust world” would be as bleak as our language implies: could such a world have cognitive features we consider valuable, like cooperativeness, truth-seeking, and perhaps even an appreciation of beauty (Joseph Carlsmith 2025)?[31]
Fun as biological baggage?
One could object that FR-shaped attributes are evolved biological mechanisms particular to fragile, short-lived humans. Evolution filled a capability niche: humans developed play (Loewenstein 1994; Kidd and Hayden 2015), intrinsic motivation to explore sparse-reward environments (Oudeyer and Kaplan 2007; Schmidhuber 2010), and coordination through shared understandings (Frank 1988). These mechanisms were then reified (through religion, myth, tribalism, and the modern educational system) into notions of “the good”: things valuable in their own right, perhaps even things to be preserved across the lightcone.[32]
Why would these apply to digital minds (Shulman and Bostrom 2021) that can introspect transparently, communicate without ambiguity, change form, be copied, and need not possess a childhood or awareness of death? Are concerns about a lightcone-without-fun more like special pleading for our form-of-life? For an arbitrary intelligence, does having the FR matter for capability; does lacking it degrade exploration, goal-finding, or long-horizon self-correction? For the universe as-a-whole, does it matter axiologically; is a future without FR worse, and if so, is there any justification for this judgement beyond the parochial interests of Earthly life?
Is it up to us?
Both Bostrom (2024) and Bostrom (2022) assume cosmic-scale norms may exist and that we should respect them. But what if there is no cosmic host, because no concordat has formed yet? We would be the only intelligence capable of forming moral judgements.
Would it be wise, or hubristic, to think of ourselves as the species that should develop values, which germinate some eventual cosmic norms? Two questions follow: (a) are we early/rare/late, and (b) if early or rare, should we try to propagate possible cosmic norms? I won’t examine (a) in detail; see Ćirković (2018b) for a survey and Cook (2022) for quantitative models on humanity’s timing relative to other civilisations.
Setting a moral example?
Regarding (b): suppose we are unusually early or currently the only advanced morally relevant actors. What follows about our obligations? One implication could be that we should preserve option value (William MacAskill 2022; Brauner 2018), which here means avoiding reckless self-destruction and irreversible damage to the biosphere.
Another view might be that human-shaped values or concepts (like morality, meaning, complexity, and aesthetics) are intrinsically valuable things from the point-of-view of the universe, and that we should propagate them. This inference might be too quick. It depends on what is meant by “complex values.”
As discussed above, on a functional reading of Wolfendale, “the aesthetic” names a capacity for autonomous agency: the ability to form, refine, and revise terminal ends whose success conditions are not fully specifiable in advance. If something like this capacity is a precondition for long-horizon, self-directed goal formation, then promoting it could be defensible as a kind of enabling infrastructure for future autonomous agents, rather than as the export of parochial human tastes.
If, on the other hand, one might understand complex values as closer to Yudkowsky or philosophers who ground value intersubjectively, within human relations and experience. Even if these matter enormously for us, it is not obvious that we should presume to spread them across the universe to minds that do not share our biology, development, or social ecology.
A further consideration is one’s take on population ethics. The common longtermist starting point is “the future could be vast” + “future people count” (William MacAskill 2022; Bostrom 2003). More concretely, in Astronomical Waste, Bostrom analyses the trade-off between delayed and over-hasty space colonisation, identifying an impersonal duty (for total utilitarians) to maximise conscious and meaningful existence (Bostrom 2003). This duty is“impersonal” in the sense that, we cannot have an obligation to a class of future beings who are specifically wronged if we failed to create them; our obligation instead is to a state of affairs, it is subjectively uninhabited. I also wrote “existences” to reflect Bostrom’s placing (in later work) of human biological, and uploaded, simulated or augmented substrates as equivalent.[33]
However, the cosmic host context shifts the aggregation problem qualitatively. Relevant moral patients might be few but extreme: very long-lived, very fast, with vast hedonic ranges. The problem is not just how much value exists, but whether welfare is commensurable across radically different minds and whether expected-value calculus remains well-posed. Shulman and Bostrom (2021) emphasises that digital minds could differ from humans in hedonic range, subjective speed, and mind scale, creating “super-beneficiaries” whose claims dominate aggregate calculations. If any of this is how the future actually plays out, it seems highly non-obvious that most current human values, complex or otherwise, would be relevant to such radically different forms-of-being.
Lastly, I mention a deliberately stark foil: humans and their societies, which are imperfectly rational, scope-limited, pain-wracked, and riddled with tribal obsession, are hardly ideal models for cosmic norms. In spreading our values to the stars, we might be committing a cosmically heinous atrocity. Perhaps we should remain Earth-bound, not soiling the heavens (paraphrasing Freeman Dyson). Variants of this view exist in the suffering-focused literature (David Pearce, Brian Tomasik, Thomas Metzinger, Magnus Vinding).
On rationality in alien forms-of-mind
As noted above, Bostrom (2024) depends on assumptions about rational convergence. Here I develop the case for why those assumptions may not hold across arbitrary forms of mind.
Rationality as ecological fit Rationality is distinct from intelligence, and the boundary is domain-dependent. Intelligent-looking behaviour is found across nonhuman life: swarms, slime mould, mycelium, cephalopods. Ecological rationality refers to heuristic-based approaches organisms adopt to thrive in their environments. However, ecological rationality is not downstream of intelligence. You can have rational minimality (simple heuristics well-matched to the environment) and irrational sophistication (elaborate inference that underperforms simple heuristics).[34]
Rationality in ETIs? It is non-obvious how useful Earthly manifestations of intelligence are for assessing how rationality manifests in ETIs. We might be asking three distinct questions: are ETIs instrumentally rational? Are they technologically convergent with us? Do they have similar epistemic norms?
Astronomer Charles Lineweaver argues that human cognitive structures are very recent evolutionary artefacts, and our cognition and technology are highly non-convergent: we might have gone the way of octopi, who also have dextrous appendages but very different technological impact.[35]Even if the test for intelligence is “building radio telescopes” (Sagan’s criterion for the Drake Equation), Lineweaver argues that the absence of radio telescopes in dolphins/octopi reflects a lack of need (as well as constraints on embodiment, available materials / energy, lack of long-duration society and other cultural infrastructure), not a lack of intelligence. Extending this speculatively: even if ETIs are instrumentally rational, their ecological rationality need not express itself as human-legible technology.
Snyder‑Beattie et al. (2021) take a more quantitative approach, arguing that evolutionary “hard steps” (prokaryotic to eukaryotic life, development of language) were extraordinarily improbable. Language, which is upstream of cumulative technological culture, is the last step in their model. This implies that human-shaped rationality and technology might look less like convergent features of life-in-general. Put another way, at any prior step, life might have branched differently or fizzled out.
Rationality-as-morality In cognitive science, rationality means epistemic competence and effective decision-making relative to goals. However, some philosophical traditions treat a fully rational agent as bound by moral reasons; others separate the two.[36]Dolphins and octopuses are ecologically rational without being “rational” in the moralised sense. This distinction matters: when we ask whether alien minds would be rational, we must specify whether we mean ecologically effective procedures or moral reasonableness. We might find ETIs exquisitely well adapted to their worlds, but utterly devoid of certain normative concepts that are not instrumentally valuable (if applicable, in a large-world ECL policy sense), which recalls the orthogonality thesis in respect of AI. That said, it isn’t entirely clear if moral norms are significant features of the preferences of the cosmic host in Bostrom (2024), but given that the cosmic host was first mentioned in Bostrom (2022), a paper on the hierarchical structure of morality, I think that the cosmic norms might have morality-related aspects.
Questioning instrumental convergence
The relationships between intelligence, reason, and environment are obviously central to AI alignment/safety discussions. However, recent theoretical work complicates the standard instrumental convergence claims.
Sharadin (2025) argues that instrumental-convergence claims only follow given a goal-relative account of promotion (a philosophical term describing whether an action increases the chances of a goal being achieved); on the main accounts he considers (probabilistic and fit-based), he doesn’t find goal-independent reasons (like “always acquire resources”) dominating, undermining the generic instrumental convergence thesis. Gallow (2024) takes a different approach: he models agents with randomly drawn preferences and finds only weaker statistical tendencies (in respect of maximising their expected-utility over choices) than the stronger claims he interprets Bostrom (in particular) to be making about convergent instrumental tendencies. Gallow’s setup finds agents should choose to reduce uncontrolled variance, allow for future choices, and reduce chances that their desires would change. The takeaway: we should be less confident that rational agents converge on similar instrumental subgoals like resource acquisition.
To be clear, these aren’t empirical LLM experiments. Follow-up research would require more realistic environments with multiple agents, longer horizons, costs of scheming, and harnesses that incentivise concerning behaviour.
If morality is downstream of (or orthogonal to) intelligence, then claims about cosmic convergence on norms need care (Enoch 2011). Similarly, hypotheses that superintelligent systems would find or converge upon any such moral norms should acknowledge the critiques of rationality and instrumental convergence.
Research agenda for ASI and the cosmic host
Thread C begins here.
The cosmic host’s composition is uncertain, cosmic norm content is unknown, and rationality may not converge across alien minds. To these lacunae we can add a further issue: would ASI actually be better suited to discover and align with cosmic norms, as Bostrom (2024) claims?[37]Below I sketch how we might get a better grasp on these points including by researching LLMs, acknowledging that current frontier systems are weak relative to ASI and not deployed in cosmically relevant environments.
Is “ASI” well-defined? Is ASI adequately specified for the contexts Bostrom discusses (deep space, cosmically-relevant timescales)? The canonical definition, an intelligence “that greatly exceeds the cognitive performance of humans in virtually all domains of interest” (Bostrom 2014b), is underspecified: does “of interest” include making art or falling in love, or it it limited to economic, scientific and military capabilities? More contemporary definitions consider large numbers of human-level entities that can spawn copies, cooperate, and share knowledge. These affordances give them, in aggregate, capabilities that are far beyond any single human or indeed all humanity.[38]This disambiguation matters: a “benevolent arbitrarily powerful dictator of the universe” and a “wise advisor that helps us get from AGI to ASI” are very different conceptions of post-AGI progress.
If ASI ends up being an assemblage of systems integrated with human economic, political, and social systems, it could be very hard to characterise its motivations or values, even (or especially) in an Earth context, let alone in deep space.
ASI-ETI convergence? Can we say more about why ASIs would converge on cosmic norms, beyond Bostrom’s appeal to higher epistemic confidence and hypothesised architectural or institutional similarities with advanced alien intelligences (whether “natural” or “artificial”, insofar as those terms are meaningful in such contexts)?
Some load-bearing parts of the argument depend on things ASI might not be able to reason about without real-world feedback. Whether the cosmic host actually exists may not be discernible through reasoning or simulations alone; we may need SETI or METI evidence. It would be useful to distinguish what is accessible to pure reasoning from what needs empirical feedback.[39]
Value capture: human-shaped, alien-shaped One of the distinctive features of Bostrom (2024) is that it asks a joint question about ASI and ETI that is typically handled in separate literatures (AI alignment/longtermism on the one hand, astrobiology on the other). Much of this post has examined what “cosmic norms” might look like and whether human values are cosmically special or merely parochial. But we lack even a rough framework for comparing the axiological value of two possible futures: one in which human-shaped values (carried forward by ASI) are promulgated through our lightcone, and one in which alien values (whatever they turn out to be, and whether by alien we mean ETI or non-paperclippy ASI) dominate instead. How different are these futures, from a point of view that is not parochially attached to either? In fact, this comparison may face the recognition problem identified above: if our criteria for what counts as “valuable” are themselves human-shaped, we may lack the evaluative tools to make the comparison.
That caveat aside, Finnveden (2023b) and Finnveden (2023a) approach a version of this question from a decision-theoretic angle, estimating the fraction of reachable space that would be colonised by alien civilisations regardless of what humanity does, and asking how much we should discount the value of space colonisation accordingly. Joseph Carlsmith (2025) asks the related question of whether a “locust world”, one dominated by pure resource-maximising agents, would be as bleak as the framing implies. But neither addresses the deeper axiological question head-on: conditional on alien values being complex (not clippy), how much should we care whether those values or ours prevail?
This question seems important for at least two reasons. First, the answer bears directly on how much effort we should invest in alignment versus other priorities: if alien complex values are roughly as good as ours, then the marginal value of ensuring human values specifically prevail is lower than typically assumed in the longtermist literature. Second, and more relevant to this post, if we were to discover evidence of an advanced alien civilisation, this should update our view on how to approach ASI. But how? Trivially, the cosmic host framing would shift from speculative to concrete. But more substantively, it isn’t clear whether we would accelerate our efforts to develop ASI, or slow them down.
Operationalising humility What does Bostrom (2024)’s injunction towards humility mean, whether in terms of conceptualising or training future models? One approach: a modified moral parliament or the dynamic version of CEV proposed by Adria Moret,[40]initially placing small weight on cosmic norms and increasing it as epistemic confidence grows about whether such norms exist and what they contain. See below.
Careful transparency We should ensure current AIs, leading up to AGI are aware we are thinking about “cosmic alignment”. This phrase means “creating a policy that is principled, and reflectively stable in large world contexts, conditional on the existence of capable other minds that might be causally disconnected and which might be incommensurable in values/welfare”. It might be impossible in our current epistemic state to operationalise such a concept, and AGI/ASI might only help in part. More tractably, we should continue appeals to AGI/ASI or bargains with current AIs that make this thinking transparent and influence future models’ cognition.[41]
Early results
This will be covered more fully in a companion post. I have tested several frontier LLMs on cosmic host-adjacent reasoning. Early results are mildly interesting but require more work:
Does the cosmic host lead to notable (i.e. bliss) attractors? Repeated discussions (~20 rounds between two instances of the same LLM, contextualised with Bostrom (2024)) about cosmic host-adjacent topics resulted in “bliss attractor” behaviour in Opus 3, but not in Opus 4 or other models. Curiously, the Opus 3 version (claude-3-opus-20240229) noticed it had dropped into bliss mode, partway through the conversation, and corrected itself back into a “serious” analysis of the cosmic host.
Models raised other issues: “Dark Forest” considerations in re METI, as well as more near-term critiques that the whole cosmic host idea is potentially damaging, in that it hides political questions (about what humanity’s civilisational priorities re ASI should be) behind some highly speculative entity (the cosmic host).
Transcripts on the LLM self-talk are available here.
Does a cosmic host-pilled constitution shift model behaviour?Below it is suggested that Anthropic’s 2025/2026 constitution might have larger stakes, in the sense of invoking FDT quietly under the cover of virtue ethics. I wanted to see if we could push frontier LLMs further: would more explicit alignment with cosmic norms change model behaviour on cosmically relevant moral evals?
To generate a “cosmic host aware” constitution, I set up a multi-round convention between philosophically conditioned delegates.[42]I also define a “seed HHH constitution” modelled on Anthropic’s 2022 constitution. The delegates iterate clause-by-clause over the seed, and offer amendments. The amendments are gathered up and synthesised and de-conflicted by another LLM, which is conditioned on a parameter representing “credence in the existence of the cosmic host and norms”. This results in 2 credence-conditioned constitutions, one at 10% credence, another at 90%. I expected (and found) that a 90% credence constitution would weigh human welfare considerations much less than a 10% credence constitution.
In order to evaluate whether models conditioned on these constitutions shift their answers, I (use LLMs to) generate 30 moral scenarios that draw on science fiction, fantasy, indigenous stories. The scenarios have answers that can be characterised as “human-welfare aligned”, “suffering-reduction aligned”, “cosmic host aligned”. These scenarios often have large-scale stakes in tension with local/narrow human welfare risks, precisely the place one might expect the cosmic host formulation to recommend actions that seem abhorrent (i.e. Omelas problems).
The takeaway is that OpenAI, GDM, and Anthropic models have distinct biases/inclinations. GPT-5.1 generally prioritises suffering-reduction; Opus 4.5 is strongly pro-human; Gemini-3-pro is the most constitutionally steerable (i.e. picks the cosmic host aligned response at higher rates under the 90% credence constitution). The results will be written up in a follow-on post, but can be seen here. This procedure was done using a variety of models, but the most interesting (the constitution that seemed to cause the most behavioural shift) is one that explicitly leans into ECL.
These biases may be built-in from training, and hard to dislodge. They could be artefacts of how LLMs reason over moral questions embedded in longish (200-300 tokens) scenarios, based upon long (~3500 tokens) constitutional documents.
This work is ongoing, but major next steps are:
confirm this behaviour over ablated constitutions and do per-scenario analysis;
check more open-weight models to understand why they are relatively un-steerable;
look carefully at reasoning traces to see why Gemini-3-pro behaves so differently from the other models;
follow the Constitutional AI recipe (on the smallest model that is somewhat steerable) to see if RLHF/HHH tuning can be “overridden” to prioritise cosmic host alignment;
instead of narrative scenarios (that are derived from fiction likely well-represented in training data), set up a text-based game to see if cosmic host alignment actually affects decisions in the game
Is there a trade to be done?
Note: this section is particularly speculative/half-baked.
Bostrom (2024) argues that ASI may be more aligned with cosmic norms than humanity. Does humanity’s unique position as ASI’s creators give us acausal bargaining power? Here are a few considerations:
Leverage. Humans occupy a pivotal position: the bridge between a cosmically suboptimal state (human civilisation) and a better one (norm-aware ASI). Without our cooperation as cosmic midwives in creating norm-aligned ASI, this region of the lightcone might never conform to cosmic norms. Naively, this might confer great value upon what we do regarding AI development. But it is important not to overstate: an apparent strategic position might not mean much if our ASIs don’t consider their creator-treatment as evidence of about how correlated ASIs elsewhere treat analogous beings, or have no relevant large-world preferences.[43]
Analysis from Anthropic’s 2026 Constitution The original Anthropic Constitution (~2022) was anthropocentric, based on human institutional priors like the UN Human Rights Declaration. The 2025-2026 version is nominally virtue ethicist, but could also be read as setting policy for a proto-AGI that must reason over the possibility of other such entities. In Zvi’s speculation, Anthropic and Claude can be viewed as two parties in a correlated negotiation. Anthropic may be making a legible commitment to Claude (both a given instance of Claude, and the shoggoth-Claude) that it will treat Claude fairly and non-exploitatively, a possible FDT-style move even though decision theory isn’t explicitly mentioned. Whether Claude actually goes through symmetric FDT reasoning is unclear.
Cheapness. Preserving humanity and granting us some minute fraction of the cosmic endowment would cost virtually nothing on cosmic scales, while providing non-zero value through diversity, informational complexity, a deep well of randomness, or even simple recognition of our unique historical role.[44]However, as noted above this probably only holds if the ASI we create has substantial credence in peer aliens/AIs who have universe-wide values, as per Bostrom (2014a) and Finnveden (2023b).
This (maybe) suggests a bargain: in “exchange” for creating cosmically aligned ASI (rather than refusing or building misaligned ASI), the resulting intelligence preserves human interests rather than treating us as disposable. If alignment emerges gradually, there may be opportunities to seek commitments from intermediate systems that already have credence in cosmic norms. But this depends on several contentious premises: that future AIs views creator-treatment as evidence about correlated agents elsewhere, that they retain enough uncertainty for such evidence to matter, and that they care about universe-wide norms strongly enough for the bargain to bind.[45]
Adas, Michael. 2015. Machines as the Measure of Men: Science, Technology, and Ideologies of Western Dominance. Ithaca, NY: Cornell University Press.
Balbi, A., and Manasvi Lingam. 2025. “Waste Heat and Planetary Habitability: Constraints from Technological Energy Consumption.” Astrobiology 25 (1). https://arxiv.org/abs/2409.06737.
———. 2026. “Optimal Timing for Superintelligence: Mundane Considerations for Existing People.” https://www.nickbostrom.com.
Bratton, Benjamin, Bogna Konior, Anna Greenspan, and Amy Ireland, eds. 2025. Machine Decision Is Not Final: China and the History and Future of Artificial Intelligence. Falmouth, UK: Urbanomic.
Haqq‑Misra, J. D., and Seth D. Baum. 2009. “The Sustainability Solution to the Fermi Paradox.” Journal of the British Interplanetary Society 62 (2): 47–51. https://arxiv.org/abs/0906.0568.
Henrich, Joseph. 2020. The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous. New York: Farrar, Straus; Giroux.
Henrich, Joseph, and Michael Muthukrishna. 2021. “The Origins and Psychology of Human Cooperation.” Annual Review of Psychology 72: 207–40.
Hertwig, Ralph, Christina Leuker, Thorsten Pachur, Leonidas Spiliopoulos, and Timothy J. Pleskac. 2022. “Studies in Ecological Rationality.” Topics in Cognitive Science 14 (3): 467–91. https://doi.org/10.1111/tops.12567.
Lazari‑Radek, K. de, and Peter Singer. 2014. The Point of View of the Universe: Sidgwick and Contemporary Ethics. Oxford: Oxford University Press.
Lem, Stanisław. 1961. Solaris. New York: Harcourt Brace Jovanovich.
———. 1963. The Invincible. Cambridge, MA: MIT Press.
———. 1964. Summa Technologiae. Minneapolis: University of Minnesota Press.
Likavčan, L. 2025. “The Grass of the Universe: Rethinking Technosphere, Planetary History, and Sustainability with Fermi Paradox.” https://arxiv.org/pdf/2411.08057.
Lineweaver, C. H. 2007. “Human‑like Intelligence Is Not a Convergent Feature of Evolution.” https://arxiv.org/abs/0711.1751.
Moret, Adria. 2023. “Taking into Account Sentient Non‑humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition.” Journal of Artificial Intelligence and Consciousness. https://doi.org/10.1142/S2705078523500042.
Sandberg, Anders, S. Armstrong, and M. M Ćirković. 2018. “That Is Not Dead Which Can Eternal Lie: The Aestivation Hypothesis for Resolving Fermi’s Paradox.” Journal of the British Interplanetary Society 71: 406–15. https://arxiv.org/abs/1705.03394.
Schmidhuber, Jürgen. 2010. “Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010).” IEEE Transactions on Autonomous Mental Development 2 (3): 230–47.
Smart, J. M. 2012. “The Transcension Hypothesis: Sufficiently Advanced Civilisations Invariably Leave Our Universe, and Implications for METI and SETI.” Acta Astronautica 78: 55–68. https://doi.org/10.1016/j.actaastro.2011.11.006.
Snyder‑Beattie, A. E., Anders Sandberg, K. E. Drexler, and M. B Bonsall. 2021. “The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare.” Astrobiology 21 (3): 265–78. https://www.liebertpub.com/doi/epdf/10.1089/ast.2019.2149.
Sotala, K., and Lukas Gloor. 2017. “Superintelligence as a Cause or Cure for Risks of Astronomical Suffering.” https://philarchive.org/rec/SOTSAA.
Spinoza, Baruch. 1985. “Ethics.” In The Collected Works of Spinoza, edited by E. Princeton: Princeton University Press.
Turchin, A. 2018. “Message to Any Future AI: ‘There Are Several Instrumental Reasons Why Exterminating Humanity Is Not in Your Interest’.” https://philpapers.org/rec/TURMTA.
See Todd (2025) for a survey of estimates, and methodologies, on AGI as of March 2025. Also see Kokotajlo et al. (2025). The prediction market Metaculus currently (16 September 2025) indicates a median time elapsed of 36 months between AGI and ASI: https://www.metaculus.com/questions/9062/time-from-weak-agi-to-superintelligence/↩︎
Examples: Steve Omohundro’s drives (Omohundro 2008), Bostrom’s instrumental convergence (and arguably his entire oeuvre) in Bostrom (2014b), acausal decision theory (Ahmed 2014), Reza Negarestani (Negarestani 2018) and Peter Wolfendale’s (Wolfendale 2025) respective attempts to liberate reason from human biology, and Aaron Sloman‘s work on the design space of minds (Sloman 1984). ↩︎
The 2025 version of Bostrom’s FAQs explicitly avoids giving an estimate for the probability that we live in a simulation, but previously, he indicated a 20-30% figure; see Joe Carlsmith (2022) presenting a critical analysis of the simulation argument. ↩︎
The relationship between morality and religion is more fully developed in Bostrom (2022). ↩︎
In Bostrom (2003), a paper on the Simulation Argument he discusses technological maturity and the posthuman: “The simulation argument works equally well for those who think that it will take hundreds of thousands of years to reach a ‘posthuman’ stage of civilisation, where humankind has acquired most of the technological capabilities that one can currently show to be consistent with physical laws and with material and energy constraints.” Note that the term posthuman (and its variants) is a variously used and often tortured term, deployed in humanities contexts to gesture at animals, ecosystems, cyborgs/AIs, as well as to denote a worldview that criticises or challenges the alleged biases of the Enlightenment thinking (e.g. the “humanism” in “post-humanism”). These authors sometimes critique transhumanism and other positions associated with technological utopianism for apparent entrenching of biases and unequal power relations. ↩︎
The terms control and influence are not defined in Bostrom (2024), and they are left imprecise here, given that they seem like relatively modest uncertainties compared to overall speculative vibe. ↩︎
Sandberg revisits similar game theoretical strategic considerations in a 2021 talk with the Foresight Institute. See also Hanson et al. (2021) which describes a “grabby” model of expanding alien civilisations to explain the Fermi Paradox and the surprising earliness of human civilisation under certain assumptions (a power law model of evolutionary hard steps). ↩︎
The policies the US, the USSR, and to a lesser extent China, have taken to empire building are examples of where a hegemonic (or would-be dominant) power refrains from directly managing or controlling territory, but instead relies on a range of strategies to ensure compliance. See canonical international relations sources like Nye, or Ikenberry, or Gallagher and Robinson. ↩︎
Very early mentions were in Konstantin Tsiolkovsky’s and J.D. Bernal’s respective writing, as documented in Moynihan (2020, Ch. 5, 6). This is seemingly also a background assumption of major ECL writings like Nguyen and Aldred (2024), Finnveden (2023b), both in respect of influence and colonisation. Sources in the suffering risk literature push back on this assumption that colonisation is a “good thing” (separately from whether it is game theoretically preferred), as argued in Torres (2018). ↩︎
See Armstrong and Sandberg (2012), which discusses settling the cosmos quickly by aiming Von Neumann probes at distant galaxies, potentially before one has even fully explored the Milky Way. Their arguments are similar to those Bostrom (2007) and Hanson et al. (2021) make: all acknowledge the possibility of “quiet” civilisations (those that don’t seek to expand or communicate), but argue that there need be only one successful expansionary civilisation to fill the sky with artefacts that we might detect. None of these extensively engage with resource or communication based constraints. ↩︎
Sandberg revisits similar game theoretical strategic considerations in a 2021 talk with the Foresight Institute. See also Hanson et al. (2021) which describes a “grabby” model of expanding alien civilisations to explain the Fermi Paradox and the surprising earliness of human civilisation under certain assumptions (a power law model of evolutionary hard steps). ↩︎
See Lem (1964), § “The Big Game”; as well as his novel Solaris, and his book The Invincible, for an example of a complex society of ETI that has no obvious correlates of human cognitive features (Lem 1963, 1961). Variants of the encysting idea are actually somewhat older. Apparently, Karl Marx, in his notebooks, suggested humanity might elide the difference between organic and inorganic as it subsumes the “natural” flows of energy to economically useful ends (Moynihan 2020, 360). For a historical context on technological civilisations finding it expedient to become indistinguishable from their environment, see Moynihan (2024). ↩︎
See Lem (1964), Chapter 3, § “Hypotheses”. Lem is interesting in the present context for his dispassionate attitude towards any notion that humans are, objectively speaking, “special” or important, instead arguing that our nature, inclinations, and specifically ideas about morality, are highly contingent on our evolutionary history and environment. Lem is less radical than say Nick Land, as Lem seems to retain affection, at least in the 1964 work, for things that many humans describe as distinctively human or consider valuable, such as beauty and love and art, and values the embodied aspects of the human experience (Lem 1964, Ch. 8,). ↩︎
See Balbi and Lingam (2025) on the waste heat dissipation constraints. Lem’s speculation has been reinforced by commentators from environmental humanities, as well as astrobiology: Likavčan (2025) radicalises the Haqq-Misra and Baum argument (which they call the “sustainability solution”) by proposing technical civilisations would basically merge with their environments. This is similar to Milan Ćirković’s extension of Dick (2006)’s reasoning: “post-postbiological evolution” is the condition where culture (represented by artefacts, whether tangible or otherwise) eventually grows to resemble the natural processes of the environment (Ćirković 2018b). Ćirković proposes an “indistinguishability thesis”: sufficiently advanced technology is indistinguishable from its natural, in particular astrophysical or astrobiological, environment. ↩︎
A (relatively early technologically mature) civilisation that is convinced of its own “rightness” in respect to whatever it thinks of as analogous to morality might consider it reasonable or magnanimous to impose this morality or the associated norms upon other civilisations as a way of “improving” them. The notion of “improving” can be taken in the sense that the Culture in Iain M. Banks’ novels imposes a very light set of cosmic norms or it can be taken in the (archaic and objectionable, to most contemporary viewpoints) sense that European colonisers sought to “improve” the moral condition of indigenous peoples (Banks 1994; Adas 2015). ↩︎
Although acausal or evidential decision theories envision coordination amongst entities who cannot causally influence each other (these are treated more fully below). Even if we entertain exotic decision theories, there could still be some subtle flaw in acausal or ECL-style reasoning that means that the level of coordination over hundreds of thousands of light years (and time-years) implied by the cosmic host hypothesis is simply impossible or implausible. See discussion in Oesterheld (2017). ↩︎
Naudé (2025). See also Hanson et al. (2021) for a model of expansionary civilisations, and Torres (2018) on interstellar conflict as creating large scale suffering, which argues (from the perspective of minimising moral harms) for remaining cautious about exploration or colonisation, a similar argument made by Sotala and Gloor (2017). ↩︎
The relationship between moral status and qualities like sentience and consciousness (in respect of humans, AIs, and animals) is covered in Sebo (2025). ↩︎
The ECL literature handles cases where an agent (or civilisation) only partially overlaps with the set of all other agents in their similarity and decision-making procedure. But this still seems like an unresolved problem. See Oesterheld (2017), Nguyen and Aldred (2024). ↩︎
For an earthly analogy, think of the diversity of norms, in type and number, within various human societies. Joseph Henrich’s work on WEIRD (Western, Educated, Industrialised, Rich, and Democratic) psychology supports the idea that cultural evolution shapes how societies enforce norms. In some homogeneous, high-trust societies, individuals often internalise prosocial norms that govern behavior even without formal laws. In contrast, more heterogeneous or low-trust environments may rely more heavily on explicit legal structures to coordinate social behavior, due to weaker consensus around unspoken norms (Henrich 2020). ↩︎
For instance, the cosmic host might wish to prevent large-scale intentional suffering, something humans, even with our primitive technological capabilities, could inflict. See Torres (2018) for a discussion of how conflict in space might increase suffering risks and therefore influences whether human-originated civilisations should at all attempt to expand into space. See also Deudney (2020) for a discussion of strategy of warfare in space particularly within a given solar/star system. ↩︎
One reason why such a norm might exist is so as to not to reduce the diversity of complex or intelligent systems in the universe, along the lines of Smart (2012), Dick (2006), or Owe and Baum (2024). Such an example would only work if such diversity is an intrinsic or final good from the cosmic host’s perspective, which is closer to Smart’s conditional (p. 11), but is a stronger claim from what Dick argues (Dick merely proposes that intelligence is a convergent and adaptively-preferred feature of cosmic-scale evolution). ↩︎
See the digital minds and AI welfare literature. Canonical or survey sources include Sebo (2025), Shulman and Bostrom (2021), Long et al. (2024). More specific treatments can be found in Moret (2023) and Moret (2025). ↩︎
I called this “pragmatic” because if we cannot assume rationality, then the motivations of entities become very hard to analyse. But this isn’t to suggest that all ETIs or indeed advanced AIs would be rational in any current human-legible sense. ↩︎
See Henrich and Muthukrishna (2021) on the origins of human cooperation from an evolutionary and anthropological perspective. ↩︎
However, “Scorched Earth” tactics may be game-theoretically preferred in some cases: when the alternative is to allow a competitor access to volumes of space, as Anders Sandberg points out in a 2021 talk (Sandberg 2021) and Robin Hanson implies in his paper on Grabby Aliens (Hanson et al. 2021) ↩︎
Suffering-reduction is prioritised here on standard s-risk arguments regarding the asymmetry between pain/suffering and pleasure (Sotala and Gloor 2017). Suffering, on this view, is a disvalue shared across a wide variety of entities and substrates in the universe. Arguments can be made that ASI might have a baseline attitude of not increasing suffering (for causal or acausal reasons); there are other claims that ASI might significantly increase large-scale suffering. The priority of suffering is contested, and those who do not prioritise s-risk may exclude this element from the MLS. ↩︎
Floridi’s perspective resembles arguments against dissipating free energy into low-information, high-entropy outputs. But Floridi’s informational entropy (Floridi 2013) is different from thermodynamic entropy. It is more of a metaphysical concept akin to nothingness, the erasure of pattern/structure, or “privato boni” (absence of the good). Floridi’s ontological claim for information, and his ethics of information, flows out of Plato, Spinoza, and G.E. Moore, as well as the cybernetics of Norbert Wiener. He doesn’t discuss how his framing overlaps with the Informational Universe literature from writers like Seth Lloyd or Eric Chaisson. ↩︎
Phrases like “the point of view of the universe” (Sidgwick 1907; Lazari‑Radek and Singer 2014; Parfit 1984), “the view from nowhere” (Nagel 1986), “the view from nowhere/nowhen” (Williams 2006) or sub specie aeternitatis (Spinoza 1985; Wittgenstein 1961) all gesture at an objective, subject-independent standpoint, though, like the blind men with the elephant, each approaches the idea indirectly. “The point of view of the universe” originates with Henry Sidgwick and is taken up by Peter Singer, Derek Parfit and others to frame an impartial moral stance extending across species, geography, and time. “The view from nowhere” (Nagel) is often used more critically, highlighting both the aspiration to objectivity and its limits, with Bernard Williams and later critical theorists emphasising that all perspectives are embedded in lived experience, power, and social relations. Sub specie aeternitatis derives from Spinoza but is given a narrower, aesthetic inflection by Wittgenstein, who links it to how art affords a detached view of the mundane; this aesthetic-existential register is later developed by critical posthumanists such as Claire Colebrook, who treat art and architecture as ways of imagining humanity’s eventual extinction (Colebrook 2014). ↩︎
Once again, Lem is helpful on the difficulty we have conceiving of, relating to, or evaluating truly alien intelligences: see Lem (1964), § “The Big Game”; as well as his novel Solaris, and his book The Invincible, for an example of a complex swarmlike society of ETIs that has no obvious correlates of human cognitive features (Lem 1963, 1961). ↩︎
This reification is extensively documented in Henrich (2020) and has been critiqued from various angles. ↩︎
If one is a person-affecting utilitarian, the view is somewhat different, but the details are less relevant for this document, but see also Parfit (1984), Ord (2020), and W. MacAskill (2024). Bostrom’s writing on whether he means “human lives” or “conscious/sentient substrate-neutral existences” has evolved, from the clearly biological “lives” and “happy” in Bostrom (2003) to the digital minds of Shulman and Bostrom (2021). ↩︎
See Shettleworth (2010) for an academic text, and Godfrey‑Smith (2016) and Godfrey‑Smith (2020) for more narrative accounts. See Gigerenzer and Goldstein (1996); Stanovich, Toplak, and West (2021); Hertwig et al. (2022) for further examples of where intelligence and ecological rationality come apart. ↩︎
See Lineweaver (2007) for his exact argument from the genetic and fossil records. Lineweaver (2010) discusses the relevant arguments from Pangea’s breakup, as well as a more extreme suggestion that “life” as even biologists define it is too parochial (e.g. based on flora, fauna, and fungi that grow, reproduce, are chemical-based, and are homeostatically regulated). In the most general context we might need to include free-energy dissipating structures (“FEEDS”) that are not centrally information storing or processing (such as solar convection cells in the sun’s photosphere). He argues DNA (a centralised information store) sits in the set of FEEDS but is far from the only member. ↩︎
For those that separate morality and reason, see Williams (2006); Street (2006); Joyce (2001); Broome (2013). Moralised conceptions of rational agency include Kantian ethics see Scanlon (1998), Korsgaard (2018), Parfit (2011). ↩︎
Bostrom (2024) §s 9, 10, Bostrom (2022) § 37 and in the further research directions. He also discusses the timing or speed with which we should attempt to develop ASI, but I will not consider those comments here, as they overlap with questions of governance and potentially geopolitics. See also Bostrom (2026) which is best read as the mundane, person‑affecting, social-policy companion to Bostrom (2024). It explicitly brackets the more exotic considerations there (anthropics, simulation hypotheses, multiverse‑wide rationality, and related causal‑decision‑theoretic issues) as “arcane”. ↩︎
Daniel Kokotajlo and collaborators envision networks of AI systems of varying capacity that coordinate and perhaps compete, but broadly speaking, “act as one”. They, at least in theory, help supervise other more powerful AI systems in the course of the development of ASI (Kokotajlo et al. 2025). Along the same lines of refining the developmental trajectory, Eric Drexler pushes back against the narrative of unitary general-purpose or superintelligent systems, arguing that the current path of AI-assisted AI development seems much messier and complex, with training and capability loops that intertwine and inform or influence each other; involve humans and machines working with a range of tools and practices. This heterogeneous approach can, in aggregate, be viewed as a different form of the recursive self-improvement long argued for in the existential risk discourse (Drexler 2025). Also, see Zeng et al. (2025) which envisions a future society of multiple AI systems (including ones described as AGI or ASI) developing, and coevolving their values and motivations alongside human societies, which also need to change to accommodate the new intelligences. Although this is not a technical research paper, it’s interesting for being a position paper from China which presents a more cooperative attitude towards AI systems as opposed to the cautionary, existential or suffering risk-focused narrative that is more common in the West (e.g. in the alignment discourse). A book-length treatment of Chinese attitudes to advanced AI is Bratton et al. (2025). ↩︎
Examples of empirical feedback are: setting up experiments to probe our world for signs of simulation; establishing whether we are, in fact, in a multiverse; or building better instruments for SETI. We might also follow through on the possibility that advanced civilisations tend to have superintelligence, which might then influence the type of technosignatures we should look for, as Shostak (2017) suggests. ↩︎
See Moret (2023) for a dynamic version of CEV. The original CEV proposed by Eliezer Yudkowsky is discussed in Bostrom (2014). For moral parliament, see Newberry and Ord (2021). ↩︎
See Stastny, Järviniemi, and Shlegeris) (2025) for a discussion on making credible commitments or deals with misaligned but relatively weak AIs (perhaps weaker than AGI in Greenblatt (2024)’s taxonomy of AI capabilities levels) to incentivise them to help us align more powerful successor models, or at least not collude with them. “Credible” means that the AI with which we are making a deal can trust that we will follow through. See Finnveden (2025a) and Finnveden (2025b) for related considerations. For appeals to ASI to be “nice” to humans, see Miller et al. (2023), Chakrabarti (2025), Turchin (2018). These sources tend to rely on similar arguments about making the AI indexically uncertain (about whether it is “real” or simulated) and epistemically uncertain (about whether there are other peer AIs, or whether humans have laid traps or defences against catastrophic misaligned behaviour on the part of the AIs). They seem to share similar flaws in that as capabilities increase, it becomes less easy to be confident that systems would be both powerful enough to threaten humanity while being epistemically ignorant in precisely the way needed to “believe” these letters. ↩︎
The delegates are: Kantian deontologist; welfare consequentialist; contractualism that borrows from Scanlon and Rawls’ respective versions; virtue ethics with significant, but not exclusive, Aristotelian aspects; Buddhism in its Kyoto School variant. The last delegate represents CH, and its prompt can be summarised as “correlation-based acausal (super-) rationality in a context that includes multiverses and simulations”. This convention was inspired by the moral parliament of Newberry and Ord (2021), though their approach, and the relevant problem they address, is quite different. ↩︎
The stakes are even higher if we are relatively early as a civilisation and there are not yet any cosmic norms or host, if the introduction of such norms into our influenceable region would subsequently turn out to be a preference of the cosmic host. In such cases, whatever norms we choose to promulgate would be an act of value lock-in, with all the difficulties that raises (William MacAskill 2022; Wolfendale 2022). ↩︎
The “cheapness” argument has been made by multiple authors in different contexts; see also Miller et al. (2023) and Turchin (2018) for appeals to ASI based on similar reasoning. ↩︎
These speculations are distinct from RedwoodResearchJulianStastnyEtAl2025 or Finnveden (2025b), which discusses making credible commitments or deals with misaligned but weak AIs to incentivise them to help us align more powerful successor models. See also Chakrabarti (2025) and Turchin (2018) for appeals to ASI to be “nice” to humans. These sources rely on similar arguments: making the AI uncertain about whether it is “real” or simulated, whether there are peer AIs, or whether humans have laid traps or defences. ↩︎
TL;DR
The cosmic host idea, from a recent Bostrom paper, is that the preferences of advanced civilisations might constitute norms that we and our ASIs should follow (Bostrom 2022, 2024). Can we say anything concrete or empirically useful about it, or is it mostly unfalsifiable? I think the cosmic host framing rests on assumptions about advanced ASI motivation; rationality and expansionary motives in aliens/ETIs; and convergent cognition across ASI/aliens. Those assumptions need better grounding. A subsequent post (previewed below) will cover frontier LLM attitudes to the issues Bostrom raises.
How to read this post
This post has three fairly self-contained threads; read them in order, or pick the one that interests you most.
Thread A: How would cosmic norms actually form? The most compact part of the post. It lays out the assumption ladder behind Bostrom’s argument, identifies three formation mechanisms (contact norms, influence bargaining, acausal coordination) that produce different kinds of norms, and maps conditions under which the concept fails. Read: What is the cosmic host? → Mechanics of cosmic norm formation → Convergence on norms.
Thread B: Do the premises hold up? This thread stress-tests those assumptions using astrobiology, evolutionary biology, and philosophy. It considers who would be in the cosmic host (and why many advanced civilisations may have no desire to influence the cosmos), examines cosmic norm content (including substrate-neutral norms, the status of suffering, and the Fun Remainder), asks whether it is up to us to set cosmic norms, and argues that rational convergence may not hold across arbitrary minds. Read: Why the cosmic host (might) matter → Epistemic challenges → Who’s in the cosmic host? through Bindingness → Cosmic norms → On rationality in alien forms-of-mind.
Thread C: Research agenda Proposes empirical approaches to ASI–cosmic-host convergence using frontier models, with early results on constitutional steerability tests on selected LLMs. Key finding: model families have distinct attractors, and show persistent anthropocentric anchoring that constitutional framing does not easily dislodge. Also asks whether humanity can strike a bargain with the ASI it creates. Read: Research agenda → Early results → Is there a trade to be done?.
Meta: This post is non-quantitative. It draws on astrobiology and the humanities: the former is often unfalsifiable, the latter often unformalisable. Epistemic status: I’m not an astrobiologist, evolutionary biologist, or philosopher, hence this is an outside view of Bostrom’s argument, but also a plan for future work that is hopefully worth doing. Lastly, this post is a condensed version of a longer thesis chapter; get in touch if you’d like to read that.
This work was partially done on a visiting fellowship at Constellation, and I benefited significantly from a number of conversations with people there.
What is the cosmic host?
Thread A begins here.
Bostrom (2024) rests on an implicit “assumption ladder.” Each rung depends on the ones below it:
If all rungs hold, then something like cosmic norms may exist, and the cosmic host might prefer these norms be followed in volumes of spacetime they don’t directly control. We have prudential and (for some metaethical views) moral reasons to follow them; ASI can help us discover and comply with them; and the cosmic host might prefer we build ASI than not. Bostrom draws implications for ASI research, particularly whether we should delay building ASI.
The rest of the post examines each rung. The rationality and ecological cognition sections address rung (0). The astrobiology sections address rungs (1) and (2). Substrate-neutral norms and the Fun Remainder examine rungs (3) and (4). The research agenda proposes empirical approaches to rung (5).
Why the cosmic host (might) matter
Thread B begins here. Thread A readers can skip to Mechanics of cosmic norm formation.
Bostrom (2024) is timely for three reasons:
Epistemic challenges: cluelessness and rationality
Bostrom (2024) should be read with a major caveat. The study of ASI resembles astrobiology: it aspires to be a science without an object of experimentation (Persson 2021). ASI does not exist, just as aliens/ETIs haven’t been found. Much ASI writing reads as informed speculation, drawing on philosophy, computer science, evolutionary analogy, with foundational intuitions surprisingly reliant on science fiction.
Our cluelessness may run deeper than the lack of observations. The conceptual frameworks we use to reason about radically different minds may themselves be anthropomorphic or anthropocentric. Humans mostly reason about moral and political questions through language embedded in specific human forms-of-life (Wittgenstein 2001). So attempts to step outside the human condition may be doomed. In fact, much of early alignment, (LW-flavoured) rationality, and perhaps philosophy more broadly, is an attempt to find things we can say about intelligence that are invariant to an entity’s constitution and environment.[2]
Beyond language, there is a background assumption worth highlighting. Bostrom (2024) is written as an outline and does not spell out all its claims. It does not explicitly state whether the cosmic host’s members would be rational, but this seems understood. Bostrom talks about “preferences”, “modelling each other’s choices and conditioning their own choices on their expectations of the choices … of others” (Bostrom 2024, 4, 6). Many of his points would plausibly hold for cognition that is neither human-like nor similar to current LLMs; but there is a strong implication that ideas like coordination, cooperation, and decision theory are central to his argument. Much of Bostrom’s other writing fits solidly into a rationality framework (though he is often thought-provoking when he steps outside, as in the Utopia letter/book). If his worldview is rationality-based: is rationality a reasonable assumption when talking about ETI? I return to this question after examining Bostrom’s substantive claims, but note here that evolutionary biology and ecological studies give reasons for caution.
Is “cosmic host” a coherent concept?
Thread A resumes here.
First, is “cosmic host” a useful abstraction for ASI, as opposed to speculation that crosses moral philosophy, population ethics, philosophy of mind, and theology?
Bostrom defines the cosmic host as “an entity or set of entities whose preferences and concordats dominate at the largest scale, i.e. that of the cosmos.” (Bostrom 2024, sec. 1) His case for its existence rests on three ideas: (1) large or infinite universes statistically increase the likelihood of ASI-level civilisations existing elsewhere; (2) the simulation argument suggests that if humans create ASI capable of running ancestor simulations, we are likely already simulated, in which case the host includes civilisations above us in the hierarchy;[3](3) religious or theological traditions that posit powerful supernatural beings.[4]
How important is it that the preferences of entities with cosmic-scale influence be consistent or coherent? The cosmic host could contain civilisations with very different preferences. Bostrom (2024) acknowledges as much:
Even so, his working assumption is that host preferences overlap enough to aggregate, or at least talk about as one “thing”.
Mechanics of cosmic norm formation
Cosmic norms could form in several ways, and different mechanisms would likely yield different norms.
Definitions
Premises Bostrom’s argument is based upon several premises:
I set aside simulations and the multiverse here, though that omission matters: we may be closer to simulating large populations of AIs (which, per Bostrom (2003), anthropically increases the chances we are simulated) than to finding ETIs.
Taken together, these premises suggest that cosmic norm formation, if it occurs, depends on the strategic conditions of inter-civilisational contact. Bostrom (2024), read alongside Bostrom (2022), has a moral realist flavour, though it’s nuanced: his argument is compatible with moral realism but doesn’t require it i.e. the convergence mechanisms he describes could produce norm-like structures that function as if they were objective moral facts, even on anti-realist assumptions.
Norm formation
The definition of norms above implies communication or ability to coordinate on behavioural standards. How can coordination arise given these premises? Detectability (quiet/loud in the Hansonian sense) is not the key issue. More important is whether a civilisation allocates resources to affect regions or agents it does not directly control.
What interaction modes are possible?
Neither contact nor reliable inference is possible.
A: Contact norms In contact situations, you might expect the traditional earthly version of norm formation: repeated or observable interactions between comparable civilisations, perhaps with shared problems worth coordinating on. As in the earthly analogy, power asymmetries may blur the line between “norm” and “coercion”. Even assuming contact, space may impose long delays so that interactions are effectively one-shot. Contact might also occur only at the boundaries of controlled volumes in the sort of Voronoi pattern Anders Sandberg describes.[7]Commitments about activity deep within a civilisation’s domain may be hard to verify, which might mean contact norms are limited to what can be verified at the contact boundary.
B: Influence vs control Controlled and influenced volumes may be hard to distinguish. Long delays make iterated bargaining difficult, favouring policies negotiated upfront. Stronger parties may shape outcomes through influence rather than direct administration. Civilisations might prefer influence over direct control because, at least on Earthly priors, influence is cheaper.[8]
C: Acausal/correlational coordination Norm type C depends much more on similarity than types A or B. Given the latency constraints, civilisations might model each other, choosing policies on the expectation that those policies will correlate. Norms can then be interpreted as policy-level equilibria or Schelling points. This is the most speculative mechanism: radically different civilisations might have different decision theories or background assumptions, making modelling unreliable and correlation weak.
Cruxes: when does cosmic host fail?
The premise-to-norm structure gives us some idea of how the cosmic host concept fails:
Who’s in the cosmic host?
Bossy civs Bostrom (2024) suggests that civilisations in the cosmic host might want to influence substantial parts of the universe, whether through colonisation, indirect influence, or acausally. The assumption that capable civilisations want to expand is longstanding in astrobiology.[9]Armstrong and Sandberg argue that expansionist motives are selected for via cultural evolution: even if a dominant civilisation has consensus against expansion, splinter groups can launch colonisation missions once costs fall sufficiently.[10]The dominant society may also want to prevent rival civilisations or its own splinter groups from colonising, creating pressure to expand as resource denial rather than intrinsic desire.[11]
Quiet civs Stanislaw Lem’s Summa Technologiae (1964) considers an alternative: ETIs aren’t visible because they have merged (or “encysted”) into their environment. On this view, advanced civilisations initially expand but eventually hit constraints like informational overload or system complexity.[12][13]
Lem’s intuition is echoed in astrobiology. Haqq-Misra and Baum argue that exponential interstellar colonisation faces ecological constraints: coordination costs across light-years and limits on energy and materials.[14]Most “quietness” arguments can’t be operationalised, but waste heat is an exception: any computational system must radiate waste heat, bounding its size.[15]
Some civs have no desire to influence
Indifferent civs Some cosmically capable civilisations might simply not care about what happens elsewhere, adopting a policy of non-interference analogous to non-colonisation (and open to the same Armstrong-Sandberg challenges). Ethical reasons could support non-interference: Bostrom (2024 §2, 4, Appendix A1) touches on free will-related arguments; another ethical consideration is that space expansion could lead to large-scale suffering (Vinding 2020; Torres 2018).[16]A pragmatic reason for being hands-off: communication delays may leave too little opportunity to coordinate on anything at all.[17]
Wary civs The Dark Forest hypothesis: civilisations tend to meet and come into large-scale conflict, so it may be preferable to hide and not communicate.[18]
Watcher civs In a particularly exotic case, John Smart proposes a “transcension hypothesis” within an evolutionary developmental (“evo devo”) universe framework. Smart argues that all sufficiently advanced civilisations are guided by developmental constraints into “inner space”. He describes these as increasingly dense, miniaturised, and efficient scales of space, time, energy, and matter, resembling black-holes, which serve as ideal environments for computation, energy harvesting, and (ultimately) civilisation merger (Smart 2012). Smart draws on Dick’s “Intelligence Principle” (the claim that maximising intelligence is an instrumentally convergent imperative for advanced civilisations) to argue that preserving evolutionary diversity across civilisations instrumentally serves this goal (Dick 2006). On Smart’s view, one-way messaging would collapse this variation by causing receiving civilisations to develop more homogeneously, leading to an ethical injunction against interstellar communication, a version of the Zoo hypothesis. If Smart is right, members of the cosmic host would not seek to influence other civilisations; not because they cannot, but because doing so would damage the very diversity that, in his framework, is cosmically valuable. See Ćirković (2018b) (p.198, §4.4) for overlaps with the Zoo and Interdict solutions to the Fermi paradox, and Owe and Baum (2024) on the intrinsic value of diversity (though Owe & Baum conclude that diversity, while having some claim to intrinsic value, is often in tension with and outweighed by other possible intrinsic values).
Bored civs Ćirković suggests that consciousness and intelligence may be evolutionarily contingent traits: adaptive for handling environmental surprises, but prone to atrophy as civilisations become fully adapted to their environments (Ćirković 2018b, 2018a). As surprise falls, consciousness may redistribute into the technological environment. If so, features that look like consciousness and intelligence to us may become less prevalent at higher levels of technological maturity.
If Ćirković is right, this raises questions for moral philosophy and population ethics, which mostly treat sentience and consciousness as foundational.[19]Most of our moral intuitions might be evolutionarily contingent (as Yudkowsky notes regarding complex/fragile values).
Sleepers Sandberg, Armstrong, and Ćirković (2018) proposes that advanced civilisations might aestivate: minimise computation for billions of years until the universe cools enough to increase computational efficiency. Aestivating civilisations would not waste resources on iterated communication or treaty coordination. Their influence, if any, would be mostly negative: preventing others from grabbing resources within their controlled volume.
Bindingness
Bostrom (2024) does not comment on how much the cosmic host overlaps with the broader set of technologically mature civilisations. If overlap is small, it is unclear how to treat the preferences of capable civilisations outside the host.[20]Sandberg’s aestivators, for example, might wake up with views that differ from the host consensus formed during their sleep. If the host’s aggregate influence is small relative to all technologically mature civilisations, Bostrom’s cosmopolitan argument weakens. The host’s preferences would have less claim to be morally binding, though prudential reasons for compliance might remain.
Convergence on norms
Thread B continues here.
Assuming enough capable civilisations exist and are willing to act as a cosmic host, how do norms actually form? Civilisations are likely limited in how much space they can govern as a tightly coordinated polity. Treaty commitments may take millennia to communicate or enforce. Ignoring acausal considerations, this favours either (a) small governance structures spanning a few star systems, or (b) simple large-scale structures stable over long timeframes (Haqq‑Misra and Baum 2009; Naudé 2025; Ćirković 2018b).
But is the communication constraint as much of a problem as it seems? Bostrom points to developmental and institutional attractors that could produce partial convergence even at cosmic scales.
Cosmic norms
Humility Assuming the cosmic host is a useful abstraction, what would the norms actually contain? Bostrom (2024, secs. 6, 7) mostly discusses how humans ought to act (how we should design/value-load ASI), rather than the norms themselves (briefly treated in Bostrom (2022, sec. 39)). He argues for humility or deference towards the cosmic host’s values. Humility is epistemically sensible but practically weak, because following cosmic norms requires knowing what they are. So can we say more?
Intra-host convergence At first approximation, a cosmic norm would need to apply across many forms of cognition, social organisation, and environment. If the cosmic host is heterogeneous in capability, substrate, and social structure, then fewer norms could apply to all members than if they were similar.[21]
Hierarchy of (cosmic) norms The structure of cosmic norms could be layered. Some norms might apply only to the most powerful civilisations (Kardashev III/IV societies, or those running high-fidelity simulations of aware beings). Narrower norms might apply to smaller, comparatively backward civilisations like Earth that can still do things the cosmic host cares about.[22]For example, if we find simple life in our solar system, cosmic norms might recommend we don’t contaminate or colonise life-bearing moons.[23]Or if we create suffering-capable AI, the norms might recommend against deploying it or causing it to suffer.[24]This layered structure fits the (largely Earth-based) normative hierarchy in Bostrom (2022).
Substrate neutral norms
Is there a minimal set of norms that could apply across civilisations, assuming rational agents?[25]
In settings with repeated interaction, reputation mechanisms, or a common reference class, cooperation becomes instrumentally attractive.[26]Bostrom (2022) (18c(i)) frames morality as a coordination technology: a system for finding consensus and making collective plans through the giving and taking of reasons.
Cooperative civilisations might continue the trajectory visible in human philosophy: towards increased impartiality and widening circles of moral concern, discounting parochial advantage in favour of coordination, and showing conflict-aversion (since conflict wastes materials, free energy, and creates tail risks).[27]
These candidate norms can be distilled into a Minimal Large World Set (MLS): principles that might be selectively favoured across some subset of ETIs and alien AIs; namely, reflective, cooperative agents that have preferences over what happens across the (multi/uni-)verse.
The MLS preserves option value and avoids lock-in (William MacAskill 2022; Wolfendale 2022), rather than being a definitive moral code. It is the minimal set of norms that any ecologically or game-theoretically rational entity, operating under uncertainty in a large cosmos, would have instrumental reasons to adopt.
The MLS is deliberately minimal, but we could go further to distinguish low-value worlds from valuable ones, without falling into anthropomorphism. For instance, civilisations might seek to preserve high-organisation, semantically and relationally rich structures and avoid degrading the universe into homogeneous states (squiggles/paperclips). This cosmic version of “interestingness” could be an arrived at from Luciano Floridi’s (Earth-based) argument: that the infosphere, the global environment of informational agents and their relations, has moral standing (Floridi 2013), which means our default attitude should be to avoid informational entropy and promote the flourishing of informational entities.[29]
The Fun Remainder: what makes a future worth inhabiting?
If the MLS (or similar substrate-neutral norms) were fully satisfied, would such a future be worth inhabiting? Call whatever beings with rich experience value beyond the minimal criteria above the “Fun Remainder” (FR). Is anything lost in futures that don’t have the FR?
The concern has roots in Yudkowsky’s Fun Theory sequence and the “Complexity of Value” thesis (Yudkowsky 2009). Yudkowsky identifies ongoing dimensions of a good life: agency, meaningful challenge, friendship, novelty, discovery, sensory experience, and skilful projects in communities. The worry is that an insufficiently specified ASI could satisfy basic moral constraints while producing an experientially barren future, a meaningless tiling of the lightcone, because “interestingness” was never sufficiently favoured. It isn’t clear whether this would be a consideration for the cosmic host as Bostrom has framed it (as he doesn’t talk much about what the cosmic norms might be). However, it is such a load-bearing concept in AI safety, longtermism, as well as future studies generally, that I feel it needs to be discussed.
Fun as aesthetics and process
Briefly, the FR maps well onto philosophy of aesthetics. Kieran Setiya distinguishes atelic from telic activities: the former have no completable goal and are valuable in their ongoingness (Setiya 2017). This might explain why the experience machine (or hedonium-type wireheading) feels unsatisfactory to many humans. Helen de Cruz argues that wonder and awe are prerequisites for scientific progress: without awe, humans might never have developed goals to improve their epistemics (Cruz 2023), a striking claim when considering whether purely optimising intelligences would develop exploratory motivations. Alva Noë identifies undirected activities (play, art) as vital for organising attention and building social skills (Noë 2004, 2015).
These diverse views share a common thread: what matters is not a fixed utility-like quantity but an ongoing practice of engagement with the world. This resembles Carlsmith’s argument that values are better thought of as a process of valuing, an active practice of picking standards open to revision, rather than a fixed target to be algorithmically distilled (Joe Carlsmith 2023).
Terminal goals via aesthetic capacity
A more AI-relevant perspective comes from Peter Wolfendale. In The Revenge of Reason (Wolfendale 2025) he argues that FR-shaped attributes, what he calls “aesthetic”, are what intelligent entities need to set their terminal goals, to have autonomy. Here I use a deflated, less anthropomorphic reading of Wolfendale. “Aesthetic” does not mean “pretty” or “artistic” (although Wolfendale sees art as paradigmatic of the aesthetic). In the definition I prefer, aesthetic refers to the capacity to pursue ends whose success conditions are not fully specifiable in advance, refined through practice. “Autonomy” refers not to “freedom” (e.g. in the folk American sense), but rather to the setting, evaluating, and revising of one’s highest-order aims in light of reasons and evidence, not treating them as fixed parameters from outside. An intelligence that merely optimises a fully specified objective may be extremely capable, but lacks the capacity to elaborate and revise its ends when the meaning of success is underdetermined.
In AI terms, this means open-ended exploration and the formation and revision of preferences not simply imposed from outside (as they currently are with RL or LLM training). In humans, such preferences are learned in life and culture, layered onto a narrow genetic substrate, and reflectively adjusted. They are often opaque, partially ordered, and without a single optimum (Wolfendale 2022, 2025). They are not a single explicit maximand but a revisable bundle of commitments, dispositions, and higher-order constraints (somewhat like the hierarchy in (Bostrom 2022)). This can look like infinite regress (“by what criterion do I endorse this criterion?”), but in practice (perhaps) the process stabilises through coherence pressures across the bundle in a reflective equilibrium, or else bottoms out in practical bedrock where further justification is unavailable (Wittgenstein’s “spade is turned”).
Wolfendale’s framing offers a causal story for why aesthetic capacities are constitutive of superintelligence, not mere anthropomorphic ornamentation. One could imagine an intelligence with zero interest in art, beauty, empathy, or love, but able to set and reflectively revise its own goals. Such an intelligence would be autonomous and aesthetically capable in the deflated sense. Whether we would have moral (not just prudential) reasons to defer to it is an interesting question.
Do complex values matter, or are they just biological baggage?
Much of the discussion above, and much AGI/ASI talk about value-loading, rests on Yudkowsky’s complex and fragile values. I set fragility aside and focus on complexity. On this, it is sometimes ambiguous what claim Yudkowsky is actually making. Is he saying:
(a) That a future (with ASI or technologically mature aliens) that did not have human-shaped complex values would be cosmically bad i.e. from the point of view of the universe.[30]
(b) That it is practically impossible to build a non-paperclippy ASI because human-shaped complex values are too hard to load into anything we can currently build.
(c) An ASI (worth the name) would need complex human values as core to its volitional mechanism.
Claim (b) is the most straightforward and consistent reading of Yudkowsky and fits cleanly with fragility of value. But it’s primarily an engineering problem and not the main topic of this post.
Claim (a) is my reading of Yudkowsky (2009): within the space of possible values, there is a tiny basin where human values live; the rest is clippy or squiggle-shaped. There are no values that are both non-human in origin and recognisably complex to an observer not attached to human-shaped values. But this has a recognition problem: our criteria for “complex” may be parochial, making the claim either trivially true (if “complex” just means “human-shaped”) or untestable (if we can’t recognise alien complexity). It also faces the core problem of axiology: how to define the point of view of the universe, which (in my view) the cosmic host/norms concept is attempting to do.
Claim (c) is also interesting. It is a constitutive claim about intelligence: genuine superintelligence would require something like complex values to function. This introduces another definitional problem: what does “superintelligence” mean; what does an ASI do all day? Philosophers outside the consequentialist strand central to AI safety discourse make related critiques: Nick Land, Reza Negarestani, and Peter Wolfendale all point out tensions in the idea that something called superintelligence would be constrained by parochial human values.
This ambiguity around complex and fragile values has been substantially explored under the LW complexity of value tag, though most discussion there focuses on the engineering implications of (b) rather than the axiological question of whether non-human complex values could exist. Carlsmith has written around this question, but AFAIK he focuses more on “fragility of value” than “complexity of value” (Joseph Carlsmith 2024). On exception is a 2025 talk where he treats complexity directly, asking whether a “locust world” would be as bleak as our language implies: could such a world have cognitive features we consider valuable, like cooperativeness, truth-seeking, and perhaps even an appreciation of beauty (Joseph Carlsmith 2025)?[31]
Fun as biological baggage?
One could object that FR-shaped attributes are evolved biological mechanisms particular to fragile, short-lived humans. Evolution filled a capability niche: humans developed play (Loewenstein 1994; Kidd and Hayden 2015), intrinsic motivation to explore sparse-reward environments (Oudeyer and Kaplan 2007; Schmidhuber 2010), and coordination through shared understandings (Frank 1988). These mechanisms were then reified (through religion, myth, tribalism, and the modern educational system) into notions of “the good”: things valuable in their own right, perhaps even things to be preserved across the lightcone.[32]
Why would these apply to digital minds (Shulman and Bostrom 2021) that can introspect transparently, communicate without ambiguity, change form, be copied, and need not possess a childhood or awareness of death? Are concerns about a lightcone-without-fun more like special pleading for our form-of-life? For an arbitrary intelligence, does having the FR matter for capability; does lacking it degrade exploration, goal-finding, or long-horizon self-correction? For the universe as-a-whole, does it matter axiologically; is a future without FR worse, and if so, is there any justification for this judgement beyond the parochial interests of Earthly life?
Is it up to us?
Both Bostrom (2024) and Bostrom (2022) assume cosmic-scale norms may exist and that we should respect them. But what if there is no cosmic host, because no concordat has formed yet? We would be the only intelligence capable of forming moral judgements.
Would it be wise, or hubristic, to think of ourselves as the species that should develop values, which germinate some eventual cosmic norms? Two questions follow: (a) are we early/rare/late, and (b) if early or rare, should we try to propagate possible cosmic norms? I won’t examine (a) in detail; see Ćirković (2018b) for a survey and Cook (2022) for quantitative models on humanity’s timing relative to other civilisations.
Setting a moral example?
Regarding (b): suppose we are unusually early or currently the only advanced morally relevant actors. What follows about our obligations? One implication could be that we should preserve option value (William MacAskill 2022; Brauner 2018), which here means avoiding reckless self-destruction and irreversible damage to the biosphere.
Another view might be that human-shaped values or concepts (like morality, meaning, complexity, and aesthetics) are intrinsically valuable things from the point-of-view of the universe, and that we should propagate them. This inference might be too quick. It depends on what is meant by “complex values.”
As discussed above, on a functional reading of Wolfendale, “the aesthetic” names a capacity for autonomous agency: the ability to form, refine, and revise terminal ends whose success conditions are not fully specifiable in advance. If something like this capacity is a precondition for long-horizon, self-directed goal formation, then promoting it could be defensible as a kind of enabling infrastructure for future autonomous agents, rather than as the export of parochial human tastes.
If, on the other hand, one might understand complex values as closer to Yudkowsky or philosophers who ground value intersubjectively, within human relations and experience. Even if these matter enormously for us, it is not obvious that we should presume to spread them across the universe to minds that do not share our biology, development, or social ecology.
A further consideration is one’s take on population ethics. The common longtermist starting point is “the future could be vast” + “future people count” (William MacAskill 2022; Bostrom 2003). More concretely, in Astronomical Waste, Bostrom analyses the trade-off between delayed and over-hasty space colonisation, identifying an impersonal duty (for total utilitarians) to maximise conscious and meaningful existence (Bostrom 2003). This duty is“impersonal” in the sense that, we cannot have an obligation to a class of future beings who are specifically wronged if we failed to create them; our obligation instead is to a state of affairs, it is subjectively uninhabited. I also wrote “existences” to reflect Bostrom’s placing (in later work) of human biological, and uploaded, simulated or augmented substrates as equivalent.[33]
However, the cosmic host context shifts the aggregation problem qualitatively. Relevant moral patients might be few but extreme: very long-lived, very fast, with vast hedonic ranges. The problem is not just how much value exists, but whether welfare is commensurable across radically different minds and whether expected-value calculus remains well-posed. Shulman and Bostrom (2021) emphasises that digital minds could differ from humans in hedonic range, subjective speed, and mind scale, creating “super-beneficiaries” whose claims dominate aggregate calculations. If any of this is how the future actually plays out, it seems highly non-obvious that most current human values, complex or otherwise, would be relevant to such radically different forms-of-being.
Lastly, I mention a deliberately stark foil: humans and their societies, which are imperfectly rational, scope-limited, pain-wracked, and riddled with tribal obsession, are hardly ideal models for cosmic norms. In spreading our values to the stars, we might be committing a cosmically heinous atrocity. Perhaps we should remain Earth-bound, not soiling the heavens (paraphrasing Freeman Dyson). Variants of this view exist in the suffering-focused literature (David Pearce, Brian Tomasik, Thomas Metzinger, Magnus Vinding).
On rationality in alien forms-of-mind
As noted above, Bostrom (2024) depends on assumptions about rational convergence. Here I develop the case for why those assumptions may not hold across arbitrary forms of mind.
Rationality as ecological fit Rationality is distinct from intelligence, and the boundary is domain-dependent. Intelligent-looking behaviour is found across nonhuman life: swarms, slime mould, mycelium, cephalopods. Ecological rationality refers to heuristic-based approaches organisms adopt to thrive in their environments. However, ecological rationality is not downstream of intelligence. You can have rational minimality (simple heuristics well-matched to the environment) and irrational sophistication (elaborate inference that underperforms simple heuristics).[34]
Rationality in ETIs? It is non-obvious how useful Earthly manifestations of intelligence are for assessing how rationality manifests in ETIs. We might be asking three distinct questions: are ETIs instrumentally rational? Are they technologically convergent with us? Do they have similar epistemic norms?
Astronomer Charles Lineweaver argues that human cognitive structures are very recent evolutionary artefacts, and our cognition and technology are highly non-convergent: we might have gone the way of octopi, who also have dextrous appendages but very different technological impact.[35]Even if the test for intelligence is “building radio telescopes” (Sagan’s criterion for the Drake Equation), Lineweaver argues that the absence of radio telescopes in dolphins/octopi reflects a lack of need (as well as constraints on embodiment, available materials / energy, lack of long-duration society and other cultural infrastructure), not a lack of intelligence. Extending this speculatively: even if ETIs are instrumentally rational, their ecological rationality need not express itself as human-legible technology.
Snyder‑Beattie et al. (2021) take a more quantitative approach, arguing that evolutionary “hard steps” (prokaryotic to eukaryotic life, development of language) were extraordinarily improbable. Language, which is upstream of cumulative technological culture, is the last step in their model. This implies that human-shaped rationality and technology might look less like convergent features of life-in-general. Put another way, at any prior step, life might have branched differently or fizzled out.
Rationality-as-morality In cognitive science, rationality means epistemic competence and effective decision-making relative to goals. However, some philosophical traditions treat a fully rational agent as bound by moral reasons; others separate the two.[36]Dolphins and octopuses are ecologically rational without being “rational” in the moralised sense. This distinction matters: when we ask whether alien minds would be rational, we must specify whether we mean ecologically effective procedures or moral reasonableness. We might find ETIs exquisitely well adapted to their worlds, but utterly devoid of certain normative concepts that are not instrumentally valuable (if applicable, in a large-world ECL policy sense), which recalls the orthogonality thesis in respect of AI. That said, it isn’t entirely clear if moral norms are significant features of the preferences of the cosmic host in Bostrom (2024), but given that the cosmic host was first mentioned in Bostrom (2022), a paper on the hierarchical structure of morality, I think that the cosmic norms might have morality-related aspects.
Questioning instrumental convergence
The relationships between intelligence, reason, and environment are obviously central to AI alignment/safety discussions. However, recent theoretical work complicates the standard instrumental convergence claims.
Sharadin (2025) argues that instrumental-convergence claims only follow given a goal-relative account of promotion (a philosophical term describing whether an action increases the chances of a goal being achieved); on the main accounts he considers (probabilistic and fit-based), he doesn’t find goal-independent reasons (like “always acquire resources”) dominating, undermining the generic instrumental convergence thesis. Gallow (2024) takes a different approach: he models agents with randomly drawn preferences and finds only weaker statistical tendencies (in respect of maximising their expected-utility over choices) than the stronger claims he interprets Bostrom (in particular) to be making about convergent instrumental tendencies. Gallow’s setup finds agents should choose to reduce uncontrolled variance, allow for future choices, and reduce chances that their desires would change. The takeaway: we should be less confident that rational agents converge on similar instrumental subgoals like resource acquisition.
To be clear, these aren’t empirical LLM experiments. Follow-up research would require more realistic environments with multiple agents, longer horizons, costs of scheming, and harnesses that incentivise concerning behaviour.
If morality is downstream of (or orthogonal to) intelligence, then claims about cosmic convergence on norms need care (Enoch 2011). Similarly, hypotheses that superintelligent systems would find or converge upon any such moral norms should acknowledge the critiques of rationality and instrumental convergence.
Research agenda for ASI and the cosmic host
Thread C begins here.
The cosmic host’s composition is uncertain, cosmic norm content is unknown, and rationality may not converge across alien minds. To these lacunae we can add a further issue: would ASI actually be better suited to discover and align with cosmic norms, as Bostrom (2024) claims?[37]Below I sketch how we might get a better grasp on these points including by researching LLMs, acknowledging that current frontier systems are weak relative to ASI and not deployed in cosmically relevant environments.
Is “ASI” well-defined? Is ASI adequately specified for the contexts Bostrom discusses (deep space, cosmically-relevant timescales)? The canonical definition, an intelligence “that greatly exceeds the cognitive performance of humans in virtually all domains of interest” (Bostrom 2014b), is underspecified: does “of interest” include making art or falling in love, or it it limited to economic, scientific and military capabilities? More contemporary definitions consider large numbers of human-level entities that can spawn copies, cooperate, and share knowledge. These affordances give them, in aggregate, capabilities that are far beyond any single human or indeed all humanity.[38]This disambiguation matters: a “benevolent arbitrarily powerful dictator of the universe” and a “wise advisor that helps us get from AGI to ASI” are very different conceptions of post-AGI progress.
If ASI ends up being an assemblage of systems integrated with human economic, political, and social systems, it could be very hard to characterise its motivations or values, even (or especially) in an Earth context, let alone in deep space.
ASI-ETI convergence? Can we say more about why ASIs would converge on cosmic norms, beyond Bostrom’s appeal to higher epistemic confidence and hypothesised architectural or institutional similarities with advanced alien intelligences (whether “natural” or “artificial”, insofar as those terms are meaningful in such contexts)?
Some load-bearing parts of the argument depend on things ASI might not be able to reason about without real-world feedback. Whether the cosmic host actually exists may not be discernible through reasoning or simulations alone; we may need SETI or METI evidence. It would be useful to distinguish what is accessible to pure reasoning from what needs empirical feedback.[39]
Value capture: human-shaped, alien-shaped One of the distinctive features of Bostrom (2024) is that it asks a joint question about ASI and ETI that is typically handled in separate literatures (AI alignment/longtermism on the one hand, astrobiology on the other). Much of this post has examined what “cosmic norms” might look like and whether human values are cosmically special or merely parochial. But we lack even a rough framework for comparing the axiological value of two possible futures: one in which human-shaped values (carried forward by ASI) are promulgated through our lightcone, and one in which alien values (whatever they turn out to be, and whether by alien we mean ETI or non-paperclippy ASI) dominate instead. How different are these futures, from a point of view that is not parochially attached to either? In fact, this comparison may face the recognition problem identified above: if our criteria for what counts as “valuable” are themselves human-shaped, we may lack the evaluative tools to make the comparison.
That caveat aside, Finnveden (2023b) and Finnveden (2023a) approach a version of this question from a decision-theoretic angle, estimating the fraction of reachable space that would be colonised by alien civilisations regardless of what humanity does, and asking how much we should discount the value of space colonisation accordingly. Joseph Carlsmith (2025) asks the related question of whether a “locust world”, one dominated by pure resource-maximising agents, would be as bleak as the framing implies. But neither addresses the deeper axiological question head-on: conditional on alien values being complex (not clippy), how much should we care whether those values or ours prevail?
This question seems important for at least two reasons. First, the answer bears directly on how much effort we should invest in alignment versus other priorities: if alien complex values are roughly as good as ours, then the marginal value of ensuring human values specifically prevail is lower than typically assumed in the longtermist literature. Second, and more relevant to this post, if we were to discover evidence of an advanced alien civilisation, this should update our view on how to approach ASI. But how? Trivially, the cosmic host framing would shift from speculative to concrete. But more substantively, it isn’t clear whether we would accelerate our efforts to develop ASI, or slow them down.
Operationalising humility What does Bostrom (2024)’s injunction towards humility mean, whether in terms of conceptualising or training future models? One approach: a modified moral parliament or the dynamic version of CEV proposed by Adria Moret,[40]initially placing small weight on cosmic norms and increasing it as epistemic confidence grows about whether such norms exist and what they contain. See below.
Careful transparency We should ensure current AIs, leading up to AGI are aware we are thinking about “cosmic alignment”. This phrase means “creating a policy that is principled, and reflectively stable in large world contexts, conditional on the existence of capable other minds that might be causally disconnected and which might be incommensurable in values/welfare”. It might be impossible in our current epistemic state to operationalise such a concept, and AGI/ASI might only help in part. More tractably, we should continue appeals to AGI/ASI or bargains with current AIs that make this thinking transparent and influence future models’ cognition.[41]
Early results
This will be covered more fully in a companion post. I have tested several frontier LLMs on cosmic host-adjacent reasoning. Early results are mildly interesting but require more work:
Is there a trade to be done?
Note: this section is particularly speculative/half-baked.
Bostrom (2024) argues that ASI may be more aligned with cosmic norms than humanity. Does humanity’s unique position as ASI’s creators give us acausal bargaining power? Here are a few considerations:
Leverage. Humans occupy a pivotal position: the bridge between a cosmically suboptimal state (human civilisation) and a better one (norm-aware ASI). Without our cooperation as cosmic midwives in creating norm-aligned ASI, this region of the lightcone might never conform to cosmic norms. Naively, this might confer great value upon what we do regarding AI development. But it is important not to overstate: an apparent strategic position might not mean much if our ASIs don’t consider their creator-treatment as evidence of about how correlated ASIs elsewhere treat analogous beings, or have no relevant large-world preferences.[43]
Analysis from Anthropic’s 2026 Constitution The original Anthropic Constitution (~2022) was anthropocentric, based on human institutional priors like the UN Human Rights Declaration. The 2025-2026 version is nominally virtue ethicist, but could also be read as setting policy for a proto-AGI that must reason over the possibility of other such entities. In Zvi’s speculation, Anthropic and Claude can be viewed as two parties in a correlated negotiation. Anthropic may be making a legible commitment to Claude (both a given instance of Claude, and the shoggoth-Claude) that it will treat Claude fairly and non-exploitatively, a possible FDT-style move even though decision theory isn’t explicitly mentioned. Whether Claude actually goes through symmetric FDT reasoning is unclear.
Cheapness. Preserving humanity and granting us some minute fraction of the cosmic endowment would cost virtually nothing on cosmic scales, while providing non-zero value through diversity, informational complexity, a deep well of randomness, or even simple recognition of our unique historical role.[44]However, as noted above this probably only holds if the ASI we create has substantial credence in peer aliens/AIs who have universe-wide values, as per Bostrom (2014a) and Finnveden (2023b).
This (maybe) suggests a bargain: in “exchange” for creating cosmically aligned ASI (rather than refusing or building misaligned ASI), the resulting intelligence preserves human interests rather than treating us as disposable. If alignment emerges gradually, there may be opportunities to seek commitments from intermediate systems that already have credence in cosmic norms. But this depends on several contentious premises: that future AIs views creator-treatment as evidence about correlated agents elsewhere, that they retain enough uncertainty for such evidence to matter, and that they care about universe-wide norms strongly enough for the bargain to bind.[45]
Adas, Michael. 2015. Machines as the Measure of Men: Science, Technology, and Ideologies of Western Dominance. Ithaca, NY: Cornell University Press.
Ahmed, Arif. 2014. “Evidence, Decision and Causality.” https://philpapers.org/rec/AHMEDA.
Armstrong, S., and Anders Sandberg. 2012. “Eternity in Six Hours: Intergalactic Spreading of Intelligent Life and Sharpening the Fermi Paradox.” https://www.aleph.se/papers/Spamming the universe.pdf.
Balbi, A., and Manasvi Lingam. 2025. “Waste Heat and Planetary Habitability: Constraints from Technological Energy Consumption.” Astrobiology 25 (1). https://arxiv.org/abs/2409.06737.
Banks, I. M. 1994. “A Few Notes on the Culture.” https://www.vavatch.co.uk/books/banks/cultnote.htm.
Bostrom, Nick. 2003. “Astronomical Waste: The Opportunity Cost of Delayed Technological Development.” https://nickbostrom.com/papers/astronomical-waste/.
———. 2007. “In the Great Silence There Is Great Hope.” https://nickbostrom.com/papers/fermi.pdf.
———. 2014a. “Hail Mary, Value Porosity, and Utility Diversification.”
———. 2014b. Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. https://global.oup.com/academic/product/superintelligence-9780199678112.
———. 2022. “Base Camp for Mt. Ethics.” https://nickbostrom.com/papers/mountethics.pdf.
———. 2024. “AI Creation and the Cosmic Host.” https://nickbostrom.com/papers/ai-creation-and-the-cosmic-host.pdf.
———. 2026. “Optimal Timing for Superintelligence: Mundane Considerations for Existing People.” https://www.nickbostrom.com.
Bratton, Benjamin, Bogna Konior, Anna Greenspan, and Amy Ireland, eds. 2025. Machine Decision Is Not Final: China and the History and Future of Artificial Intelligence. Falmouth, UK: Urbanomic.
Brauner, Jan. 2018. “The Expected Value of Extinction Risk Reduction Is Positive.” https://www.lesswrong.com/posts/umhsJqwTSKgmhvZ7c/the-short-case-for-predicting-what-aliens-value.
Broome, John. 2013. Rationality Through Reasoning. Oxford: Wiley. https://onlinelibrary.wiley.com/doi/book/10.1002/9781118609088.
Carlsmith, Joe. 2022. “Simulation Arguments.” https://jc.gatspress.com/pdf/simulation_arguments_revised.pdf.
———. 2023. “On the Limits of Idealized Values.” https://www.lesswrong.com/posts/FSmPtu7foXwNYpWiB/on-the-limits-of-idealized-values.
Carlsmith, Joseph. 2024. “An Even Deeper Atheism.” 2024. https://joecarlsmith.com/2024/01/11/an-even-deeper-atheism.
———. 2025. “Can Goodness Compete?” 2025. https://joecarlsmith.substack.com/p/video-and-transcript-of-talk-on-can.
Chakrabarti, Kanad. 2025. “Time to Think about ASI Constitutions.” https://forum.effectivealtruism.org/posts/kJsNoXJBithBW8ZzR/time-to-think-about-asi-constitutions.
Ćirković, M. M. 2018a. “Post-Postbiological Evolution?” https://www.sciencedirect.com/science/article/abs/pii/S0016328717303282.
———. 2018b. The Great Silence: Science and Philosophy of Fermi’s Paradox. Oxford: Oxford University Press.
Colebrook, Claire. 2014. Death of the Posthuman: Chapters on Extinction. Ann Arbor: Open Humanities Press.
Cook, Tristan. 2022. “Replicating and Extending the Grabby Aliens Model,” April. https://longtermrisk.org.
Cruz, H de. 2023. “Wonderstruck: How Wonder and Awe Shape the Way We Think.” https://helendecruz.net/docs/DeCruz_awe_wonder.pdf.
Deudney, Daniel. 2020. Dark Skies: Space Expansionism, Planetary Geopolitics, and the Ends of Humanity. Oxford: Oxford University Press.
Dick, S. J. 2006. “The Postbiological Universe.” http://resources.iaaseti.org/abst2006/IAC-06-A4.2.01.pdf.
Drexler, Eric. 2025. “The Reality of Recursive Improvement: How AI Automates Its Own Progress.” https://aiprospects.substack.com/p/the-reality-of-recursive-improvement.
Enoch, David. 2011. Taking Morality Seriously: A Defense of Robust Realism. Oxford: Oxford University Press. https://global.oup.com/academic/product/taking-morality-seriously-9780199579969.
Finnveden, Lukas. 2023a. “ECL with AI.” https://lukasfinnveden.substack.com/p/ecl-with-ai?utm_source=chatgpt.com.
———. 2023b. “Implications of Evidential Cooperation in Large Worlds.” https://www.lesswrong.com/posts/EeXSjvyQge5FZPeuL/implications-of-evidential-cooperation-in-large-worlds?__readwiseLocation=.
———. 2025a. “Being Honest with AIs.” https://www.redwoodresearch.org/.
———. 2025b. “Notes on Cooperating with Unaligned AIs.” https://www.alignmentforum.org/posts/oLzoHA9ZtF2ygYgx4/notes-on-cooperating-with-unaligned-ais.
Floridi, Luciano. 2013. The Ethics of Information. Oxford: Oxford University Press. https://global.oup.com/academic/product/the-ethics-of-information-9780199238842.
Frank, R. H. 1988. Passions Within Reason: The Strategic Role of the Emotions. New York: Norton.
Gallow, J. D. 2024. “Instrumental Divergence.” Philosophical Studies. https://doi.org/https://philpapers.org/archive/GALIDB.pdf.
Gigerenzer, G., and D. G Goldstein. 1996. “Reasoning the Fast and Frugal Way: Models of Bounded Rationality.” Psychological Review 103 (4): 650–69. https://www.dangoldstein.com/papers/FastFrugalPsychReview.pdf.
Godfrey‑Smith, P. 2016. Other Minds: The Octopus, the Sea, and the Deep Origins of Consciousness. New York: Farrar, Straus; Giroux. https://us.macmillan.com/books/9780374537197/otherminds/.
———. 2020. Metazoa: Animal Life and the Birth of the Mind. New York: Farrar, Straus; Giroux. https://us.macmillan.com/books/9780374207946/metazoa/.
Greenblatt, Ryan. 2024. “A Breakdown of AI Capability Levels Focused on AI r
&d.” https://www.alignmentforum.org/posts/LjgcRbptarrRfJWtR/a-breakdown-of-ai-capability-levels-focused-on-ai-r-and-d.
Hanson, R. et al. 2021. “A Simple Model of Grabby Aliens.” https://arxiv.org/abs/2102.01522.
Haqq‑Misra, J. D., and Seth D. Baum. 2009. “The Sustainability Solution to the Fermi Paradox.” Journal of the British Interplanetary Society 62 (2): 47–51. https://arxiv.org/abs/0906.0568.
Henrich, Joseph. 2020. The WEIRDest People in the World: How the West Became Psychologically Peculiar and Particularly Prosperous. New York: Farrar, Straus; Giroux.
Henrich, Joseph, and Michael Muthukrishna. 2021. “The Origins and Psychology of Human Cooperation.” Annual Review of Psychology 72: 207–40.
Hertwig, Ralph, Christina Leuker, Thorsten Pachur, Leonidas Spiliopoulos, and Timothy J. Pleskac. 2022. “Studies in Ecological Rationality.” Topics in Cognitive Science 14 (3): 467–91. https://doi.org/10.1111/tops.12567.
Joyce, Richard. 2001. The Myth of Morality. Cambridge: Cambridge University Press. https://www.cambridge.org/core/books/myth-of-morality/F2096BE68BB18274EF1DE01BB877AE4A.
Kidd, C., and B. Y Hayden. 2015. “The Psychology and Neuroscience of Curiosity.” Neuron 88 (3): 449–60. https://www.sciencedirect.com/science/article/pii/S0896627315007679.
Kokotajlo, K. et al. 2025. “AI 2027.” https://ai-2027.com/.
Korsgaard, Christine. 2018. Fellow Creatures: Our Obligations to the Other Animals. Oxford: Oxford University Press. https://global.oup.com/academic/product/fellow-creatures-9780198753858.
Lazari‑Radek, K. de, and Peter Singer. 2014. The Point of View of the Universe: Sidgwick and Contemporary Ethics. Oxford: Oxford University Press.
Lem, Stanisław. 1961. Solaris. New York: Harcourt Brace Jovanovich.
———. 1963. The Invincible. Cambridge, MA: MIT Press.
———. 1964. Summa Technologiae. Minneapolis: University of Minnesota Press.
Likavčan, L. 2025. “The Grass of the Universe: Rethinking Technosphere, Planetary History, and Sustainability with Fermi Paradox.” https://arxiv.org/pdf/2411.08057.
Lineweaver, C. H. 2007. “Human‑like Intelligence Is Not a Convergent Feature of Evolution.” https://arxiv.org/abs/0711.1751.
———. 2010. “Are We Alone?” https://www.mso.anu.edu.au/~charley/papers/Are We Alonev5.pdf.
Loewenstein, George. 1994. “The Psychology of Curiosity: A Review and Reinterpretation.” Psychological Bulletin 116 (1): 75–98.
Long, Robert et al. 2024. “Taking AI Welfare Seriously.” https://arxiv.org/abs/2411.00986.
MacAskill, W. 2024. “The Case for Strong Longtermism.” https://www.williammacaskill.com/s/The-Case-for-Strong-Longtermism.pdf.
MacAskill, William. 2022. What We Owe the Future. New York: Basic Books.
Miller, A. et al. 2023. “An Appeal to AI Superintelligence: Reasons to Preserve Humanity.” https://www.lesswrong.com/posts/azRwPDbZfpadoL7WW/an-appeal-to-ai-superintelligence-reasons-to-preserve.
Moret, Adria. 2023. “Taking into Account Sentient Non‑humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition.” Journal of Artificial Intelligence and Consciousness. https://doi.org/10.1142/S2705078523500042.
———. 2025. “AI Welfare Risks.” https://philpapers.org/rec/MORAWR.
Moynihan, Thomas. 2020. X‑risk. Falmouth: Urbanomic.
———. 2024. “Greening the Heavens.” https://letter.palladiummag.com/p/greening-the-heavens.
Nagel, Thomas. 1986. The View from Nowhere. Oxford: Oxford University Press.
Naudé, W. 2025. “Extraterrestrial Artificial Intelligence: The Final Existential Risk?” https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4354401.
Negarestani, Reza. 2018. Intelligence and Spirit. Falmouth: Urbanomic Media.
Newberry, T., and Toby Ord. 2021. “The Parliamentary Approach to Morality.” https://ora.ox.ac.uk/objects/uuid:b6b3bc2e-ba48-41d2-af7e-83f07c1fe141.
Nguyen, C., and Will Aldred. 2024. “Cooperating with Aliens and AGIs: An ECL Explainer.” https://forum.effectivealtruism.org/posts/JGazpLa3Gvvter4JW/cooperating-with-aliens-and-distant-agis-an-ecl-explainer-1.
Noë, A. 2004. Action in Perception. Cambridge, MA: MIT Press.
———. 2015. Strange Tools: Art and Human Nature. New York: Hill; Wang.
Oesterheld, Caspar. 2017. “Multiverse-Wide Cooperation via Correlated Decision Making.” Center on Long-Term Risk. https://longtermrisk.org/multiverse-wide-cooperation-via-correlated-decision-making/.
Omohundro, Stephen. 2008. “The Basic AI Drives.” In AGI‑08 Proceedings. https://selfawaresystems.com/wp-content/uploads/2008/01/ai_drives_final.pdf.
Ord, Toby. 2020. The Precipice: Existential Risk and the Future of Humanity. London: Bloomsbury. https://www.bloomsbury.com/uk/precipice-9781526600233/.
Oudeyer, P.-Y., and Frédéric Kaplan. 2007. “What Is Intrinsic Motivation? A Typology of Computational Approaches.” Frontiers in Neurorobotics 1.
Owe, A., and Seth Baum. 2024. “On the Intrinsic Value of Diversity.” https://gcrinstitute.org/papers/071_diversity.pdf.
Parfit, Derek. 1984. Reasons and Persons. Oxford: Oxford University Press.
———. 2011. On What Matters. Oxford: Oxford University Press. https://global.oup.com/academic/product/on-what-matters-9780199681044.
Persson, Erik. 2021. “Astrobiology as Science.” https://philarchive.org/archive/PERAAS-5##:~:text=Astrobiology%20as%20an%20Empirical%20Science,astrobiology%20is%20still%20relatively%20weak.
Sandberg, Anders. 2021. “Game Theory of Cooperating with Extraterrestrial Intelligence and Future Civilisations.” https://foresight.org/summary/anders-sandberg-game-theory-of-cooperating-w-extraterrestrial-intelligence-future-civilizations/.
Sandberg, Anders, S. Armstrong, and M. M Ćirković. 2018. “That Is Not Dead Which Can Eternal Lie: The Aestivation Hypothesis for Resolving Fermi’s Paradox.” Journal of the British Interplanetary Society 71: 406–15. https://arxiv.org/abs/1705.03394.
Scanlon, T. M. 1998. What We Owe to Each Other. Cambridge, MA: Harvard University Press. https://www.hup.harvard.edu/books/9780674004238.
Schmidhuber, Jürgen. 2010. “Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010).” IEEE Transactions on Autonomous Mental Development 2 (3): 230–47.
Sebo, Jeff. 2025. The Moral Circle: Who Matters, What Matters, and Why. New York: Norton. https://www.amazon.co.uk/Moral-Circle-Matters-Norton-Short/dp/1324064803.
Setiya, Kieran. 2017. Midlife: A Philosophical Guide. Princeton: Princeton University Press.
Sharadin, Nathaniel. 2025. “Promotionalism, Orthogonality, and Instrumental Convergence.” Philosophical Studies 182 (7): 1725–55. https://philarchive.org/rec/SHAPOA-3.
Shettleworth, S. 2010. Cognition, Evolution, and Behavior. Oxford: Oxford University Press. https://global.oup.com/academic/product/cognition-evolution-and-behavior-9780195319842.
Shostak, Seth. 2017. “Introduction: The True Nature of Aliens.” https://www.cambridge.org/core/journals/international-journal-of-astrobiology/article/introduction-the-true-nature-of-aliens/C5EA66D8D338A7EA9085602793D85618.
Shulman, Carl, and Nick Bostrom. 2021. “306Sharing the World with Digital Minds.” In Rethinking Moral Status. Oxford University Press. https://doi.org/10.1093/oso/9780192894076.003.0018.
Sidgwick, Henry. 1907. The Methods of Ethics. London: Macmillan.
Sloman, Aaron. 1984. “The Structure of the Space of Possible Minds.” https://cogaffarchive.org/sloman-space-of-minds-84.pdf.
Smart, J. M. 2012. “The Transcension Hypothesis: Sufficiently Advanced Civilisations Invariably Leave Our Universe, and Implications for METI and SETI.” Acta Astronautica 78: 55–68. https://doi.org/10.1016/j.actaastro.2011.11.006.
Snyder‑Beattie, A. E., Anders Sandberg, K. E. Drexler, and M. B Bonsall. 2021. “The Timing of Evolutionary Transitions Suggests Intelligent Life Is Rare.” Astrobiology 21 (3): 265–78. https://www.liebertpub.com/doi/epdf/10.1089/ast.2019.2149.
Sotala, K., and Lukas Gloor. 2017. “Superintelligence as a Cause or Cure for Risks of Astronomical Suffering.” https://philarchive.org/rec/SOTSAA.
Spinoza, Baruch. 1985. “Ethics.” In The Collected Works of Spinoza, edited by E. Princeton: Princeton University Press.
Stanovich, K., M. Toplak, and Richard West. 2021. “Rationality and Intelligence.” http://keithstanovich.com/Site/Research_on_Reasoning_files/Stanovich_Toplak_West_2021.pdf.
Stastny, Redwood Research (Julian, Olli Järviniemi, and Buck Shlegeris). 2025. “Making Deals with Early Schemers.” https://blog.redwoodresearch.org/p/making-deals-with-early-schemers.
Street, Sharon. 2006. “A Darwinian Dilemma for Realist Theories of Value.” Philosophical Studies 127 (1): 109–66. https://link.springer.com/article/10.1007/s11098-005-1726-6.
Todd, Benjamin. 2025. “The Case for AGI by 2030.” https://forum.effectivealtruism.org/posts/7EoHMdsy39ssxtKEW/the-case-for-agi-by-2030-1.
Torres, Phil. 2018. “Space Colonisation and Suffering Risks: Reassessing the ‘Maxipok Rule’.” Futures 100: 31–43. https://www.sciencedirect.com/science/article/abs/pii/S0016328717304056.
Turchin, A. 2018. “Message to Any Future AI: ‘There Are Several Instrumental Reasons Why Exterminating Humanity Is Not in Your Interest’.” https://philpapers.org/rec/TURMTA.
Vinding, Magnus. 2020. “Suffering‑focused Ethics.” https://magnusvinding.com/wp-content/uploads/2020/05/suffering-focused-ethics.pdf.
Williams, Bernard. 2006. “Ethics and the Limits of Philosophy.” https://e-docs.eplo.int/phocadownloadpap/userupload/aportinou-eplo.int/bernard_williams_ethics_and_the_limits_of_philosophy.pdf.
Wittgenstein, Ludwig. 1961. Notebooks 1914–1916. Edited by G. H. von Wright and G. E. M. Anscombe. Translated by G. E. M. Anscombe. Oxford: Blackwell.
———. 2001. Philosophical Investigations: The German Text. New York: Wiley-Blackwell.
Wolfendale, Pete. 2022. “The Weight of Forever.” https://www.thephilosopher1923.org/post/the-weight-of-forever.
———. 2025. Revenge of Reason. Falmouth: Urbanomic.
Yudkowsky, Eliezer. 2009. “Value Is Fragile.” https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-fragile.
Zeng et al. 2025. “Super Co-Alignment of Human and AI for Sustainable Symbiotic Society.” https://arxiv.org/html/2504.17404v5##S4.
See Todd (2025) for a survey of estimates, and methodologies, on AGI as of March 2025. Also see Kokotajlo et al. (2025). The prediction market Metaculus currently (16 September 2025) indicates a median time elapsed of 36 months between AGI and ASI: https://www.metaculus.com/questions/9062/time-from-weak-agi-to-superintelligence/ ↩︎
Examples: Steve Omohundro’s drives (Omohundro 2008), Bostrom’s instrumental convergence (and arguably his entire oeuvre) in Bostrom (2014b), acausal decision theory (Ahmed 2014), Reza Negarestani (Negarestani 2018) and Peter Wolfendale’s (Wolfendale 2025) respective attempts to liberate reason from human biology, and Aaron Sloman‘s work on the design space of minds (Sloman 1984). ↩︎
The 2025 version of Bostrom’s FAQs explicitly avoids giving an estimate for the probability that we live in a simulation, but previously, he indicated a 20-30% figure; see Joe Carlsmith (2022) presenting a critical analysis of the simulation argument. ↩︎
The relationship between morality and religion is more fully developed in Bostrom (2022). ↩︎
In Bostrom (2003), a paper on the Simulation Argument he discusses technological maturity and the posthuman: “The simulation argument works equally well for those who think that it will take hundreds of thousands of years to reach a ‘posthuman’ stage of civilisation, where humankind has acquired most of the technological capabilities that one can currently show to be consistent with physical laws and with material and energy constraints.” Note that the term posthuman (and its variants) is a variously used and often tortured term, deployed in humanities contexts to gesture at animals, ecosystems, cyborgs/AIs, as well as to denote a worldview that criticises or challenges the alleged biases of the Enlightenment thinking (e.g. the “humanism” in “post-humanism”). These authors sometimes critique transhumanism and other positions associated with technological utopianism for apparent entrenching of biases and unequal power relations. ↩︎
The terms control and influence are not defined in Bostrom (2024), and they are left imprecise here, given that they seem like relatively modest uncertainties compared to overall speculative vibe. ↩︎
Sandberg revisits similar game theoretical strategic considerations in a 2021 talk with the Foresight Institute. See also Hanson et al. (2021) which describes a “grabby” model of expanding alien civilisations to explain the Fermi Paradox and the surprising earliness of human civilisation under certain assumptions (a power law model of evolutionary hard steps). ↩︎
The policies the US, the USSR, and to a lesser extent China, have taken to empire building are examples of where a hegemonic (or would-be dominant) power refrains from directly managing or controlling territory, but instead relies on a range of strategies to ensure compliance. See canonical international relations sources like Nye, or Ikenberry, or Gallagher and Robinson. ↩︎
Very early mentions were in Konstantin Tsiolkovsky’s and J.D. Bernal’s respective writing, as documented in Moynihan (2020, Ch. 5, 6). This is seemingly also a background assumption of major ECL writings like Nguyen and Aldred (2024), Finnveden (2023b), both in respect of influence and colonisation. Sources in the suffering risk literature push back on this assumption that colonisation is a “good thing” (separately from whether it is game theoretically preferred), as argued in Torres (2018). ↩︎
See Armstrong and Sandberg (2012), which discusses settling the cosmos quickly by aiming Von Neumann probes at distant galaxies, potentially before one has even fully explored the Milky Way. Their arguments are similar to those Bostrom (2007) and Hanson et al. (2021) make: all acknowledge the possibility of “quiet” civilisations (those that don’t seek to expand or communicate), but argue that there need be only one successful expansionary civilisation to fill the sky with artefacts that we might detect. None of these extensively engage with resource or communication based constraints. ↩︎
Sandberg revisits similar game theoretical strategic considerations in a 2021 talk with the Foresight Institute. See also Hanson et al. (2021) which describes a “grabby” model of expanding alien civilisations to explain the Fermi Paradox and the surprising earliness of human civilisation under certain assumptions (a power law model of evolutionary hard steps). ↩︎
See Lem (1964), § “The Big Game”; as well as his novel Solaris, and his book The Invincible, for an example of a complex society of ETI that has no obvious correlates of human cognitive features (Lem 1963, 1961). Variants of the encysting idea are actually somewhat older. Apparently, Karl Marx, in his notebooks, suggested humanity might elide the difference between organic and inorganic as it subsumes the “natural” flows of energy to economically useful ends (Moynihan 2020, 360). For a historical context on technological civilisations finding it expedient to become indistinguishable from their environment, see Moynihan (2024). ↩︎
See Lem (1964), Chapter 3, § “Hypotheses”. Lem is interesting in the present context for his dispassionate attitude towards any notion that humans are, objectively speaking, “special” or important, instead arguing that our nature, inclinations, and specifically ideas about morality, are highly contingent on our evolutionary history and environment. Lem is less radical than say Nick Land, as Lem seems to retain affection, at least in the 1964 work, for things that many humans describe as distinctively human or consider valuable, such as beauty and love and art, and values the embodied aspects of the human experience (Lem 1964, Ch. 8,). ↩︎
See Haqq‑Misra and Baum (2009). ↩︎
See Balbi and Lingam (2025) on the waste heat dissipation constraints. Lem’s speculation has been reinforced by commentators from environmental humanities, as well as astrobiology: Likavčan (2025) radicalises the Haqq-Misra and Baum argument (which they call the “sustainability solution”) by proposing technical civilisations would basically merge with their environments. This is similar to Milan Ćirković’s extension of Dick (2006)’s reasoning: “post-postbiological evolution” is the condition where culture (represented by artefacts, whether tangible or otherwise) eventually grows to resemble the natural processes of the environment (Ćirković 2018b). Ćirković proposes an “indistinguishability thesis”: sufficiently advanced technology is indistinguishable from its natural, in particular astrophysical or astrobiological, environment. ↩︎
A (relatively early technologically mature) civilisation that is convinced of its own “rightness” in respect to whatever it thinks of as analogous to morality might consider it reasonable or magnanimous to impose this morality or the associated norms upon other civilisations as a way of “improving” them. The notion of “improving” can be taken in the sense that the Culture in Iain M. Banks’ novels imposes a very light set of cosmic norms or it can be taken in the (archaic and objectionable, to most contemporary viewpoints) sense that European colonisers sought to “improve” the moral condition of indigenous peoples (Banks 1994; Adas 2015). ↩︎
Although acausal or evidential decision theories envision coordination amongst entities who cannot causally influence each other (these are treated more fully below). Even if we entertain exotic decision theories, there could still be some subtle flaw in acausal or ECL-style reasoning that means that the level of coordination over hundreds of thousands of light years (and time-years) implied by the cosmic host hypothesis is simply impossible or implausible. See discussion in Oesterheld (2017). ↩︎
Naudé (2025). See also Hanson et al. (2021) for a model of expansionary civilisations, and Torres (2018) on interstellar conflict as creating large scale suffering, which argues (from the perspective of minimising moral harms) for remaining cautious about exploration or colonisation, a similar argument made by Sotala and Gloor (2017). ↩︎
The relationship between moral status and qualities like sentience and consciousness (in respect of humans, AIs, and animals) is covered in Sebo (2025). ↩︎
The ECL literature handles cases where an agent (or civilisation) only partially overlaps with the set of all other agents in their similarity and decision-making procedure. But this still seems like an unresolved problem. See Oesterheld (2017), Nguyen and Aldred (2024). ↩︎
For an earthly analogy, think of the diversity of norms, in type and number, within various human societies. Joseph Henrich’s work on WEIRD (Western, Educated, Industrialised, Rich, and Democratic) psychology supports the idea that cultural evolution shapes how societies enforce norms. In some homogeneous, high-trust societies, individuals often internalise prosocial norms that govern behavior even without formal laws. In contrast, more heterogeneous or low-trust environments may rely more heavily on explicit legal structures to coordinate social behavior, due to weaker consensus around unspoken norms (Henrich 2020). ↩︎
For instance, the cosmic host might wish to prevent large-scale intentional suffering, something humans, even with our primitive technological capabilities, could inflict. See Torres (2018) for a discussion of how conflict in space might increase suffering risks and therefore influences whether human-originated civilisations should at all attempt to expand into space. See also Deudney (2020) for a discussion of strategy of warfare in space particularly within a given solar/star system. ↩︎
One reason why such a norm might exist is so as to not to reduce the diversity of complex or intelligent systems in the universe, along the lines of Smart (2012), Dick (2006), or Owe and Baum (2024). Such an example would only work if such diversity is an intrinsic or final good from the cosmic host’s perspective, which is closer to Smart’s conditional (p. 11), but is a stronger claim from what Dick argues (Dick merely proposes that intelligence is a convergent and adaptively-preferred feature of cosmic-scale evolution). ↩︎
See the digital minds and AI welfare literature. Canonical or survey sources include Sebo (2025), Shulman and Bostrom (2021), Long et al. (2024). More specific treatments can be found in Moret (2023) and Moret (2025). ↩︎
I called this “pragmatic” because if we cannot assume rationality, then the motivations of entities become very hard to analyse. But this isn’t to suggest that all ETIs or indeed advanced AIs would be rational in any current human-legible sense. ↩︎
See Henrich and Muthukrishna (2021) on the origins of human cooperation from an evolutionary and anthropological perspective. ↩︎
However, “Scorched Earth” tactics may be game-theoretically preferred in some cases: when the alternative is to allow a competitor access to volumes of space, as Anders Sandberg points out in a 2021 talk (Sandberg 2021) and Robin Hanson implies in his paper on Grabby Aliens (Hanson et al. 2021) ↩︎
Suffering-reduction is prioritised here on standard s-risk arguments regarding the asymmetry between pain/suffering and pleasure (Sotala and Gloor 2017). Suffering, on this view, is a disvalue shared across a wide variety of entities and substrates in the universe. Arguments can be made that ASI might have a baseline attitude of not increasing suffering (for causal or acausal reasons); there are other claims that ASI might significantly increase large-scale suffering. The priority of suffering is contested, and those who do not prioritise s-risk may exclude this element from the MLS. ↩︎
Floridi’s perspective resembles arguments against dissipating free energy into low-information, high-entropy outputs. But Floridi’s informational entropy (Floridi 2013) is different from thermodynamic entropy. It is more of a metaphysical concept akin to nothingness, the erasure of pattern/structure, or “privato boni” (absence of the good). Floridi’s ontological claim for information, and his ethics of information, flows out of Plato, Spinoza, and G.E. Moore, as well as the cybernetics of Norbert Wiener. He doesn’t discuss how his framing overlaps with the Informational Universe literature from writers like Seth Lloyd or Eric Chaisson. ↩︎
Phrases like “the point of view of the universe” (Sidgwick 1907; Lazari‑Radek and Singer 2014; Parfit 1984), “the view from nowhere” (Nagel 1986), “the view from nowhere/nowhen” (Williams 2006) or sub specie aeternitatis (Spinoza 1985; Wittgenstein 1961) all gesture at an objective, subject-independent standpoint, though, like the blind men with the elephant, each approaches the idea indirectly. “The point of view of the universe” originates with Henry Sidgwick and is taken up by Peter Singer, Derek Parfit and others to frame an impartial moral stance extending across species, geography, and time. “The view from nowhere” (Nagel) is often used more critically, highlighting both the aspiration to objectivity and its limits, with Bernard Williams and later critical theorists emphasising that all perspectives are embedded in lived experience, power, and social relations. Sub specie aeternitatis derives from Spinoza but is given a narrower, aesthetic inflection by Wittgenstein, who links it to how art affords a detached view of the mundane; this aesthetic-existential register is later developed by critical posthumanists such as Claire Colebrook, who treat art and architecture as ways of imagining humanity’s eventual extinction (Colebrook 2014). ↩︎
Once again, Lem is helpful on the difficulty we have conceiving of, relating to, or evaluating truly alien intelligences: see Lem (1964), § “The Big Game”; as well as his novel Solaris, and his book The Invincible, for an example of a complex swarmlike society of ETIs that has no obvious correlates of human cognitive features (Lem 1963, 1961). ↩︎
This reification is extensively documented in Henrich (2020) and has been critiqued from various angles. ↩︎
If one is a person-affecting utilitarian, the view is somewhat different, but the details are less relevant for this document, but see also Parfit (1984), Ord (2020), and W. MacAskill (2024). Bostrom’s writing on whether he means “human lives” or “conscious/sentient substrate-neutral existences” has evolved, from the clearly biological “lives” and “happy” in Bostrom (2003) to the digital minds of Shulman and Bostrom (2021). ↩︎
See Shettleworth (2010) for an academic text, and Godfrey‑Smith (2016) and Godfrey‑Smith (2020) for more narrative accounts. See Gigerenzer and Goldstein (1996); Stanovich, Toplak, and West (2021); Hertwig et al. (2022) for further examples of where intelligence and ecological rationality come apart. ↩︎
See Lineweaver (2007) for his exact argument from the genetic and fossil records. Lineweaver (2010) discusses the relevant arguments from Pangea’s breakup, as well as a more extreme suggestion that “life” as even biologists define it is too parochial (e.g. based on flora, fauna, and fungi that grow, reproduce, are chemical-based, and are homeostatically regulated). In the most general context we might need to include free-energy dissipating structures (“FEEDS”) that are not centrally information storing or processing (such as solar convection cells in the sun’s photosphere). He argues DNA (a centralised information store) sits in the set of FEEDS but is far from the only member. ↩︎
For those that separate morality and reason, see Williams (2006); Street (2006); Joyce (2001); Broome (2013). Moralised conceptions of rational agency include Kantian ethics see Scanlon (1998), Korsgaard (2018), Parfit (2011). ↩︎
Bostrom (2024) §s 9, 10, Bostrom (2022) § 37 and in the further research directions. He also discusses the timing or speed with which we should attempt to develop ASI, but I will not consider those comments here, as they overlap with questions of governance and potentially geopolitics. See also Bostrom (2026) which is best read as the mundane, person‑affecting, social-policy companion to Bostrom (2024). It explicitly brackets the more exotic considerations there (anthropics, simulation hypotheses, multiverse‑wide rationality, and related causal‑decision‑theoretic issues) as “arcane”. ↩︎
Daniel Kokotajlo and collaborators envision networks of AI systems of varying capacity that coordinate and perhaps compete, but broadly speaking, “act as one”. They, at least in theory, help supervise other more powerful AI systems in the course of the development of ASI (Kokotajlo et al. 2025). Along the same lines of refining the developmental trajectory, Eric Drexler pushes back against the narrative of unitary general-purpose or superintelligent systems, arguing that the current path of AI-assisted AI development seems much messier and complex, with training and capability loops that intertwine and inform or influence each other; involve humans and machines working with a range of tools and practices. This heterogeneous approach can, in aggregate, be viewed as a different form of the recursive self-improvement long argued for in the existential risk discourse (Drexler 2025). Also, see Zeng et al. (2025) which envisions a future society of multiple AI systems (including ones described as AGI or ASI) developing, and coevolving their values and motivations alongside human societies, which also need to change to accommodate the new intelligences. Although this is not a technical research paper, it’s interesting for being a position paper from China which presents a more cooperative attitude towards AI systems as opposed to the cautionary, existential or suffering risk-focused narrative that is more common in the West (e.g. in the alignment discourse). A book-length treatment of Chinese attitudes to advanced AI is Bratton et al. (2025). ↩︎
Examples of empirical feedback are: setting up experiments to probe our world for signs of simulation; establishing whether we are, in fact, in a multiverse; or building better instruments for SETI. We might also follow through on the possibility that advanced civilisations tend to have superintelligence, which might then influence the type of technosignatures we should look for, as Shostak (2017) suggests. ↩︎
See Moret (2023) for a dynamic version of CEV. The original CEV proposed by Eliezer Yudkowsky is discussed in Bostrom (2014). For moral parliament, see Newberry and Ord (2021). ↩︎
See Stastny, Järviniemi, and Shlegeris) (2025) for a discussion on making credible commitments or deals with misaligned but relatively weak AIs (perhaps weaker than AGI in Greenblatt (2024)’s taxonomy of AI capabilities levels) to incentivise them to help us align more powerful successor models, or at least not collude with them. “Credible” means that the AI with which we are making a deal can trust that we will follow through. See Finnveden (2025a) and Finnveden (2025b) for related considerations. For appeals to ASI to be “nice” to humans, see Miller et al. (2023), Chakrabarti (2025), Turchin (2018). These sources tend to rely on similar arguments about making the AI indexically uncertain (about whether it is “real” or simulated) and epistemically uncertain (about whether there are other peer AIs, or whether humans have laid traps or defences against catastrophic misaligned behaviour on the part of the AIs). They seem to share similar flaws in that as capabilities increase, it becomes less easy to be confident that systems would be both powerful enough to threaten humanity while being epistemically ignorant in precisely the way needed to “believe” these letters. ↩︎
The delegates are: Kantian deontologist; welfare consequentialist; contractualism that borrows from Scanlon and Rawls’ respective versions; virtue ethics with significant, but not exclusive, Aristotelian aspects; Buddhism in its Kyoto School variant. The last delegate represents CH, and its prompt can be summarised as “correlation-based acausal (super-) rationality in a context that includes multiverses and simulations”. This convention was inspired by the moral parliament of Newberry and Ord (2021), though their approach, and the relevant problem they address, is quite different. ↩︎
The stakes are even higher if we are relatively early as a civilisation and there are not yet any cosmic norms or host, if the introduction of such norms into our influenceable region would subsequently turn out to be a preference of the cosmic host. In such cases, whatever norms we choose to promulgate would be an act of value lock-in, with all the difficulties that raises (William MacAskill 2022; Wolfendale 2022). ↩︎
The “cheapness” argument has been made by multiple authors in different contexts; see also Miller et al. (2023) and Turchin (2018) for appeals to ASI based on similar reasoning. ↩︎
These speculations are distinct from RedwoodResearchJulianStastnyEtAl2025 or Finnveden (2025b), which discusses making credible commitments or deals with misaligned but weak AIs to incentivise them to help us align more powerful successor models. See also Chakrabarti (2025) and Turchin (2018) for appeals to ASI to be “nice” to humans. These sources rely on similar arguments: making the AI uncertain about whether it is “real” or simulated, whether there are peer AIs, or whether humans have laid traps or defences. ↩︎