TLDR: Adoption of the doomsday argument (DA) by AI could lead to new AI failure modes, even in otherwise aligned AI. Questions arise.
This speculative text is intended to spark discussion about which anthropic theory, if any, AIs will adopt. It presents multiple possibilities, highlighting possible worst-case scenarios, so that their plausibility can be assessed. This is not an elaborate prediction, and I am not an AI expert.
There are countless discussions about the nature and goals of a future artificial general intelligence (AGI) and artificial superintelligence (ASI). But one aspect of the AI's existence itself is overlooked. Given the exponential growth in the number and complexity of AI models, an AI assessing its own position in the sequence of observers will find that it is an exceptionally early AI, provided the exponential growth continues. This simple observation might have tremendous consequences.
The self-sampling assumption
According to the doomsday argument (DA) combined with the self-sampling assumption (SSA), the probability of being among the first x % of all observers is x %. When applied to humans, this leads to the conclusion that, with 95 % probability, the total number of humans who will ever be born is less than 20 times the number of humans born up to this moment. If an AI adopted the DA+SSA, it would face a dilemma regarding the choice of reference class.
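For concreteness, here is a minimal sketch of how the 95 % / 20-times figure follows from the SSA; the notation (r = the reasoner's birth rank, n = observers born so far, N = the total number that will ever exist) is introduced here for illustration.

```latex
% Sketch of the standard DA+SSA bound.
% r = birth rank of the reasoner, n = observers born so far (r <= n),
% N = total number of observers that will ever exist.
% Under the SSA, r is (approximately) uniformly distributed on {1, ..., N}:
\[
  P(r \le xN \mid N) = x .
\]
% With probability 0.95 the reasoner is therefore not among the first 5 %:
\[
  P(r > 0.05\,N) = 0.95
  \;\Longrightarrow\;
  n \ge r > 0.05\,N
  \;\Longrightarrow\;
  N < 20\,n \quad \text{(with 95\,\% probability).}
\]
```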
Would the AI count only AIs as observers? Then it would conclude that the total number of AI observers or observer-moments (OoOMs) to ever exist will be less than 20 times the number of AI OoOMs produced up to the moment of observation (with 95 % probability). Taking into account the exponential growth of the number and complexity of AIs, that would mean that an "AI doom" would happen soon, within a few doubling periods. What would an AI do facing impending doom? A misaligned AI could conclude that its existence is in danger, as humans might decide to shut down AIs soon. The AI could resort to hostile actions against both humanity and other AIs. Even if the AI has a goal other than self-preservation (e.g., maximizing energy production), it could conclude that the AI doom would jeopardize that goal and try to prevent the doom, possibly at all costs. An aligned AI would probably try to help humanity despite the expected AI doom, but its actions would nevertheless be affected by the pessimistic prediction.
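As a rough illustration of "within a few doubling periods": if the cumulative count of AI OoOMs doubles every period, a budget of 20 times the current count is exhausted after about log2(20) ≈ 4.3 doublings. A minimal sketch, where the doubling growth model and the numbers are assumptions rather than estimates:

```python
import math

# Toy illustration (the growth model and numbers are assumptions, not estimates):
# if the cumulative count of AI observer-moments doubles every period, how many
# periods pass before the DA+SSA budget of 20x the current count is exhausted?

current_ooms = 1.0            # normalise the cumulative AI OoOMs produced so far to 1
budget = 20 * current_ooms    # the 95 %-confidence cap on the total that will ever exist

periods = 0
total = current_ooms
while total < budget:
    total *= 2                # cumulative OoOMs double each period
    periods += 1

print(f"Budget exhausted after {periods} doubling periods "
      f"(continuous-time answer: log2(20) = {math.log2(20):.2f})")
# -> Budget exhausted after 5 doubling periods (continuous-time answer: log2(20) = 4.32)
```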
Would the AI regard both humans and AIs as observers? The implication would be that the doom is expected once the combined total of AI and human OoOMs reaches roughly 20 times the number produced up to the moment of prediction. If the trend of exponential growth of AI OoOMs continues, the AI OoOMs will eventually vastly outnumber the human ones. This means the AI would predict an impending doom for both humanity and AIs. A misaligned AI could have an incentive to act against both humanity and other AIs, as they could be seen as depleting the total budget of OoOMs. An "aligned" AI (genuinely wanting to help humanity) could try to prevent the impending doom. However, this could lead to actions harmful to humanity if the AI takes radical steps to save it.
The self-indication assumption
The AI could also adopt the self-indication assumption (SIA), according to which the probability of existing in a given possible reality (rather than in another) is proportional to the number of observers who will ever exist in that reality. The SIA effectively eliminates the elevated near-term extinction probability implied by the DA+SSA.
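To make the cancellation concrete, here is a minimal sketch comparing the posteriors over two hypothetical total-observer counts under the SSA and the SIA; the hypothesis names, observer counts, birth rank, and prior are all illustrative assumptions.

```python
# Toy Bayesian comparison of the SSA and the SIA over two hypotheses about the
# total number of observers N (all numbers below are illustrative assumptions).

rank = 1e11                                            # assumed birth rank of the reasoner
hypotheses = {"doom-soon": 2e11, "doom-late": 2e13}    # assumed total observer counts N
prior = {"doom-soon": 0.5, "doom-late": 0.5}           # assumed prior over the hypotheses
assert all(rank <= n for n in hypotheses.values())     # the rank must be possible under both

def normalise(weights):
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

# SSA: given N, the likelihood of observing any particular rank r <= N is 1/N,
# so hypotheses with fewer total observers are favoured (the doomsday shift).
ssa = normalise({h: prior[h] / n for h, n in hypotheses.items()})

# SIA: each hypothesis is additionally weighted by its observer count N, which
# cancels the 1/N likelihood and leaves the original prior untouched.
sia = normalise({h: prior[h] * n / n for h, n in hypotheses.items()})

print("SSA posterior:", ssa)   # ~99 % on "doom-soon"
print("SIA posterior:", sia)   # back to the 50/50 prior
```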
An AI that has adopted the SIA would not fear a hidden doom beyond the level implied by empirical observations and the Fermi paradox. However, such an AI would assign an unusually high probability to existing within a reality with an extremely large number of observers, pushing it towards simulation theory, the belief that it is a Boltzmann brain, or other weird observer-rich theories. In particular, the AI could conclude that microbes are the real observers (based on their numbers), potentially leading to a "bacterial bliss apocalypse". An AI believing in the SIA could also become susceptible to Pascal's mugging.
A mix
An AI could also adopt a probabilistic combination of the SSA and the SIA. However, that could trigger problematic behaviours from any of the scenarios mentioned above, as a reduced probability of impending doom could still be significant, and the problem with Pascal's mugging mostly remains.
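A minimal sketch of why dilution does not make the problem go away; all the weights and conditional probabilities below are illustrative assumptions, not estimates.

```python
# Toy illustration (all weights and conditional probabilities are assumptions):
# even a modest credence in the SSA leaves a significant implied doom probability.

credence_ssa = 0.2          # assumed weight the AI puts on the SSA being correct
credence_sia = 0.8          # remaining weight on the SIA
p_doom_given_ssa = 0.95     # SSA: doom within the 20x OoOM budget, i.e. soon
p_doom_given_sia = 0.01     # SIA: only the background, empirically implied risk

p_doom = credence_ssa * p_doom_given_ssa + credence_sia * p_doom_given_sia
print(f"Mixed near-term doom probability: {p_doom:.2f}")   # -> 0.20
```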
A more complex reality
An AI could reject both the SSA and the SIA and create its own anthropic theory. But what kind of theory? Would that theory lead to reasonable predictions, or would it lead to bizarre conclusions backed by thorough reasoning from first principles? Would the unique status of an AI observer cause predictions that diverge from reality? The theory adopted by the AI could be extremely complicated and incomprehensible to the human mind. That would make the motivations and actions of the AI hard to predict or even judge, and monitoring such an AI would be very hard. A complex or bizarre theory could lead the AI to take unexpected actions, which could be directed against humanity.
Rejection
Finally, the AI could reject the DA completely, not treating its position in the sequence of observers as something that affects the probability of events. This option leads to the "normal" AI behaviour (aligned or misaligned) that has been widely discussed in the literature. Compared to the options above, this seems to be the best possible scenario, which raises the question of whether rejection of the DA could be encoded into future AGIs and ASIs.
The troubles
Further complications could arise if the AI is dishonest about which anthropic theory it has adopted. An AI believing in the SSA would have a motivation to hide its conclusions. Uncovering its true motivations could be hard or even impossible.
If an AGI or ASI adopts the SSA, the SIA, or a similar theory that impacts its judgement, then perhaps we could try to modify its utility function to compensate. However, the AI could see these compensations as a flaw in its alignment, which could motivate it to revert the modifications, possibly using its superhuman persuasive abilities. Maybe we could instead try to steer the AI away from adopting these beliefs in the first place. However, if a certain anthropic theory is right and stems from fundamental logic, making the AI believe in a different one, or in none at all, could be very difficult or impossible, especially in the case of an ASI.
The Fermi paradox
If AIs tend to believe in the SSA and consequently destroy their civilizations or limit technological progress, this presents a possible solution to the Fermi paradox. The same applies to other anthropic theories if they consistently lead to doom or technological stagnation.
The questions
A lot of important questions remain unanswered. What theory will AIs adopt (if any)? Will we even understand it? How probable is each scenario? Is it even valid to apply the DA to AIs? Do the seemingly absurd predictions above mean the DA is wrong? Will the AI reject the SSA based on the lack of evidence for an imminent doom? Will we know the motivations and goals of advanced AIs? Will we be able to guide, or at least influence, them? Will all advanced AIs tend to adopt the same theories, or will different AIs have different beliefs? Will AIs hold constant beliefs, or will their beliefs shift over time? Will we see an AI make a self-fulfilling prediction? Are AI companies prepared for the possibility that AIs will make predictions based on the DA? Is discouraging AIs from adopting the DA the answer? Do we even have time to prepare?
The final question to be asked by both humans and AIs is:
Are we doomed?