Epistemic status: a synthesis of others' work, analysed and written over a few weeks. A high-level list of threat models, open to criticism.
Humanity could end up in a lock-in within the next century. Here I outline the possible routes to that outcome and prioritise them against a set of importance criteria.
AGI and Lock-In (Finnveden et al., 2022), authored by Lukas Finnveden during an internship at Open Philanthropy, is currently the most detailed report on lock-in risk. It expands on value lock-in notes by Jess Riedel, who co-authored the report along with Carl Shulman, and builds on Nick Bostrom’s earlier arguments about AGI and superintelligence. The report argues that many features of society could be held stable for up to trillions of years, owing to digital error correction and solutions to the alignment problem. It focuses on the technological feasibility of lock-in and on the central role of AGI in the long-term stability of features of future society.
The authors first argue that dictators made effectively immortal by technological advancement could avoid the succession problem that has historically ended totalitarian regimes, removing one of the main obstacles to the preservation of future regimes. Next, whole brain emulations (WBEs) of dictators could be loaded arbitrarily and consulted on novel problems, enabling perpetual value lock-in.
They also argue that AGI-led institutions could themselves competently pursue goals with no value drift, thanks to digital error correction. This resilience can be reinforced by distributing copies of the values across space, protecting them from local destruction. Their main threat model is that if AGI is developed, is misaligned, and does not permanently kill or disempower humans, lock-in is the likely default outcome that follows.
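To make the error-correction point concrete, here is a minimal, purely illustrative sketch (not from the report; the value string, copy count, and corruption pattern are invented) of how redundant storage plus majority voting keeps a digitally stored value specification from drifting:

```python
def majority_vote(copies):
    """Reconstruct each byte position by majority vote across redundant copies."""
    return bytes(max(set(col), key=col.count) for col in zip(*copies))

# A hypothetical "value specification" stored digitally (illustrative only).
value_spec = b"preserve charter v1"

# Keep several redundant copies, e.g. distributed across many locations.
copies = [bytearray(value_spec) for _ in range(5)]

# Corrupt a minority of copies (bit flips, local damage, tampering attempts).
copies[0][3] ^= 0xFF
copies[4][10] ^= 0x0F

# The corrupted minority is outvoted at every position, so nothing drifts.
recovered = majority_vote([bytes(c) for c in copies])
assert recovered == value_spec
```

Real systems would use proper error-correcting codes and cryptographic integrity checks rather than naive replication, but the underlying claim from Finnveden et al. is the same: digital information, unlike its biological or institutional analogues, can in principle be preserved indefinitely without degradation.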
Finnveden expanded on these threat models in a conversation in 2024, suggesting the following possible routes to lock-in:
In What We Owe the Future (MacAskill, 2022), Will MacAskill introduces the concept of longtermism and its implications for the future of humanity. It was MacAskill who originally asked Lukas Finnveden to write the AGI and Lock-In report, and he expands on its concepts in more philosophical terms in chapter 4 of his book, entitled ‘Value Lock-In’.
MacAskill defines value lock-in as ‘an event that causes a single value system, or set of value systems, to persist for an extremely long time’. He stresses the importance of current cultural dynamics in potentially shaping the long-term future, explaining that a set of values can easily become stable for an extremely long time. He identifies AI as the key technology with respect to lock-in, citing Finnveden et al. (2022). He echoes their threat models:
In Superintelligence (Bostrom, 2014), Nick Bostrom introduces many relevant concepts, such as value alignment and the intelligence explosion. He describes lock-in as a potential second-order effect of the development of superintelligence: a superintelligence could create conditions that effectively lock in certain values or arrangements for an extremely long time, or permanently.
In chapter 5, Bostrom discusses the concept of a decisive strategic advantage – the possibility that one entity gains strategic power over the fate of humanity at large. He relates this to the potential formation of a singleton, a single decision-making agency at the highest level. In chapter 7 he introduces the instrumental convergence hypothesis, which provides insight into the potential motivations of autonomous AI systems: agents pursuing a wide range of final goals are likely to converge on certain instrumental subgoals, such as self-preservation, goal preservation, and resource acquisition. In chapter 12, he introduces the value loading problem and the risks of misalignment arising from issues such as goal misspecification.
Bostrom frames lock-in as one potential outcome of an intelligence explosion, distinct from the permanent disempowerment of humanity. He suggests that a single AI system that gains a decisive strategic advantage could control critical infrastructure and resources, becoming a singleton. He also outlines a value lock-in problem: hard-coding human values into AI systems that become generally intelligent or superintelligent may lead those systems to robustly defend those values against any later shift, as instrumental convergence predicts. Finally, he notes that the frameworks and architectures leading up to an intelligence explosion might themselves get locked in and shape subsequent AI development.
In What is a Singleton? (Bostrom, 2005), Nick Bostrom defines the singleton, also discussed in Superintelligence, as “a single decision-making agency at the highest level”. He explains that AI may facilitate the creation of a singleton: an agency that obtains a decisive strategic advantage through a technological breakthrough in artificial intelligence or molecular nanotechnology could use its technological superiority to prevent other agencies from catching up, and the resulting regime might become perpetually stable through AI-enabled surveillance, mind control, and security. He also notes that a singleton could simply turn out to be a bad one – ‘If a singleton goes bad, a whole civilisation goes bad’.
In Value Lock-In Notes 2021 (Riedel, 2021), Jess Riedel provides an in-depth overview of value lock-in from a longtermist perspective. Riedel details the technological feasibility of irreversible value lock-in, arguing that permanent value stability seems extremely likely for AI systems with hard-coded values.
Riedel claims that ‘given machines capable of performing almost all tasks at least as well as humans, it will be technologically possible, assuming sufficient institutional cooperation, to irreversibly lock-in the values determining the future of earth-originating intelligent life.’
The notes focus on the formation of a totalitarian super-surveillance police state controlled by an effectively immortal malevolent actor. Riedel argues that the only requirements are one such immortal actor and sufficiently capable surveillance technology.
In MacAskill on Value Lock-In (Hanson, 2022), a commentary on What We Owe the Future, economist Robin Hanson argues that immortality is insufficient for value stability. He believes MacAskill underestimates the dangers of centralised power and is overconfident about the likelihood of a rapid AI takeoff. Hanson presents an alternative framing of lock-in threat models:
In Machine intelligence and capital accumulation (Christiano, 2014), Paul Christiano proposes a ‘naïve model’ of capital accumulation involving advanced AI systems. He frames agents as ‘soups of potentially conflicting values’, writing: ‘When I talk about “who” controls what resources, what I really want to think about is what values control what resources.’ This provides a lens through which lock-in can be seen as the result of whichever values end up controlling resources and holding features of the world stable.
He claims it is plausible that the arrival of AGI will lead to a ‘crystallisation of influence’, akin to lock-in – where whoever controls resources at that point may maintain control for a very long time. He also expresses concern that influence over the long-term future could shift to ‘machines with alien values’, leading to humanity ending with a whimper.
He illustrates a possible world in which this occurs. In a future with advanced AI, human wages fall below subsistence level as AI replaces labour, and value becomes concentrated in non-labour resources such as machines, land, and ideas. Unlike human labour, these resources can be directly owned and controlled. Whoever owns the machines therefore captures the resources and income, making the distribution of resources at the time AI arrives ‘sticky’ – whoever controls resources then can maintain that control indefinitely through investment.
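As a purely illustrative toy model (not Christiano’s own formalism; the growth rate, shares, and function names are invented here), the stickiness can be seen in a few lines: when income flows only to capital and every owner reinvests at the same rate, the initial distribution of ownership persists indefinitely.

```python
def simulate(initial_shares, growth_rate=0.05, periods=200):
    """Compound each owner's capital at the same rate and return final shares."""
    capital = list(initial_shares)
    for _ in range(periods):
        # All income accrues to capital and is reinvested; there is no labour income.
        capital = [c * (1 + growth_rate) for c in capital]
    total = sum(capital)
    return [c / total for c in capital]

# Hypothetical ownership shares at "the time of AI" (numbers invented).
initial = [0.70, 0.25, 0.05]
print(simulate(initial))  # ~[0.70, 0.25, 0.05]: the initial distribution persists
```

The point is not the specific growth rate but the absence of a labour channel: with nothing outside capital generating income, no dynamic in the model pushes the distribution away from whatever it was when AI arrived.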
When defining lock-in, we identified a set of dimensions for choosing which lock-in scenarios to focus on first. While it is not yet clear which scenarios would be positive, we believe lock-in scenarios with the following properties would be negative:
Using these properties to direct our attention to the kinds of lock-in scenarios worth focusing on, we prioritise our threat models along the following dimensions:
Applying these prioritisation criteria, we list the fundamental threat models synthesised from the work above as follows:
These categories can be broken down into ever more specific scenarios, for which interventions can be designed.