This post was written by Mark Xu based on interviews with Carl Shulman. It was paid for by Open Philanthropy but is not representative of their views.
- Rogue AGI has access to its embodied IP.
- This IP will be worth a moderate fraction of the total value of the market created by models approximately as powerful as the rogue AGI.
- If investors realize that most economic output will eventually come from AGI, as in slow takeoff scenarios, then these markets will involve moderate fractions of the world’s wealth.
- Therefore, rogue AGI will embody IP worth a non-trivial fraction of the world’s wealth and potentially have a correspondingly large influence on the world.
A naive story for how humanity goes extinct from AI: Alpha Inc. spends a trillion dollars to create Alice the AGI. Alice escapes from whatever oversight mechanisms were employed to ensure alignment by uploading a copy of itself onto the internet. Alice does not have to pay an alignment tax, and so outcompetes Alpha and takes over the world.
On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.
As an analogy, imagine that an employee of a trillion-dollar hedge fund, which trades based on proprietary strategies, goes rogue. This employee has 100 million dollars, approximately 10,000x fewer resources than the hedge fund. Even if the employee engaged in unethical business practices to achieve a 2x higher yearly growth rate than their former employer, it would take 13 years for them to have a similar amount of capital.
However, the amount of resources the rogue hedge fund employee has is not equivalent to the amount of money the employee has. The value of a hedge fund is not just the amount of money they have, but rather their ability to outperform the market, of which trading strategies and money are two significant components. An employee that knows the proprietary strategies thus can carry a significant fraction of the fund’s total wealth, perhaps closer to 10% than 0.01%. In this view, the primary value the employee has is their former employer’s trading high-performing strategies; knowledge they can potentially sell to other hedge funds.
Similarly, Alpha’s expected future revenue is a combination of Alice’s weights, inference hardware, deployment infrastructure, etc. Since Alice is its weights, it has access to IP that’s potentially worth a significant fraction of Alpha’s expected future revenue. Alice is to Alpha as Google search is to Alphabet.
Suppose that Alpha currently has a monopoly on the Alice-powered models, but Beta Inc. is looking to enter the market. Naively, it took a trillion dollars to produce Alice, so Alice can sell its weights to Beta for a trillion dollars. However, if Beta were to enter the Alice-powered model market, the presence of a competitor would introduce price competition, decreasing the size of the Alice-powered model market. Brand loyalty/customer inertia, legal enforcement against pirated IP, and distrust of rogue AGI could all disadvantage Beta in the share of the market it captures. On the other hand, Beta might have advantages over Alpha that would cause the Alice-powered model market to get larger, e.g., it might be located in a different legal jurisdiction (where export controls or other political issues prevented access to Alpha’s technology) or have established complementary assets such as robots/chip fabs/human labor for AI supervision.
Assuming that the discounted value of a monopoly in this IP is reasonably close to Alice’s cost of training, e.g. 1x-3x, competition between Alpha and Beta only shrinks the available profits by half, and Beta expects to acquire between 10%-50% of the market, Alice’s weights are worth between $50 billion and $1.5 trillion to Beta. Abstracting away the numbers used in this particular example, Alice will be able to sell its weights to Alpha’s competitors for a price that is a substantial fraction of, and perhaps even exceeds, the cost it took to train Alice (e.g. if the market value of computer hardware has gone up with improved AI performance so that it now costs more to train a replacement).
If Alice embodies IP worth a substantial fraction of the Alice-powered model market, then Alice’s influence will be proportional to the size of this market. If Alice is sufficiently powerful, the Alice-powered model market is a large fraction of the entire world economy. Alice thus embodies IP worth a small to moderate fraction of the world economy, an immense amount of wealth. If Alice is less powerful, the value of its embodied IP depends on the degree to which investors can overcome frictions and uncertainty to fund enormous up-front training costs.
One way to estimate Alice’s value is by assuming rough investment efficiency. Paul Christiano:
If you are able to raise $X to train an AGI that could take over the world, then it was almost certainly worth it for someone 6 months ago to raise $X/2 to train an AGI that could merely radically transform the world, since they would then get 6 months of absurd profits. Likewise, if your AGI would give you a decisive strategic advantage, they could have spent less earlier in order to get a pretty large military advantage, which they could then use to take your stuff.
In these worlds, relevant actors see AGI coming, correctly predict its economic value, and start investing accordingly. This rough efficiency claim implies AI researchers and hardware are priced such that one can potentially get 3x returns on investment (ROI) from training a powerful model, but not 30x. Since most economic activity will rapidly involve the production and use of AGI, early-AGI will attract huge investments, implying the Alice-powered model market will be a moderate fraction of the world’s wealth. The value of Alice’s embodied IP, being tied to the value of that market, will thus be similarly massive.
This process may involve bidding up the prices of resources like server farms and researchers to absurd levels so that training a model that could ‘take over the world’ would require most of the world’s wealth to rent the server time. ↩︎