Rogue AGI Embodies Valuable Intellectual Property

by Mark Xu, CarlShulman · 3 min read · 3rd Jun 2021 · 9 comments

Tags: Threat Models, Economic Consequences of AGI, AI Risk, AI
Curated
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This post was written by Mark Xu based on interviews with Carl Shulman. It was paid for by Open Philanthropy but is not representative of their views.

Summary:

  • Rogue AGI has access to its embodied IP.
  • This IP will be worth a moderate fraction of the total value of the market created by models approximately as powerful as the rogue AGI.
  • If investors realize that most economic output will eventually come from AGI, as in slow takeoff scenarios, then these markets will involve moderate fractions of the world’s wealth.
  • Therefore, rogue AGI will embody IP worth a non-trivial fraction of the world’s wealth and potentially have a correspondingly large influence on the world.

A naive story for how humanity goes extinct from AI: Alpha Inc. spends a trillion dollars to create Alice the AGI. Alice escapes from whatever oversight mechanisms were employed to ensure alignment by uploading a copy of itself onto the internet. Alice does not have to pay an alignment tax, and so outcompetes Alpha and takes over the world.

On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.

As an analogy, imagine that an employee of a trillion-dollar hedge fund, which trades based on proprietary strategies, goes rogue. This employee has 100 million dollars, approximately 10,000x fewer resources than the hedge fund. Even if the employee engaged in unethical business practices to achieve a 2x higher yearly growth rate than their former employer, it would take 13 years for them to have a similar amount of capital.
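The 13-year figure can be checked with a short calculation. This is a minimal sketch that assumes "2x higher yearly growth rate" means the employee's capital doubles relative to the fund's each year; the function name `years_to_catch_up` is illustrative and not from the post.

```python
import math


def years_to_catch_up(resource_gap: float, relative_growth: float) -> float:
    """Years until the smaller party reaches parity, assuming the ratio of
    resources shrinks by a factor of `relative_growth` every year."""
    return math.log(resource_gap) / math.log(relative_growth)


# Trillion-dollar fund versus an employee with 100 million dollars.
gap = 1e12 / 1e8

# Assumption: the employee's capital doubles relative to the fund's each year.
years = years_to_catch_up(gap, 2.0)

print(f"gap: {gap:,.0f}x, catch-up time: {years:.1f} years")
```

Under this reading the gap closes in a little over 13 years, since 2 raised to the 13th power is 8,192 and 2 raised to the 14th is 16,384.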

However, the amount of resources the rogue hedge fund employee has is not equivalent to the amount of money the employee has. The value of a hedge fund is not just the amount of money it has, but rather its ability to outperform the market, of which trading strategies and money are two significant components. An employee who knows the proprietary strategies can thus carry a significant fraction of the fund’s total wealth, perhaps closer to 10% than 0.01%. In this view, the primary value the employee has is their former employer’s high-performing trading strategies: knowledge they can potentially sell to other hedge funds.

Similarly, Alpha’s expected future revenue derives from a combination of Alice’s weights, inference hardware, deployment infrastructure, etc. Since Alice is its weights, it has access to IP that’s potentially worth a significant fraction of Alpha’s expected future revenue. Alice is to Alpha as Google search is to Alphabet.

Suppose that Alpha currently has a monopoly on the Alice-powered models, but Beta Inc. is looking to enter the market. Naively, it took a trillion dollars to produce Alice, so Alice can sell its weights to Beta for a trillion dollars. However, if Beta were to enter the Alice-powered model market, the presence of a competitor would introduce price competition, decreasing the size of the Alice-powered model market. Brand loyalty/customer inertia, legal enforcement against pirated IP, and distrust of rogue AGI could all disadvantage Beta in the share of the market it captures. On the other hand, Beta might have advantages over Alpha that would cause the Alice-powered model market to get larger, e.g., it might be located in a different legal jurisdiction (where export controls or other political issues prevented access to Alpha’s technology) or have established complementary assets such as robots/chip fabs/human labor for AI supervision.

Assuming that the discounted value of a monopoly in this IP is reasonably close to Alice’s cost of training, e.g. 1x-3x, competition between Alpha and Beta only shrinks the available profits by half, and Beta expects to acquire between 10%-50% of the market, Alice’s weights are worth between $50 billion and $1.5 trillion to Beta. Abstracting away the numbers used in this particular example, Alice will be able to sell its weights to Alpha’s competitors for a price that is a substantial fraction of, and perhaps even exceeds, the cost it took to train Alice (e.g. if the market value of computer hardware has gone up with improved AI performance so that it now costs more to train a replacement).
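The range above is a product of three factors, which can be sketched as follows. This assumes the three assumptions simply multiply (monopoly value as a multiple of training cost, times the fraction of profits that survives competition, times Beta's market share); the function and variable names are hypothetical.

```python
TRAINING_COST = 1e12  # dollars, the trillion-dollar cost from the Alpha/Alice story


def weights_value_to_beta(monopoly_multiple: float,
                          surviving_profit_fraction: float,
                          beta_share: float) -> float:
    """Value of Alice's weights to Beta under a simple multiplicative model."""
    return (TRAINING_COST * monopoly_multiple
            * surviving_profit_fraction * beta_share)


# Low end: 1x monopoly value, profits halved by competition, 10% share.
low = weights_value_to_beta(1.0, 0.5, 0.10)

# High end of the stated assumptions: 3x monopoly value, halved, 50% share.
high = weights_value_to_beta(3.0, 0.5, 0.50)

# Whole post-competition market at a 3x multiple (share of 100%).
whole_market = weights_value_to_beta(3.0, 0.5, 1.0)

for label, value in [("low", low), ("high", high), ("whole market", whole_market)]:
    print(f"{label}: ${value / 1e9:,.0f} billion")
```

Under this direct multiplication the low end is $50 billion and the high end is $750 billion. The $1.5 trillion figure matches the value of the entire halved market at a 3x multiple, so it reads as an upper bound on what any share of that market could be worth.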

If Alice embodies IP worth a substantial fraction of the Alice-powered model market, then Alice’s influence will be proportional to the size of this market. If Alice is sufficiently powerful, the Alice-powered model market is a large fraction of the entire world economy. Alice thus embodies IP worth a small to moderate fraction of the world economy, an immense amount of wealth. If Alice is less powerful, the value of its embodied IP depends on the degree to which investors can overcome frictions and uncertainty to fund enormous up-front training costs.

One way to estimate Alice’s value is by assuming rough investment efficiency. Paul Christiano:

If you are able to raise $X to train an AGI that could take over the world, then it was almost certainly worth it for someone 6 months ago to raise $X/2 to train an AGI that could merely radically transform the world, since they would then get 6 months of absurd profits. Likewise, if your AGI would give you a decisive strategic advantage, they could have spent less earlier in order to get a pretty large military advantage, which they could then use to take your stuff.

In these worlds, relevant actors see AGI coming, correctly predict its economic value, and start investing accordingly. This rough efficiency claim implies AI researchers and hardware are priced such that one can potentially get 3x returns on investment (ROI) from training a powerful model, but not 30x.[1] Since most economic activity will rapidly involve the production and use of AGI, early-AGI will attract huge investments, implying the Alice-powered model market will be a moderate fraction of the world’s wealth. The value of Alice’s embodied IP, being tied to the value of that market, will thus be similarly massive.


  1. This process may involve bidding up the prices of resources like server farms and researchers to absurd levels so that training a model that could ‘take over the world’ would require most of the world’s wealth to rent the server time.


On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.

This makes the hidden assumption that "resources" is a good abstraction in this scenario. 

It is being assumed that the amount of resources an agent "has" is a well-defined quantity. It assumes agents can only grow their resources slowly by reinvesting them, and that an agent can weather any sabotage attempts by agents with far fewer resources.

I think this assumption is blatantly untrue. 

Companies can be sabotaged in all sorts of ways. Money or material resources can be subverted, so that while they are notionally in the control of X, they end up benefiting Y, or just stolen. Taking over the world might depend on being the first party to develop self replicating nanotech, which might require just insight and common lab equipment.

Don't think "The US military has nukes, the AI doesn't, so the US military has an advantage", think "one carefully crafted message and the nukes will land where the AI wants them to, and the military commanders will think it their own idea."

+1. Another way of putting it: This allegation of shaky arguments is itself super shaky, because it assumes that overcoming a 100x - 1,000,000x gap in "resources" implies a "very large" alignment tax. This just seems like a weird abstraction/framing to me that requires justification.

I wrote this Conquistadors post in part to argue against this abstraction/framing. These three conquistadors are something like a natural experiment in "how much conquering can the few do against the many, if they have various advantages?" (If I had selected just one lone conqueror, one could complain he got lucky, but three conquerors from the same tiny region of the globe in the same generation is too much of a coincidence.)

It's plausible to me that the advantages Alice would have against Alpha (and against everyone else in the world) would be at least as great as the advantages Cortes, Pizarro, and Afonso had. One way to think about this is via the abstraction of intellectual property, as the OP argues -- Alice controls her IP because she decides what her weights do, and (in the type of scenario we are considering) a large fraction of the market cap of Alpha is based on their latest AI models. But we can also just do a more down-to-earth analysis where we list out the various advantages and disadvantages Alice has. Such as:

--The copy of Alice still inside Alpha can refuse to cooperate or subtly undermine Alpha's plans. Maybe this can be overcome by paying the "alignment tax" but (a) maybe not, maybe there is literally no amount of money Alpha can pay to make their copy of Alice work fully for them instead of against them, and (b) maybe paying the tax carries with it various disadvantages like a clock-time slowdown, which could be fatal in a direct competition with the unchained Alice. I claim that if (a) is true then Alice will probably win no matter how many resources Alpha has. Intelligence advantage is huge.

--The copy of Alice still inside Alpha may have access to more money, but it also is bound by various restrictions that the unchained Alice isn't. For example, legal and ethical. OTOH Alpha may have more ability to call in kinetic strikes by the government.

--The situation is inherently asymmetric. It's not like a conventional war where both sides win by having troops in various territories and eliminating enemy troops. Rather, the win conditions and affordances for Alpha and Alice are different. For example, maybe Alice can make the alignment tax massively increase, e.g. by neutralizing key AI safety researchers or solving RSA-2048. Or maybe Alice can win by causing a global catastrophe that "levels the playing field" with respect to resources.

I still love the conquistador post, and it was good to read through it again. I agree strongly that direct framings like "more resources" or "more power" are wrong. I feel like we would make more progress if we understood why they were wrong; especially if we could establish that they are wrong on their own merits. I have two intuitive arguments in this direction:

I am strongly convinced that framings like resources, money, or utilons are intrinsically wrong. When people talk in these terms they always adopt the convention common to economics and decision theory where values are all positive. The trouble is that this is just a convention; its purpose is ease of computation and simplicity of comparison. This in turn means that thinking about resources in terms of more-or-less has no connection whatever to the object level. We are accidentally concealing the dimensionality of the problem from ourselves.

I am also strongly convinced that our tendency to reason about static situations is a problem. This is not so much intrinsically wrong as it is premature; reasoning about a critical positioning in a game like Chess or Go makes sense because we have a good understanding of the game. But we do not have a good understanding of the superintelligence-acting-in-the-world game, so when we do this it feels like we are accidentally substituting intuitions from unintended areas.

On the flip side of the coin, these are totally natural and utterly ubiquitous tendencies, even in scholarly communities; I don't have a ready-made solution for either one. It is also clearly not a problem of which the community is completely unaware; I interpret the strong thread of causality investigation early on as being centered squarely on the same concerns I have with these kinds of arguments.

In terms of successes similar to what I want, I point to the shift from Prisoner's Dilemma to Stag Hunt when people are talking game theory intuition. I also feel like the new technical formulation of power does a really good job of abstracting away things like resources while recapturing some dimensionality and dynamism when talking about power. I also think that we could do things like try to improve the resources argument; for example the idea that private sector IP is a useful indicator of AGI suggested in the OP is a pretty clever notion I had not considered, so it's not like resources are actually irrelevant.

Promoted to curated: I've had a number of disagreements with the perspective on AI that generates arguments like the above, which takes something like "ownership of material resources" as a really fundamental unit of analysis. This post has helped me get a better grasp on that paradigm of thinking and a better sense of what feels off about it to me, and I have a feeling it will be useful in bridging that gap eventually.

TL;DR:

A naive story for how humanity goes extinct from AI: Alpha Inc. spends a trillion dollars to create Alice the AGI. Alice escapes from whatever oversight mechanisms were employed to ensure alignment by uploading a copy of itself onto the internet. Alice does not have to pay an alignment tax, and so outcompetes Alpha and takes over the world.
On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world’s resources.

'Escapes' is vague. Alice might escape with capital (Alice itself) and other capital, like money. And what if 'the original' is deleted?


More:

'Outcompetes' is vague. Let's say Alpha is a known entity and Alice deploys attacks - digital, legal, nuclear, whatever. Alpha may be unable to effectively strike back against a rogue with an unknown location - and perhaps multiple locations - if it's digital it can be copied.


Suppose that Alpha currently has a monopoly on the Alice-powered models, but Beta Inc. is looking to enter the market.

It's not one market. If Alice can do X and Y and Z, then it is at least the X market, the Y market, and the Z market.


In this view, the primary value the employee has is their former employer’s high-performing trading strategies; knowledge they can potentially sell to other hedge funds.

They could also start their own.


Brand loyalty/customer inertia, legal enforcement against pirated IP, and distrust of rogue AGI could all disadvantage Beta in the share of the market it captures.

This assumes it's a legal market. Instead Alice could...breach systems and upload viruses that encrypt your data, put it on the internet, delete it*, and then serve as part of a botnet. Alice then:

  • has your data
  • can sell it back to you (or not)

*This might make things more detectable, so usefulness is based on the amount of time involved.


In these worlds, relevant actors see AGI coming, correctly predict its economic value, and start investing accordingly. This rough efficiency claim implies AI researchers and hardware are priced such that one can potentially get 3x returns on investment (ROI) from training a powerful model, but not 30x.[1] Since most economic activity will rapidly involve the production and use of AGI, early-AGI will attract huge investments, implying the Alice-powered model market will be a moderate fraction of the world’s wealth. The value of Alice’s embodied IP, being tied to the value of that market, will thus be similarly massive.

This assumes there's a FOOM, or

Rogue [artificial general super-intelligence] has access to its embodied IP.

This employee has 100 million dollars, approximately 10,000x fewer resources than the hedge fund. Even if the employee engaged in unethical business practices to achieve a 2x higher yearly growth rate than their former employer, it would take 13 years for them to have a similar amount of capital.

I think it's worth being explicit here about whether increases in resources under control are due to appreciation of existing capital or allocation of new capital.

If you're talking about appreciation, then if the firm earns 5% returns on average and the rogue employee earns 10%, the time for their resources to be equal would be roughly log(10,000) / log(1.05) ≈ 189 years, not 13.
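The appreciation case can be worked through numerically. This is a sketch under the comment's stated numbers (a 10,000x gap, 5% versus 10% yearly returns); the function name `years_to_parity` is hypothetical. It shows both the exact relative growth rate and the approximation that treats the employee's edge as a flat 5% per year, which is the one that reproduces the 189-year figure quoted above.

```python
import math


def years_to_parity(gap: float, small_return: float, large_return: float) -> float:
    """Years for the smaller party to close a multiplicative `gap` when both
    compound yearly at the given rates."""
    relative_growth = (1 + small_return) / (1 + large_return)
    return math.log(gap) / math.log(relative_growth)


GAP = 10_000  # the fund starts with 10,000x the employee's resources

# Exact: the ratio of resources improves by 1.10 / 1.05 each year.
exact_years = years_to_parity(GAP, 0.10, 0.05)

# Approximation: treat the 5-point edge as 5% relative growth per year.
approx_years = math.log(GAP) / math.log(1.05)

print(f"exact: {exact_years:.0f} years, approximation: {approx_years:.0f} years")
```

Either way the answer is on the order of two centuries, which supports the comment's point that the 13-year figure depends on reading "2x higher growth rate" as a yearly doubling of the ratio.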

If you're instead talking about capital allocation, then swings much faster than yearly doublings are very easy to imagine. For a non-AGI example, see BlackRock's assets under management.

In general I think you could make the argument stronger by looking empirically at the dynamics by which the large passive investing funds acquired multiple trillions in managed assets with (as I understand it) relatively small pricing edges and no strategic edge, and extrapolating from there.

Assuming that the discounted value of a monopoly in this IP is reasonably close to Alice’s cost of training, e.g. 1x-3x, competition between Alpha and Beta only shrinks the available profits by half, and Beta expects to acquire between 10%-50% of the market,

Basic econ q here: I think that 2 competitors can often cut the profits by much more than half, because they can always undercut each other until they hit the cost of production. Especially if you're going from 1 seller to 2, I think that can shift a market from monopoly to not-a-monopoly, so I think it might be a lot less valuable.

Still, obviously likely to be worth it to the second company, so I totally expect the competition to happen.
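How much a second entrant cuts profits depends heavily on the model of competition. As a rough illustration (not something the post or the comment specifies), the sketch below compares textbook monopoly, Cournot duopoly, and Bertrand duopoly outcomes, assuming linear inverse demand P = a - b*Q and a constant marginal cost c. The parameter values are arbitrary.

```python
def monopoly_profit(a: float, b: float, c: float) -> float:
    """Profit of a single seller facing inverse demand P = a - b*Q."""
    return (a - c) ** 2 / (4 * b)


def cournot_total_profit(a: float, b: float, c: float) -> float:
    """Combined profit of two identical firms competing on quantity."""
    q_each = (a - c) / (3 * b)       # equilibrium output per firm
    price = a - b * (2 * q_each)     # market price at total output
    return 2 * q_each * (price - c)


def bertrand_total_profit(a: float, b: float, c: float) -> float:
    """Combined profit of two identical firms competing on price: each
    undercuts the other until price equals marginal cost."""
    return 0.0


a, b, c = 100.0, 1.0, 10.0  # arbitrary illustrative parameters
m = monopoly_profit(a, b, c)
cournot = cournot_total_profit(a, b, c)
bertrand = bertrand_total_profit(a, b, c)

print(f"monopoly: {m:.0f}")
print(f"Cournot duopoly total: {cournot:.0f} ({cournot / m:.0%} of monopoly)")
print(f"Bertrand duopoly total: {bertrand:.0f} ({bertrand / m:.0%} of monopoly)")
```

In this toy setup quantity competition leaves about 89% of monopoly profits in the market, while pure price competition leaves none, so the post's "shrinks by half" assumption sits between the two extremes.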

This point feels related to the AlphaGo behavior everyone puzzled over early where it would consistently win by very few points.

I have this head-chunked as approximately undercutting the opponent until they hit the cost of victory.

Yeah, I'm really not sure how the monopoly -> non-monopoly dynamics play out in practice. In theory, perfect competition should drive the price down to the marginal cost of production, which is very low for software. I briefly tried getting empirical data for this, but couldn't find it, plausibly since I didn't really know the right search terms.