Intelligence allocation from a Mean Field Game Theory perspective

Marv K

Each of us now faces a choice every minute of every day:

Should I really use my brain power for this, or should I let ChatGPT do it?

Using your own brain has its merits. You might get more intelligent and resilient in the long run. But ChatGPT already does most things much better and quicker, so why not let it spare you stirring your own grey goo? Then again, supporting ChatGPT may increase the likelihood of an extinction event.

In this post, we’re gonna look at this from a mean field game theory perspective.

First, let’s state the assumptions:

There are N individuals, each of them having a level of intelligence described by the stochastic process X_i(t) for i=1,...,N.

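Each individual can choose at each time t a control α_i(t) in [0,1], where α_i(t) = 1 means the individual uses their own intelligence to solve the task and α_i(t) = 0 means they use ChatGPT to solve the task. (This is the convention the equations below use: α_i(t) weights the own-intelligence drift b_1, and 1 - α_i(t) weights the ChatGPT drift b_2.)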

The intelligence of each individual, and that of the AGI, evolves as a result of these decisions.

The payoff for each individual is the cumulative number of tasks solved up to time T. However, if the average intelligence of ChatGPT surpasses a certain threshold before time T, everyone goes extinct and the payoff is zero.
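To make the payoff concrete, here is a minimal Python sketch. It assumes time is discretized into steps of length dt, and that for one individual we already have a hypothetical per-step rate of tasks solved and a hypothetical path of ChatGPT's intelligence. The function name and its inputs are illustrative and are not part of the model above.

```python
def payoff(tasks_rate, ai_intelligence, dt, threshold):
    """Cumulative tasks solved over [0, T], or zero if the AI crosses the threshold first.

    tasks_rate      -- hypothetical tasks solved per unit time at each step
    ai_intelligence -- hypothetical AI intelligence at each step
    dt              -- step length, so the horizon is T = len(tasks_rate) * dt
    threshold       -- extinction threshold on the AI's intelligence
    """
    if any(level > threshold for level in ai_intelligence):
        return 0.0  # extinction before T wipes out everything accumulated so far
    return sum(tasks_rate) * dt  # left Riemann sum of the task rate over [0, T]
```

For example, `payoff([1.0] * 4, [0.1, 0.2, 0.3, 0.4], 0.5, 1.0)` returns 2.0. If any entry of the intelligence path exceeds 1.0, it returns 0.0.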

Individual Behavior (SDE)

The dynamics of the individual's intelligence could be modeled as the following stochastic differential equation:

dX_i(t) = [α_i(t) b_1(X_i(t), m(t)) + (1-α_i(t)) b_2(X_i(t), m(t))]dt + σ dW_i(t)
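A minimal Euler-Maruyama sketch of this SDE for N individuals follows. It assumes the drifts b_1 and b_2 and the control α are supplied as Python callables, and it takes m(t) to be the empirical mean of the X_i(t). The function name `simulate_agents` and its signature are hypothetical.

```python
import numpy as np


def simulate_agents(x0, alpha, b1, b2, sigma, T, n_steps, seed=0):
    """Euler-Maruyama simulation of the N-agent intelligence SDE.

    x0     -- initial intelligence levels X_i(0), shape (N,)
    alpha  -- hypothetical callable alpha(t, x, m) -> controls in [0, 1], shape (N,)
    b1, b2 -- hypothetical callables b(x, m) -> drifts, shape (N,)
    Returns the simulated paths, shape (n_steps + 1, N).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for k in range(n_steps):
        t = k * dt
        m = x.mean()  # empirical mean field m(t)
        a = alpha(t, x, m)
        drift = a * b1(x, m) + (1.0 - a) * b2(x, m)  # same weighting as the SDE
        dW = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # independent Brownian increments
        x = x + drift * dt + sigma * dW
        path.append(x.copy())
    return np.array(path)
```

With σ = 0, α ≡ 1, b_1 ≡ 1 and b_2 ≡ 0, every path simply grows linearly, which makes a convenient sanity check before plugging in more interesting drifts.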

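Here, the terms b_1 and b_2 represent the drifts under the two options available to the agents: using their own intelligence or using ChatGPT to solve tasks. The W_i(t) are independent standard Brownian motions, σ is the noise level, and m(t) is the average intelligence of all individuals.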

Here's a possible way to define them:

b_1 could be a function that represents the rate of increase in the player's intelligence when they solve tasks on their own. For instance, we could model this as proportional to the current intelligence of the player, but with diminishing returns. This might look like:
b_1(X_i(t), m(t)) = a_1 * X_i(t) / (1 + d_1 * X_i(t))
Here, a_1 is a positive constant that sets the growth rate at low intelligence (where b_1 ≈ a_1 * X_i(t)), and d_1 is another positive constant that controls how quickly the increase slows down as the player's intelligence grows. For large X_i(t) the rate approaches its maximum, a_1 / d_1.
b_2 could be a function that represents the rate of increase in the AGI's intelligence when it is used to solve tasks. For instance, we could model this as proportional to the average intelligence of all players (since more intelligent players would be more likely to improve the AGI's intelligence), but with the risk of extinction when the AGI's intelligence exceeds a certain threshold. This might look like:
b_2(X_i(t), m(t)) = a_2 * m(t) / (1 + d_2 * m(t)) - e * max(0, m(t) - θ)
Here, a_2 and d_2 are positive constants that set the growth rate at low average intelligence and how quickly this increase slows down as the average intelligence grows (the first term approaches a_2 / d_2). e is a large positive constant that represents the risk of extinction once m(t) exceeds the threshold θ, which is distinct from the time horizon T.
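The two drift functions translate directly into code. In this sketch the parameter values are hypothetical defaults, and the extinction threshold is called `theta` to keep it separate from the time horizon T. Note that in this specification b_1 ignores m and b_2 ignores x.

```python
def b1(x, m, a1=1.0, d1=0.5):
    """Own-intelligence drift a1*x / (1 + d1*x); saturates at a1/d1 for large x.

    The defaults a1=1.0 and d1=0.5 are hypothetical.
    """
    return a1 * x / (1.0 + d1 * x)


def b2(x, m, a2=1.0, d2=0.5, e=100.0, theta=5.0):
    """ChatGPT drift: saturating growth in the mean m, minus a penalty once m > theta.

    The defaults (a2, d2, e, theta) are hypothetical; x is unused in this specification.
    """
    return a2 * m / (1.0 + d2 * m) - e * max(0.0, m - theta)
```

With these defaults, b_1 levels off near a_1 / d_1 = 2. b_2 stays positive until m exceeds θ = 5, where the large penalty e takes over and the drift turns sharply negative.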

Mean Field Behavior (PDE)

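The evolution of the distribution of intelligence across all individuals, denoted by μ(t), could be described by the following partial differential equation (the Fokker-Planck equation). Here m(t) = ∫ x μ(t, dx) is the average intelligence, and α(x, m(t)) is the control written in feedback form: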

∂μ/∂t + div[(α(x, m(t)) * b_1(x, m(t)) + (1 - α(x, m(t))) * b_2(x, m(t))) * μ] - 0.5 * σ^2 * Δμ = 0
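One way to make the Fokker-Planck equation concrete is an explicit finite-volume scheme on a uniform one-dimensional grid. This is a sketch under several assumptions: zero-flux boundaries (so no probability mass leaves the grid), upwinding for the drift term, m(t) computed as the mean of the discretized μ, and a time step with dt * (|drift|/dx + σ^2/dx^2) ≤ 1 so the scheme stays stable and non-negative. The function names are hypothetical.

```python
import numpy as np


def fokker_planck_step(mu, v, sigma, dx, dt):
    """One explicit step of d(mu)/dt + d(v*mu)/dx - 0.5*sigma^2 * d2(mu)/dx2 = 0.

    mu -- density at cell centres, shape (n,)
    v  -- drift alpha*b1 + (1 - alpha)*b2 at cell centres, shape (n,)
    Zero-flux boundaries, so the total mass sum(mu)*dx is conserved.
    """
    v_face = 0.5 * (v[:-1] + v[1:])  # drift at the n-1 interior faces
    # upwind advective flux: take mu from the cell the flow comes from
    adv = np.where(v_face > 0, v_face * mu[:-1], v_face * mu[1:])
    diff = -0.5 * sigma**2 * (mu[1:] - mu[:-1]) / dx  # diffusive flux at the faces
    flux = np.concatenate(([0.0], adv + diff, [0.0]))  # no flux through either end
    return mu - dt / dx * (flux[1:] - flux[:-1])


def evolve(mu0, x, alpha, b1, b2, sigma, dt, n_steps):
    """Evolve mu forward in time, recomputing m(t) from mu at every step.

    alpha, b1, b2 -- hypothetical callables of (x, m) returning arrays shaped like x
    """
    dx = x[1] - x[0]
    mu = np.array(mu0, dtype=float)
    for _ in range(n_steps):
        m = np.sum(x * mu) * dx  # mean intelligence m(t)
        a = alpha(x, m)
        v = a * b1(x, m) + (1.0 - a) * b2(x, m)
        mu = fokker_planck_step(mu, v, sigma, dx, dt)
    return mu
```

Because the fluxes telescope, the total mass stays at 1 up to rounding. With a positive drift, the mean m(t) moves to the right over time.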

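The value function p(t, x), i.e. the best payoff an individual with intelligence x can expect from time t onward, should then solve the following Hamilton-Jacobi-Bellman (HJB) equation backward in time from the terminal condition p(T, x) = 0. Here H is the running payoff (the rate at which tasks are solved), and the optimal control α* is the value of α that attains the supremum: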

sup_α {H(x, m, α, p) + ∂p/∂t + (α * b_1(x, m) + (1 - α) * b_2(x, m)) * ∂p/∂x + 0.5 * σ^2 * ∂²p/∂x²} = 0
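Here is a sketch of the dynamic-programming route. It is an explicit finite-difference scheme that steps ∂p/∂t + sup_α {H + (α b_1 + (1 - α) b_2) ∂p/∂x} + 0.5 σ² ∂²p/∂x² = 0 backward from p(T, x) = 0. It assumes the mean-field path m(t) is given, and that the running payoff, whose form the post leaves open, is supplied as a hypothetical callable H(x, m, α) without the p argument. If H is linear in α, the bracket is linear in α, so the supremum over [0, 1] sits at α = 0 or α = 1 (a bang-bang control), and the sketch compares only those two. It needs dt * (|drift|/dx + σ^2/dx^2) ≤ 1 to stay stable.

```python
import numpy as np


def solve_hjb(x, T, n_steps, sigma, b1, b2, H, m_of_t):
    """Step the HJB equation backward from p(T, x) = 0 on the uniform grid x.

    b1, b2 -- drift callables b(x, m)
    H      -- hypothetical running payoff H(x, m, alpha), tasks solved per unit time
    m_of_t -- hypothetical given mean-field path t -> m(t)
    Returns p(0, x) and the maximizing control alpha*(t_k, x) for each time step.
    """
    dx = x[1] - x[0]
    dt = T / n_steps
    p = np.zeros_like(x)  # terminal condition p(T, x) = 0
    controls = np.zeros((n_steps, x.size))
    for k in reversed(range(n_steps)):
        m = m_of_t(k * dt)
        slope = np.diff(p) / dx
        fwd = np.append(slope, 0.0)  # forward difference, zero slope at the right edge
        bwd = np.insert(slope, 0, 0.0)  # backward difference, zero slope at the left edge
        lap = np.zeros_like(p)
        lap[1:-1] = (p[2:] - 2.0 * p[1:-1] + p[:-2]) / dx**2
        best = np.full_like(p, -np.inf)
        for alpha in (0.0, 1.0):  # linear in alpha: the sup is at an endpoint
            v = alpha * b1(x, m) + (1.0 - alpha) * b2(x, m)
            grad = np.where(v > 0, fwd, bwd)  # upwind: look where the state is heading
            value = H(x, m, alpha) + v * grad
            controls[k] = np.where(value > best, alpha, controls[k])
            best = np.maximum(best, value)
        p = p + dt * (best + 0.5 * sigma**2 * lap)
    return p, controls
```

In a full mean field game solve, `m_of_t` would come from the Fokker-Planck equation evaluated with the current controls. The two solves would then be alternated until m(t) stops changing.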

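The HJB equation runs backward from T, while the Fokker-Planck equation runs forward from 0, and the two are coupled through m(t). We might therefore obtain the optimal control by iterating between them until m(t) stops changing, discretizing each equation with dynamic programming, finite differences, or finite element methods.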

I am currently learning about MFG theory, made the connection to alignment, and thought this approach to modelling the distribution of intelligence was worth sharing.

Cheers.

Mitchell_Porter · 10mo

the rate of increase in the AGI's intelligence when it is used to solve tasks

ChatGPT doesn't even remember conversations, so taken literally, the value of this quantity is "zero".

However, by being a user, you potentially or actually provide OpenAI with feedback that will help them improve their product, so in that sense, there can be a nonzero relationship.

If you're going to write equations like these, you may as well model the AI's "level of alignment" as well as its intelligence. The AI is an extinction risk only if it is unaligned when it reaches superintelligence. So you should model the effect of user choices on AI alignment as well.

Marv K · 10mo

I agree on both counts. You're right that I should model the alignment of the system as well as its intelligence. I guess alignment could be thought of as minimizing the distance between high-dimensional vectors representing the players' and the AI's values. Each user (and the AI, too) would have a value vector associated with it. A user's cost function could then incorporate how much they care about their own alignment with the rest of the users, and the AI's cost function would need to be tuned so that it is sufficiently aligned by the time it reaches a critical threshold of intelligence. That way, you could express how important it is for the AI to be aligned as a function of its intelligence.
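Here is a minimal sketch of the value-vector idea. It assumes squared Euclidean distance as the misalignment measure, plus a hypothetical logistic weight that makes the AI's alignment matter more as its intelligence approaches and passes the critical threshold. All function names and the form of the weight are illustrative choices; the thread does not specify them.

```python
import numpy as np


def misalignment(ai_values, user_values):
    """Squared distance between the AI's value vector and the users' mean value vector."""
    return float(np.sum((ai_values - user_values.mean(axis=0)) ** 2))


def user_misalignment(i, user_values):
    """Squared distance between user i's values and the mean of all other users."""
    others = np.delete(user_values, i, axis=0)
    return float(np.sum((user_values[i] - others.mean(axis=0)) ** 2))


def alignment_weight(ai_intelligence, threshold, steepness=5.0):
    """Hypothetical importance of AI alignment: logistic in intelligence, 0.5 at the threshold."""
    return 1.0 / (1.0 + np.exp(-steepness * (ai_intelligence - threshold)))


def ai_alignment_cost(ai_values, user_values, ai_intelligence, threshold):
    """Hypothetical AI cost term: misalignment weighted by how intelligent the AI is."""
    return alignment_weight(ai_intelligence, threshold) * misalignment(ai_values, user_values)
```

Well below the threshold, the weight is close to 0, so misalignment barely costs anything. Well above it, the weight approaches 1, so the full misalignment counts.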
