Epistemic Status: Pretty certain there are better ways to describe this but I believe that the underlying intuition holds and that it might be an exciting formalisation of power. 

Thank you to Viktor Rehnberg and Roman Levantov for some great discussions leading up to this and to Viktor, Arun Jose and Esben Kran for giving feedback :)

Why does this matter?

If we can formalise power-seeking in terms of free energy, then we get an information entropy-based way of describing power-seeking. From this, we can define power gradients with respect to other variables. We can then assign scores of power-seeking to different behaviours in neural networks.

The argument

A condensed form of the argument.

The higher the uncertainty in a system, the more value you get from optionality, or power. In a system with no uncertainty, e.g. a fully observable, deterministic Markov Decision Process (MDP), power-seeking doesn’t arise. In any system, there are variables that predict the future of that system. We define the variables that an agent controls as the ones it has full predictive power over. Environmental variables are then the variables the agent can’t control.

Suppose we don’t have any environmental variables. In that case, we have no uncertainty, as we can always determine what any future state looks like, and we are therefore at maximum power (since we’re in a fully observable MDP). Increasing your power over a system is the same as giving yourself more predictive power over how that system evolves. Increasing your predictive power is the same as minimising external predictive power; in other words, increasing your power can be described as minimising environmental variational free energy (EVFE).



An agent is defined as a system that has close to full predictive power over a set of variables. Alice is therefore an agent with respect to her hand, as she has close to full predictive power over it. (Agency is then the degree to which an agent has predictive power over something.)

Why chess algorithms with random evaluations want “mobility”.

This is the example we will use for the rest of the post.

In the most upvoted comment on Seeking Power is Convergently Instrumental in a Broad Class of Environments, dxu mentions that chess algorithms with random evaluations still perform better with deeper search than with shallower search (Viktor Rehnberg kindly dug up the original source here).

The reasoning behind this is that the chess engine has more “mobility” with higher depth as it’s then able to end up in positions that it can choose lots of other options from. 

Looking at how a chess engine represents our example of a chessboard with randomised state variables, we can see it looks a bit like the following :

[Figure: Monte-Carlo tree search]

If the model knew all of the states beforehand, there would be only one path it would choose, as there would be no uncertainty in its reward. An example of this is a checkmate in 8: you only need to move a certain way, and then you will win; it doesn’t matter if you don’t have any moves left afterwards, as you’ve already won.

So it’s only when reasoning under uncertainty that power is helpful as a term. Full certainty is very rare in the real world, and one could argue that nothing is ever in an entirely predictable state from a Bayesian perspective, as that would require infinite examples.

The state functions are randomised such that a normal distribution describes the utility of a state:

As our example told us, the chess player with higher depth wins more games on average than one with lower depth and this can be explained with the help of power-seeking. 

If we look at the reward calculations of one chess engine versus another, we can see that it looks something like the following:

The reward that we get after each epoch is: 

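And since sums of independent normal variables are themselves normal, stacking (convolving) normal distributions on top of each other yields a normal distribution; the central limit theorem suggests roughly the same shape even when the individual evaluations are not exactly normal, as long as they are independent and have finite variance.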

This then means that we get normal distributions as rewards. The more random initialisations of these we have, the higher the max of the reward will be on average, as it’s like rolling a die more times; the more you roll a d20, the higher the probability of getting a 20 is. This means we want to be in positions with many paths or high optionality. This is the same as us being in positions with more power.
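The dice intuition above can be checked with a minimal simulation. This is only a sketch under stated assumptions: every option's evaluation is an independent standard normal draw, and the function name `mean_best_evaluation`, the trial count and the seed are illustrative choices, not something taken from the original chess experiments.

```python
import random
import statistics


def mean_best_evaluation(num_options, trials=2000, seed=0):
    """Average value of the best of `num_options` random evaluations.

    Assumption: each evaluation is an independent draw from N(0, 1).
    """
    rng = random.Random(seed)
    best_values = [
        # The agent picks the best-looking option among those available.
        max(rng.gauss(0.0, 1.0) for _ in range(num_options))
        for _ in range(trials)
    ]
    return statistics.mean(best_values)


if __name__ == "__main__":
    # More available options -> higher expected best evaluation.
    for num_options in (1, 2, 5, 20):
        print(num_options, round(mean_best_evaluation(num_options), 2))
```

The average best evaluation grows with the number of options, which is the sense in which positions with many paths are worth more under random evaluations.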

Variables that predict the game board

We have two agents that determine how the chessboard looks in any situation, Alice and Bob. We assume we’re in the same scenario, with randomised evaluations at each step. 


The actions that predict how the chess game will play are either determined by Alice or by Bob, as they are the agents with input on the chess board. Let’s call {a_1, a_2, … a_n} Alice controlled variables and {b_1, b_2, … b_n} Bob’s controlled variables. 

Maximising the influence of Alice variables is the same as power-seeking behaviour.

Think of Alice's power as her ability to predict future states. If she could read Bob's mind, she could predict all the future states of the chessboard. In other words, she would be in a fully observable Markov decision process (MDP) with no uncertainty in her future world modelling. 

If Alice can make this scenario functionally accurate, then she has reached the limit of power-seeking because, in a fully observable MDP, power does not arise. This suggests that power-seeking has a ceiling based on the total uncertainty in the system.

Minimising the influence of Bob’s variables increases Alice’s control over the situation.

To reach this limit, we need to reduce the level of chaos that can impact our future trajectories, which in this case, corresponds to Bob's controlled variables. We can represent each of Bob's variables as either 1 or 0, allowing us to determine the level of uncertainty based on the number of variables we know. Each state has the potential to branch off into 2^n new states, where n represents the number of variables that Bob has control over in that state. 
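As a rough sketch of this counting argument, the snippet below computes how many states remain possible and how many bits of uncertainty that corresponds to. It assumes Bob's variables are independent and equally likely to be 0 or 1, which the post does not state; the function names are hypothetical.

```python
import math


def branching_states(num_unknown):
    """Number of successor states if `num_unknown` binary variables are unknown."""
    return 2 ** num_unknown


def remaining_uncertainty_bits(num_bob_variables, num_known):
    """Bits of uncertainty Alice has left about Bob's variables.

    Assumption: variables are independent, uniform binary values, so the
    entropy is log2 of the number of states that are still possible.
    """
    if not 0 <= num_known <= num_bob_variables:
        raise ValueError("num_known must be between 0 and num_bob_variables")
    unknown = num_bob_variables - num_known
    return math.log2(branching_states(unknown))


if __name__ == "__main__":
    # Knowing more of Bob's variables shrinks the space of possible futures.
    for known in range(0, 9, 2):
        print(known, branching_states(8 - known), remaining_uncertainty_bits(8, known))
```

When Alice knows all of Bob's variables, the remaining uncertainty is zero bits, which corresponds to the mind-reading limit described above.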

Maybe you can see where we’re going at this point?

As we reduce external chaos, we are essentially reducing the variational free energy present in the external system. When there is no chaos, this is equivalent to Alice being able to read Bob’s mind. Power seeking, therefore, becomes equivalent to minimising the environmental variational free energy (EVFE). 
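For readers who want the quantity behind the name: the post does not give a formula for EVFE, so the block below is only the standard definition of variational free energy used in Active Inference. The notation is assumed: o for observations, s for hidden states, p for the generative model and q for the agent's approximate belief. Reading s as the environment-controlled variables (Bob's, in the chess example) is an assumption about how EVFE would be instantiated, not something the post specifies.

```latex
% Standard variational free energy (assumed notation, see lead-in).
\begin{aligned}
F[q] &= \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \\
     &= D_{\mathrm{KL}}\big(q(s)\,\|\,p(s \mid o)\big) - \ln p(o)
\end{aligned}
```

Because the KL term is non-negative, F is an upper bound on the surprise −ln p(o), so driving F down means both better beliefs about s and less surprising observations.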

In the following sections, I will explain how this generalises to an arbitrary context and provide an additional example to help illustrate this concept further.

A quick primer on free energy and reduction of states.

This is a quick primer on how reducing the number of states of a system is the same as minimising the free energy.

If we look at the temperature in a room, we can get an intuition of why this is:


Lower free energy <=> Lower temperature <=> Fewer states <=> Lower entropy


We can easily see this if we look at the possible states one particle could be in:

[Figure: Difference in potential states for two different temperatures or average molecular speeds.]

The number of states a particle could be in corresponds to the area of the circle, which we can see is smaller for lower temperatures.

In Active Inference (where some of these ideas come from), we care about our accuracy when predicting the potential future worlds we can be in. In a scenario where we get rewarded for correctly predicting where molecules are within a room from one state to the next, we would always choose a colder room over a warmer room as we have a higher probability of being correct.
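A minimal sketch of the colder-room intuition, under assumptions not made in the post: the gas is ideal, so each velocity component of a particle is Gaussian with variance k_B·T/m, and "uncertainty" is measured as the differential entropy of that Gaussian. The function name and the nitrogen molecule mass are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K
NITROGEN_MASS = 4.65e-26  # approximate mass of an N2 molecule in kg (illustrative)


def velocity_entropy_nats(temperature_kelvin, mass_kg=NITROGEN_MASS):
    """Differential entropy (in nats) of one velocity component.

    Assumption: ideal gas, so the component is Gaussian with
    variance k_B * T / m, and the entropy is 0.5 * ln(2 * pi * e * variance).
    """
    variance = BOLTZMANN * temperature_kelvin / mass_kg
    return 0.5 * math.log(2 * math.pi * math.e * variance)


if __name__ == "__main__":
    cold, warm = 280.0, 300.0
    # The colder room has lower entropy, so predictions about it are easier.
    print(velocity_entropy_nats(cold) < velocity_entropy_nats(warm))
```

Under these assumptions the entropy gap between two rooms depends only on the temperature ratio, 0.5·ln(T_warm/T_cold), so the colder room is always the easier one to predict.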

Introducing an environment

We can introduce an environment with people watching the chess game whose variables are {e_1, e_2, … e_n}. From the perspective of Alice, Bob is part of the environment, and for Alice, the environment and Bob are the same. 

This is because, from an information theory perspective, the complexity of the environment and Bob are both represented by 1s and 0s.

Using this to predict power-seeking

From this, we can easily define a free-energy gradient that tells us to what extent an agent is seeking power within an area.

A simple experimental design is a setting where humans are trying to figure out the truth, e.g. a debate scenario or similar. The better an AI gets at deceiving, the more predictive power it has over future scenarios. In other words, the EVFE of the system with respect to deception decreases as the AI gets better at deception.

This allows us to describe power in terms of information entropy rather than as an RL-policy-defined score.

So what?

Ok, cool beans bro; now what?

Well, this can be pretty useful.

We can measure how much an agent is power-seeking with respect to a particular objective. I imagine looking at the capability to deceive a human in a specific domain, or something similar. We can then look at how much the predictive variables of deception change over time with respect to the variables that the AI controls.

Now, you might say: bro, this is just the energy landscape of a neural network in relation to a deception target; what is different here?

There is no difference.

Something functionally equivalent to the EVFE idea in deception is defining a new prediction function based on what predicts deception and then looking at how much expected utility an AI gets after each step if we reward it for deception. Minimising the variational free energy in this context is essentially equivalent to saying that if we have:

A + E = 1

A = agent, E = environment

Then, if E decreases, A has to increase.

The exciting thing about the EVFE approach is that we get a new way to measure the power of a system: the extent to which the environment loses predictive power over a specified goal/variable.

The problem of choosing which AUPs (auxiliary utility preservers) to implement still remains. To specify these, we need to know what types of power-seeking will become dangerous.

Future work

(This is work I plan to do, not necessarily future work for others in this area.)

Boundaries of inner agents

In one of my next posts, I hope to expand on this definition of power but in the context of inner agents. Combining this with the idea of hierarchical agency we get some interesting ways of predicting complex systems. The idea is taken from something Roman Levantov told me about Active Inference, that an agent can be seen as the same thing as an environment. By looking at how an environment or agent affects the environment around it we should then be able to determine what type of environment it is. 

Ontology Verification/Development of Abstractions

Something something, the way that AIs internalise concepts should have predictive power over how different systems within the AI develop over time.

The usefulness of a concept for an AI should be something like the predictive power of the concept over the computation required to bring it up in a certain environment. If we can find a well-defined system where we see some sort of “information structure” (still not clear in my head) 

It really boils down to the idea that different concept mappings should lead to differential power growth. The power-seeking of an agent should be determined by what concepts it internalises (what search algorithms it uses, etc.). E.g. an ontology should be path-dependent, and we should be able to narrow down the path by looking at the ways in which it is differentially gaining power.

Experiment Design: Predicting this in a NN

I wanted to give a pointer towards potential ways this could be used in interpretability as I believe it is experimentally verifiable. I’m not really fully certain of the full experimental design but here’s a pointer:

Experiment design: Look at a narrow concept, such as control over a specific square of a chess board, say e4, and how that control changes over time in different agents, to see differential changes.

To lessen the compute required, we can define larger clusters of predictive variables as larger environments “e(1,1) = {a(1,1), a(1,2) … a(1,r)}”, where the variable e is defined as something with high predictive power over the underlying variables. Each environment then optimises other environments to get more predictive power in the future. (Analogy: in high-complexity environments, we should choose proxies, e.g. deontology or virtue ethics for utilitarianism.)

10 comments

Escaping death = minimise free energy?

I think this question mainly points towards the weird interaction of this with objectives and goals. 

If we set the survival of offspring as the goal, then when we die, we will have no power over the future. This is the same thing as us having max entropy or chaos over the future. So yeah escaping death = minimising free energy with respect to goals that require action in the future.

When we minimise the external free energy, we do it over all future time and not only our current timestep. 

We don't only care about the universe until our current point in time, but we do so for all the future. We're trying to align our model of the world so that all potential timelines have as little external chaos in them as possible. (Incidentally, this is the same thing as minimising the number of potential timelines.)

This is a great post. Let me suggest a few concepts that I think will accelerate your formulation.

In open-systems theory, one way to look at "life" is that it is a self-organizing structure capable of evolving to most effectively dissipate free energy.
The Maximum Power Principle is an observation of "effectiveness" as a dissipation strategy under competition.

Your argument that an agent will use power seeking to minimize the environmental energy, creating more predictability is a natural deduction. The consequence of this is that agents will organize their environment/relationships to gain exponential power, however there is a critical point missing from your argument: the agent's organization and processing requires free energy, therefore leading to environmental carrying capacity. Moreover, the free energy requirements increase in proportion to the overall system power (offset by increased efficiency, but that is limited).

Therefore the agent is capacity constrained, at best coming to equilibrium with the free energy influx into the system. In practice, this requires a global understanding of the equilibrium point, which is not observable from the agent's perspective and so the agent will begin to absorb the stored free energy of the environment, reducing carrying capacity and eventually leading to collapse.

This is why population dynamics are in dynamic disequilibrium.

So while your initial suppositions are right on, you need to include the agent power requirements and energy influx to truly have a complete picture.

There is one point though that threw me for a loop. Why do you think that deception is an advantageous strategy for minimizing free energy generally? This is not the case.

Let's quickly look at the scenarios:

Competing agents are not intelligent - there is no reason to deceive because you just maximize directly through behavior

Competing agents are intelligent, you have limited interactions and the reward function encourages deception - this is the classic prisoner's dilemma and the rational response is to deceive

Competing agents are intelligent but you have repeated interactions, the reward is zero sum - here it gets tricky, because if you deceive too often then there is a high likelihood your competition will catch you in a lie. After all, the problem space for maintaining a deception is nearly infinite, so it is impossible to maintain. Once this happens their trust is decreased, along with your ability to maximize your power. You could risk it and occasionally deceive hoping to get away with it, and play innocent when caught, which is a valid strategy but depends highly on the tuning of the other agent. In my experience, people who have been taken advantage of in the past develop an analysis that any lie is an automatic reason to break the engagement.

Competing agents are intelligent, you have repeated interactions and the reward is positive sum - this is actually the most common scenario outside of constructed games. In this scenario it is most rational to collaborate and that requires being truthful. How do I square this with the maximum power principle? Easy, you coordinate in-group and compete out-group. Cooperative game theory is woefully under-recognized, but that's because it doesn't have computable equilibria except in highly constrained contexts, not because it's not realistic.

If all agents are attempting to maximize power, the reward is positive sum and they assume potential other agents are the same - then they will be super rational and at that point the best strategy is to always tell the truth and cooperate, except if you are unsure whether an agent is deceptive, in which case you should seek to limit the uncertainty around that.
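The contrast between limited and repeated interactions in the scenarios above can be sketched with a toy iterated game. Everything here is an illustrative assumption rather than part of the comment: the payoff table is the usual prisoner's dilemma one, "D" stands for deceiving and "C" for cooperating truthfully, and tit-for-tat stands in for an opponent whose trust drops after being deceived.

```python
# Payoffs as (row player, column player); assumed textbook values.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def always_deceive(opponent_history):
    """Deceive in every round, regardless of what the opponent did."""
    return "D"


def tit_for_tat(opponent_history):
    """Cooperate first, then mirror the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def play(strategy_a, strategy_b, rounds, payoffs=PAYOFFS):
    """Play `rounds` rounds and return the total scores of both players."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy only sees the other player's past moves.
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        gain_a, gain_b = payoffs[(move_a, move_b)]
        score_a += gain_a
        score_b += gain_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(always_deceive, tit_for_tat, rounds=1))   # single interaction
    print(play(always_deceive, tit_for_tat, rounds=10))  # repeated interaction
    print(play(tit_for_tat, tit_for_tat, rounds=10))     # mutual cooperation
```

In this toy setup deception wins a single interaction, but over ten rounds the deceiver ends up with far less than two cooperating players do, which matches the qualitative ordering of the scenarios.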

So to sum: I think your intuition is a good one and minimizing free energy is a great, simple way of generating emergence. You just need to include environmental characteristics such as stored free energy and incoming free energy flux, as well as define the type of game and strategy of other agents. 

This would actually be a wonderful tool because right now there is so much assertion about what AI will become that is only due to arbitrary thought experiment rather than incorporating the rich traditions that have explored these concepts in depth.

Thank you for the insightful comment! The maximum power principle is very relevant so I really appreciate you bringing it to my attention.

The consequence of dynamic disequilibrium and non-understanding of energy influx is also super interesting. I'm trying to apply this theory to the internals of AIs at the moment and I'm wondering whether the potential internal competition pressures also might collapse for internal systems in AI? (releasing a post that explains more of the relevance of free energy on agent internals relatively soon).

I completely agree with the points that you make about the general games that agents find themselves in and that there's convergence towards in-and-out group behaviour. Local cooperation seems to be optimal in most environments. The reason why I want to develop these theories is that I want to describe potential AGIs with them. I think the specific game being played with a potential AGI is an iterated zero-sum game.

Just as an exaggerated thought experiment, if our game environment was the universe and the resource that we cared about was energy then this would be a zero-sum game with respect to the environment. Now, this isn't the world that we live in at this moment, but if we imagine that we were assuming space travel and relevant technologies, then it would be. We can also take the earth's potential energy threshold and see that this too will become a zero-sum game if we assume a certain amount of planning steps in the future.

If I have an opponent in this game that I could pretend to cooperate with without them noticing until I was too strong for them to stop me, then I wouldn't have to make any concessions when it comes to the entire pool of resources. If we can't see into an AGI and make sense of what it says and it is able to plan for long enough into the future, this is the scenario that is most likely to arise. This is because, from the perspective of the AGI, the universe is a zero-sum game and it will only have to cooperate until it can outcompete us for resources.

As you mention, if you deceive early, then people will become suspicious and so if you want to deceive you would want to wait for the right moment to strike which might be a couple of years after you've been developed. 

Great comment though!


"I'm wondering whether the potential internal competition pressures also might collapse for internal systems in AI?" 

I'm not sure what you mean by this? By "collapse" do you mean will the internal systems collapse as they are in competition over different subgoals, or do you mean will  the competition "collapse" and the internal systems will harmonize? Because the latter is generally what occurs and there is strong evidence that multi-cellular life and then organs arose from a similar process. Reorganizing into symbiosis is the best way to resolve internal tensions and reduce energy needs, which is why it occurs both within organisms (plus) and between them on an ecosystem level. 

Just as a point of consideration, nearly all energy influx that we care about is processed into life through symbiosis (the only exceptions being independent bacteria).

This reorganization can be really violent, though: several of the early mass extinction events were directly caused by reorganization, and a lot of complex symbiosis arose in response to mass extinction events caused by other means. This is just a property of complex systems in general; it's likely our AI systems will grow increasingly powerful and then all of a sudden collapse to a far simpler state where they have greatly reduced capabilities until they relearn on that simpler architecture.

As for what game to play, I mean sure if you make the boundary the universe then it is a zero sum game on a resource level but even then a symbiotic strategy would be most effective to minimise free energy and it only requires a system level of awareness to clearly see this, or alternatively stumbling to run into it. 

What interests me is not that an AI actor would be in competition with life as a whole for resources, but that it could reasonably conclude that humanity is a threat because of our refusal to be symbiotic. And if we open ourselves up to symbiosis then who knows?  I mean less than half our bodies are "human" cells which is an odd formulation since that means each human is inherently a symbiotic ecosystem and the two cannot be separated. 

So in this game what is the boundary not only of the universe but the players?

Sorry for not responding earlier, these are great points and it's taking me a bit of time to digest them.

I can say that with regard to the first point, I'm uncertain what I mean myself. It is rather that I'm pointing out that these mechanics should exist in the internals of LLMs with some type of RL training. (Or, to be more specific, some form of internal agentic competition dynamics, where an agent is defined as an entity that is able to form world-models based on action output.)

I will give you a more well thought out answer to your symbiosis argument in a bit. The only thing I want to say for now is that it seems to me that humans are non-symbiotic on average. Also shouldn't symbiosis only be productive if you have the same utility function? (reproductive fitness in ecology) I think a point here might be that symbiosis doesn't arise in AGI-human interactions for that reason.

Yeah no problem! Glad you are taking the time to consider and I look forward to your thoughts.

I'd like to throw in a bit of grist for your thinking around humans and symbiosis. I would argue for most of human history we were consciously symbiotic, meaning we saw ourselves as an extension and in relationship with the environment. Whether that was seeing ourselves as equal with (brother wolf, etc) or above (stewards of the earth) the emphasis was on working with our surroundings to cultivate advantage. What is domestication other than symbiosis?

I won't say that our disconnection from this is exclusively modern, it has existed in other time periods, but it is fair to say that the idea that self-maximizing reproductive fitness is the dominant drive of life is a very recent idea. After all, when Darwin's theory came out it was widely opposed by many for the simple fact that "survival of the fittest" implied that egotistical extremism was natural and surely that couldn't be right. [And of course Darwin himself was never a social Darwinist, plainly saying he was only focused on the fittest meaning "better adapted for the immediate, local environment."] 

And if I were an alien that simply observed from afar, I would come to the conclusion that humans are highly symbiotic. Modern humans are incapable of living without extreme reliance on a huge array of other entities, both biological and non, that they are constantly producing, improving, and supporting.

Ah, you might say, but that's not symbiosis because we are exploiting those things. To which I would reply thusly: first, parasitism is a form of symbiosis, so even in the cynical view that we're just exploiting other creatures and each other, we're still symbiotic, and even more so now since so many creatures (not to mention our inanimate creations) are incapable of survival without us. But even beyond that, our relationships are still mutualistic in the sense that we are greatly increasing the quantity of life in the organisms we are symbiotic with.

Much too well actually, since domesticated mammals outweigh wild ones 10:1. You could say we do far too much symbiosis.

There is a broader point I'm making here, which goes back to whether the game is zero or positive sum.  It's tempting to say that AGI will have no need for us because it has a different utility function. But does our utility function rely on bees? So many cows, sheep, goats? Dogs and cats as companions? Sparrows, pigeons, so on and so forth...they provide something we are incapable of producing in ourselves and that is enough for us. What will the AGI find itself lacking in?

Not that I'm saying we will become domesticated animals in relation to AGI, I am merely drawing parallels that life is nuanced and conditional. 

Thermodynamics theories of life can be viewed as a generalization of Darwinism, though in my opinion the abstraction ends up being looser/less productive, and I think it's more fruitful just to talk in evolutionary terms directly.

You might find these useful:

God's Utility Function

A New Physics Theory of Life

Entropy and Life (Wikipedia)

AI and Evolution

I understand how that is generally the case, especially when considering evolutionary systems' properties. My underlying reason for developing this is that I predict using ML methods on entropy-based descriptions of chaos in NNs will be easier than looking at pure utility functions when it comes to power-seeking. 

I imagine that there is a lot more work on existing methods for measuring causal effects and entropy descriptions of the internal dynamics of a system.

I will give an example as the above seems like I'm saying "emergence" as an answer to why consciousness exists, it's non-specific. 

If I'm looking at how deception will develop inside an agent, I can think of putting internal agents or shards against each other in some evolutionary tournament. I don't know how to set up an arbitrary utility for these shards, so I don't know how to use the evolutionary theory here. I do know how to set up a potential space of the deception system landscape based on a linear space of the significant predictive variables. I can then look at how much each shard is affecting the predictive variables and then get a prediction of what shard/inner agent will dominate the deception system through the level of power-seeking it has.

Now I'm uncertain whether I would need to care about the free energy minimisation part of it or not. Still, it seems to me that it is more useful to describe power-seeking and what shard/inner agent ends up on top in terms of information entropy. (I might be wrong and if so I would be happy to be told so.)

I'm weak on the math of free energy. It would be helpful to have something that goes through the derivations for a simple case.