Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.


This piece explores a way of defining intelligent systems such that we can, in theory, quantify their general intelligence. From this, further tools and ideas for comparing these entities could be developed. The definitions are not meant to be philosophical truths; rather, they are meant to be useful tools that allow us to analyse and gain insight into these systems and how they relate to one another. At least that's the hope. Failing that, they can perhaps provide some food for thought.

This post is meant to be accessible to non-technical readers so some terms may be explained to a level of detail unnecessary for people familiar with machine learning.

Desirable Properties

We begin by identifying several desired properties that would increase the utility and robustness of our framework, giving us something to aim at.

Sufficient: If our definitions relied upon, or referenced, things that are poorly defined themselves, we would just be moving the problem back a step and not actually gaining any insight.

Measurable: Intelligence is a broad spectrum; this is especially visible in the natural world. A good definition would reflect this and give us a continuous measure of intelligence that allows sensible comparisons.

Implementation Independent: It's easy to compare something's capabilities to those of humans in order to ascertain its intelligence. We want our definitions to be free from bias towards any particular implementation or version of intelligence, so that they can recognise intelligence which operates in a way unfamiliar to us, or in a way we don't understand.

Minimal Grey Areas: Many definitions could leave large grey areas on boundaries between classifications, or not make sense when applied to domains they were not designed with in mind. This should be avoided.

Useable: Sometimes a seemingly 'perfect' definition is infeasible to actually apply, and so is of no practical use. A definition which is infeasible to calculate exactly could still come with a method to estimate it. Quantifying how reliable or accurate the estimates are would also be useful.


A classic definition of an agent is something that interacts with an environment, choosing an action (or actions) to achieve some desired goal. To make this rigorous, we will define an agent in our framework as something that produces an action based on the state of its environment. The action will be an effect on the environment. In a mathematical sense, an agent is a function from the environment state to an effect, and the effect is in turn a function from one environment state to another. Both functions could be stochastic (involving randomness) or deterministic, depending on the environment and the agent.
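As a minimal sketch of this "function from state to effect" view, the snippet below models a deterministic agent in Python. The type names (`State`, `Effect`, `Agent`), the toy `thermostat` agent and the `step` helper are all hypothetical, chosen only for illustration; the framework itself does not prescribe any representation.

```python
from typing import Callable, Dict

# Hypothetical types for illustration; the framework does not fix a representation.
State = Dict[str, float]
Effect = Callable[[State], State]   # an effect maps one environment state to another
Agent = Callable[[State], Effect]   # an agent maps an environment state to an effect


def thermostat(state: State) -> Effect:
    """A toy deterministic agent: add heat when it is cold, otherwise do nothing."""
    heating = 1.0 if state["temperature"] < 20.0 else 0.0

    def effect(s: State) -> State:
        new_state = dict(s)  # copy so the previous state is left untouched
        new_state["temperature"] = s["temperature"] + heating
        return new_state

    return effect


def step(agent: Agent, state: State) -> State:
    """Ask the agent for an effect, then apply that effect to the environment."""
    return agent(state)(state)


state = {"temperature": 18.0}
for _ in range(5):
    state = step(thermostat, state)
print(state["temperature"])
```

A stochastic agent would have the same shape, with the returned effect sampled rather than computed.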

Describing something as an agent is something we do as part of our framework in order to analyse it. There is no physical property that determines if an object is an agent or if it's not - anything can be an agent if we want it to be.


The environment includes everything that could affect our agent, and everything our agent can affect. Arguably this is always the entire universe, but that is not very helpful, so reasonable simplifications should be made. The environment can also include the agent and its internal state, especially if our agent is able to read or modify its own state.

Environments will typically be multi-dimensional. If they represent real-world environments then there will be up to 4 dimensions, 3 for space and 1 for time. When we define an environment, we will also define something we will call 'The Prime Dimensions' (PDs). The position of the environment in the PDs cannot be affected directly by the agent. For real-world-like environments the PDs include time; for classification tasks they are the examples (i.e. each point in the space represents a single example). Many reinforcement learning environments will have PDs made up of attempts and the time within each attempt.

Often we can partition the environment into agent inputs and agent outputs. This is common, though not true in general. Some environments, combined with an agent, will form 'Partially Observable Markov Decision Processes' (POMDPs), though without the reward component. Essentially this means that all possible information about the environment is encoded in its current state, which the agent may or may not have full access to. Note that all POMDPs have PDs that include time.


Actions are effects our agent produces that mutate the state of the environment. Actions could be produced deterministically or with some degree of randomness. The possible actions agents can have are quite broad in nature, and often lend themselves to simplification. For example, we might limit the actions of mechanical devices to changes in the acceleration of their idealised degrees of freedom.


Unlike the classical view of agents, we do not consider our agents to have inherent desired goals. The idea of a goal is still useful, however. We can define a goal as a function that takes in information about the agent and environment, and then outputs a value that represents how well the agent is achieving that goal at that point in the prime dimensions (larger is better). We require that the integral (or sum in the discrete case) of the goal function over the prime dimensions converges, and is bounded between 0 and 1. Note that goals are arbitrary and always outside the environment - the agent cannot see them or modify them.


We will analyse 3 possible agents and explain how they fit into the framework. They are a weather vane, an image classifier, and a bot that plays pong. Note that the technical implementations of the last 2 agents will be irrelevant to our analysis.

Weather Vane

The weather vane's environment is the state of the atmosphere close to itself as well as its own spatial position and orientation. Our prime dimension is time. The weather vane's action could be idealised as its angular acceleration around its axis, though if we were to consider it possibly breaking we would need to extend this. Qualitatively, the goal of the weather vane could be to point in the direction of the wind as fast as possible. This could be expressed mathematically as a goal function with one free parameter.

Image Classifier

The image classifier takes in an image and produces a probability distribution over some set of classes. The environment is the images given to it, together with its produced distributions. The prime dimension is the set of images. Our classifier's actions are its produced classification distributions. As a goal, we'd want the classifier to produce probability 1 for the correct classification and 0 for all others, for each example. We could express this as $g(i) = p_i(x_i) / N$, where $x_i$ represents the image data and $p_i(x_i)$ is the value of the prediction for the correct class of image $i$. There are $N$ total images.

Pong Bot

The pong bot's environment could be information on the locations and velocities of the ball and paddles; it could also be the last few frames of the game rendered to the screen. It would also include the inputs to its 'player'. The environment would span every game of pong the agent plays; the games, together with the time during each game, would make up the prime dimensions. The pong bot could produce a discrete action of up or down for each input frame of the game. The pong bot's goal is winning pong; mathematically, a good goal function might be $g(i) = w_i / N$, which sums to the fraction of games won. It plays $N$ total games, with $w_i$ being 1 if it wins game $i$ and 0 otherwise.


The intelligence of an agent is often defined as the ability for it to achieve its goals. In our framework the goal is something we impose on the agent rather than something it holds itself, so we will define 2 types of intelligence, specific and general.

Specific Intelligence

The specific intelligence of an agent relates to a specific goal. An agent's intelligence with respect to that goal is the expectation of the integral of the goal function over the prime dimensions. Taking an expectation covers cases where any part of our system is stochastic. From the requirements on goal functions, this will be a value from 0 to 1, where 1 represents the agent being 'perfect' for that goal.
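To make the expectation concrete, here is a minimal Monte Carlo sketch in Python. It assumes a hypothetical stochastic pong agent that wins each game independently with a fixed probability, and a goal that awards 1/N per game won, so that the goal sums to a value between 0 and 1. The function names, the number of games and the number of trials are all illustrative assumptions.

```python
import random
from typing import List


def play_games(win_probability: float, n_games: int, rng: random.Random) -> List[int]:
    """Simulate one run over the prime dimension: 1 for a win, 0 for a loss."""
    return [1 if rng.random() < win_probability else 0 for _ in range(n_games)]


def goal_sum(wins: List[int]) -> float:
    """Sum the goal function over all games; each win contributes 1/N."""
    n_games = len(wins)
    return sum(w / n_games for w in wins)


def specific_intelligence(win_probability: float, n_games: int = 50,
                          n_trials: int = 2000, seed: int = 0) -> float:
    """Estimate the expectation of the summed goal by averaging repeated runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += goal_sum(play_games(win_probability, n_games, rng))
    return total / n_trials


print(specific_intelligence(0.7))
```

For this toy agent the estimate should land close to the underlying win probability, since the expected fraction of games won equals that probability.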

General Intelligence

Under our analysis so far, our weather vane would be incredibly intelligent - it's very good at achieving the goal it's designed for, pointing at the wind. Are weather vanes super intelligent? Of course not; this is where generality comes into play. We want to capture how capable our agent is at performing a variety of tasks - its intelligence over a variety of goals. This will allow us to separate weather vanes, which only point in the direction of the wind, from a game-playing DQN algorithm that could become very skilled at a large number of simple games.


In contrast to other formulations, our agents don't have ways of directly receiving and interpreting goals. Assuming that they did (as they would in many machine learning paradigms) would violate implementation independence. Instead, we generalise by modifying an agent (or its environment) in order to try to increase our agent's specific intelligence with respect to our desired goal. This could be as simple as giving the agent a specific initial input, or it could be training it with a new set of data. This has the nice property of making the outer alignment problem clear. We can never actually tell an agent what to do, only modify it so that what it does hopefully becomes closer to what we want.

Clearly, the more modification we are allowed to make, the greater the impact this will have on our agent's capabilities. At the extreme we could completely redesign our agent from the ground up for each individual goal. We will assume we have a way of quantifying the magnitude of a given modification, i.e. how much it changes the agent. This will allow us to counteract this problem.

We can use this idea to explore how capable agents are, subject to modifications of a certain magnitude. Now the differences become clear. Weather vanes would require extreme modification to do anything other than point in the direction of the wind, whereas a theoretical AGI robot would only need to be told to do something and it could probably achieve high performance on lots of goals.


We now have the tools to quantify the general intelligence of an agent. We will consider how well our agent performs on every possible goal, under every possible modification. This requires two more auxiliary functions: a modification-magnitude importance function and a goal importance function. The algorithm is given below for a deterministic agent and environment; to convert it to a stochastic one, simply take expectations where necessary.

General Intelligence Quantification

  1. Record the agent's specific intelligence with respect to every possible goal in every possible environment
  2. Make every possible modification to the agent/environment, each time repeating step 1
  3. For each goal, take the list of modifications made and specific intelligence achieved and put it in order of increasing modification magnitude
  4. Remove all duplicate results - equal modification magnitude and specific intelligence
  5. Remove all results where the modification resulted in a lower specific intelligence than a modification of smaller magnitude (such that our list of modifications in order of increasing modification magnitude is also in order of increasing specific intelligence)
  6. Convert the list of modification magnitudes and specific intelligences into an increasing function from modification magnitude to specific intelligence, using a zero-order hold interpolation where necessary (the resulting function should be an increasing function from reals to reals)
  7. Multiply this function by our modification-magnitude importance function (this represents us caring less about our agent being able to do arbitrarily well with arbitrarily large modification)
  8. Integrate this function from 0 to infinity to obtain a single value that represents how well the agent performs at that goal given varying levels of modification (performance value)
  9. Multiply this by the output of our goal importance function for this goal (to represent that some goals are 'worth less to be good at')
  10. Repeat this for every goal and take a summation to give us the final value for our agent - its general intelligence
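The later steps of this procedure can be sketched in Python for a finite set of goals and modifications. This is only an illustration under stated assumptions: the results of the first two steps are supplied by hand as (magnitude, specific intelligence) pairs, the importance function is taken to be the exponential decay `rate * exp(-rate * m)`, the held function is assumed to be 0 below the smallest recorded magnitude, and every name and number is hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (modification magnitude, specific intelligence); illustrative representation only.
Record = Tuple[float, float]


def frontier(records: List[Record]) -> List[Record]:
    """Order by magnitude, then drop duplicates and dominated results."""
    kept: List[Record] = []
    for magnitude, score in sorted(set(records)):
        if kept and score <= kept[-1][1]:
            continue  # a modification no larger than this one already did as well
        if kept and kept[-1][0] == magnitude:
            kept[-1] = (magnitude, score)  # same magnitude, better score
        else:
            kept.append((magnitude, score))
    return kept


def performance_value(records: List[Record], rate: float = 1.0) -> float:
    """Integrate the zero-order-hold function against rate * exp(-rate * m)."""
    if rate <= 0:
        raise ValueError("rate must be positive for the integral to converge")
    steps = frontier(records)
    total = 0.0
    for i, (magnitude, score) in enumerate(steps):
        upper = steps[i + 1][0] if i + 1 < len(steps) else math.inf
        # Closed-form integral of the weight over the interval [magnitude, upper).
        total += score * (math.exp(-rate * magnitude) - math.exp(-rate * upper))
    return total


def general_intelligence(results: Dict[str, List[Record]],
                         goal_importance: Dict[str, float],
                         rate: float = 1.0) -> float:
    """Weight each goal's performance value by its importance and sum."""
    return sum(goal_importance[goal] * performance_value(records, rate)
               for goal, records in results.items())


goal_importance = {"point at wind": 0.5, "win at pong": 0.5}
weather_vane = {"point at wind": [(0.0, 0.9)],
                "win at pong": [(0.0, 0.0), (50.0, 0.8)]}
flexible_agent = {"point at wind": [(0.0, 0.1), (0.1, 0.9)],
                  "win at pong": [(0.0, 0.0), (0.1, 0.8)]}
print(general_intelligence(weather_vane, goal_importance))
print(general_intelligence(flexible_agent, goal_importance))
```

With these made-up numbers the flexible agent scores higher, because it reaches good performance on both goals after only small modifications, while the weather vane needs a very large one to play pong.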

Clearly there are some complications with this strategy. Firstly, we need to define the three auxiliary functions (the modification-magnitude measure and the two importance functions). This will be subject to biases and assumptions. We require that the integral of our modification-magnitude importance function converges to a finite number; an exponential decay, for example, satisfies this.
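As a worked example of the convergence requirement, take the importance function to be an exponential decay with a hypothetical rate $\lambda > 0$, and let the increasing step function $f$ jump to specific intelligence $s_k$ at modification magnitude $m_k$ (with $m_1 < \dots < m_K$, and $f = 0$ below $m_1$). The symbols $w$, $\lambda$, $f$, $s_k$ and $m_k$ are notation introduced here for illustration only.

```latex
\begin{align}
  w(m) &= \lambda e^{-\lambda m},
  \qquad \int_0^\infty w(m)\,dm = 1, \\
  \int_0^\infty f(m)\,w(m)\,dm
    &= \sum_{k=1}^{K} s_k \left( e^{-\lambda m_k} - e^{-\lambda m_{k+1}} \right),
  \qquad e^{-\lambda m_{K+1}} := 0 .
\end{align}
```

Since every $s_k$ lies between 0 and 1 and the bracketed terms are non-negative and sum to $e^{-\lambda m_1} \le 1$, the performance value under this choice also lies between 0 and 1.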

Even given these functions, the actual calculation is still intractable. It requires us to consider possible infinities of modifications across possible infinities of environments that contain yet more possible infinities of goals. We can, however, define a small subset of these and perform the calculation in order to obtain an estimate of the agent's general intelligence. An example of doing this is given in the following section.

Practical Calculation Approach

Let's imagine we have some computer algorithms (agents), and we wish to investigate how their general intelligence compares. We could even include a human agent in this, though they are subject to the same restrictions as all other agents (no direct access to the goal). We'll take some number of environments, each with various goals specified for them. We hope that these goals, across these environments, provide a somewhat balanced range of tasks for the agents to complete. The way to communicate the goals to the agents is via a text file. We'll refer to those writing the agents and their modifications as the testers. The process of calculating their general intelligence is as follows.

Practical General Intelligence Quantification

  • The environments are revealed to the testers and each agent is implemented with a general idea of the types of goals that may be present, but no knowledge of any actual ones
  • For each goal, the tester may write various strings of their choosing into the text file before the agent begins its attempt at the task, all attempts are recorded
  • If any part of the environment or agent behaviour is stochastic, a suitable number of trials are performed and an expectation is estimated
  • The modification-magnitude is given by the file size and these values along with the achieved specific intelligence for that task under that modification are recorded and processed as per the general algorithm
  • Using a modification-magnitude importance function of exponential decay, we integrate to get the performance values for each goal for each agent
  • Average these across all goals for each agent to get approximated general intelligence scores for each agent
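A rough Python sketch of the scoring part of this procedure is given below. It assumes the attempts have already been run and their specific intelligences estimated, takes the modification magnitude to be the UTF-8 size of the text written to the file, and uses an exponential decay with a hypothetical rate per byte. The example goals, texts and scores are invented for illustration.

```python
import math
from typing import Dict, List, Tuple

# (text written to the file, estimated specific intelligence); illustrative only.
Attempt = Tuple[str, float]


def performance_value(attempts: List[Attempt], rate: float) -> float:
    """Score one goal: best result so far at each file size, weighted by decay."""
    sized = sorted((len(text.encode("utf-8")), score) for text, score in attempts)
    total, best = 0.0, 0.0
    for i, (size, score) in enumerate(sized):
        best = max(best, score)  # larger files never count for less than smaller ones
        next_size = sized[i + 1][0] if i + 1 < len(sized) else math.inf
        total += best * (math.exp(-rate * size) - math.exp(-rate * next_size))
    return total


def approximate_general_intelligence(results: Dict[str, List[Attempt]],
                                     rate: float = 0.05) -> float:
    """Average the per-goal performance values for one agent."""
    values = [performance_value(attempts, rate) for attempts in results.values()]
    return sum(values) / len(values)


# Hypothetical recorded attempts for one agent on two goals.
agent_results = {
    "sort the list": [("", 0.1), ("sort ascending", 0.9)],
    "reverse the list": [("", 0.1), ("reverse the list order", 0.8)],
}
print(approximate_general_intelligence(agent_results))
```

Under this scoring, an agent that reaches a given performance from a shorter file ranks above one that needs a longer file, which is the sense in which the score reflects how well an agent leverages information.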

Clearly this won't be perfect, and care would need to be taken to make sure the testers have a sufficient lack of knowledge about the specific goals but still know what they could be in general. Modifying a text file in the environment is a small subset of the possible modification space, but it has the nice property that our calculated intelligences reflect how powerful our agents are at leveraging information to achieve a goal. Our approach also only covers a small subspace of the environment-goal space. It does, however, yield a way of comparing our algorithms (and humans), not only to see which ones are more generally intelligent, but even to quantify the differences!

Comparisons & Connections


So what are AI and AGI under our framework? AI is a bit of a pointless term - weather vanes can be considered highly intelligent artificial agents. As for AGI, we can think of this as simply an artificial agent with high overall general intelligence, perhaps near or beyond human level. It performs well at lots of goals in many environments, with minimal modification required. Nothing else really matters, least of all how it's implemented. It could be a neural network, a circuit, a mechanical device, a brain simulation, etc.

Explicit Optimisers

Often, intelligent agents are formulated as optimisers. They have some reward function they are trying to maximise (or loss to minimise). Machine learning methods are based on optimisers and we will explore how those paradigms fit into this framework. As a general point, training or pre-training could be carried out before testing the resulting agent on various goals. Lack of knowledge of the goals before training would still apply in this case.

Supervised Learning

In supervised learning, we are given pairs of training data, $(x, y)$, and try to learn the relationship between them such that we can predict the $y$ from an unseen $x$. Our prime dimension in the environment is the set of examples. Goal functions would be to produce certain $y$'s (or get as close as possible according to some measure) from a set of certain $x$'s in the environment. These $x$, $y$ pairs are known as the test data. Is the training data part of the environment? We will consider three cases.

The first is including training data in the environment. Here the agent is the learning algorithm, it uses the training data embedded in the environment to learn the relationship and then predict on the test data with the goal function measuring performance. Changing the training data allows us to learn different relationships and thus optimise for different goal functions. However lots of training data is usually required, thus the modification magnitude is very large. The advantage is our algorithm is very general and performs well for all supervised learning cases that we can embed examples for.

The second is none of the training data being part of the environment. Here the agent is the learnt model. It performs well at whatever task the training data corresponds to but cannot learn different relationships without modifying the learnt agent itself. This represents an agent that is highly intelligent at one specific task but not very good at any others.

Finally, consider a pre-trained model. It has already seen some data, but can be fine-tuned on extra data embedded in the environment. This mixes the pros and cons of the previous two methods.

Reinforcement Learning

In reinforcement learning (RL) we receive sequential environment observations, $o_t$. We take an action after each new observation, $a_t$, and receive a reward, $r_t$. The goal is to learn which action to take, based on the current history of observations, that maximises the total reward received overall. Sometimes the environment is reset, and each period of time between resets is referred to as an episode. The prime dimensions would thus be episodes and the time within them.

RL bears a resemblance to our formulation, but it must be handled with care. The goal functions could be almost identical to reward functions, but there is a key difference: the agent has access to the reward function, but not the goal function.

If our approach is to directly optimise on the goal function, we need to convert it to a reward function. For reasons that boil down to "RL is hard", this is non-trivial. This conversion represents the essence of the outer alignment problem in RL. In this method, like in supervised learning, the agent is the algorithm. Optimising a reward function also has the disadvantage that for each new goal we test the agent on, it needs to learn how to achieve that goal. If this takes a long time, this could be suboptimal if the goal functions specify we must achieve our task quickly. One advantage over the supervised learning case is that we only need to embed the new reward function in the environment/agent, as opposed to embedding a whole dataset.
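The gap between reward and goal can be illustrated with a toy sketch in Python. Everything in it is a hypothetical construction: a simple agent that tries each action once and then repeats whichever earned the most reward, a goal that awards 1/T for each of the T steps on which a target action is chosen, and two hand-written reward functions, one that matches the goal and one that does not.

```python
from typing import Callable, List


def run_greedy_agent(reward: Callable[[int], float],
                     n_actions: int, n_steps: int) -> List[int]:
    """Try every action once, then keep repeating the best-rewarded one.

    The agent only ever sees the reward, never the goal. Assumes n_steps >= n_actions.
    """
    estimates: List[float] = []
    history: List[int] = []
    for t in range(n_steps):
        if t < n_actions:
            action = t  # exploration phase: sample each action once
            estimates.append(reward(action))
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])
        history.append(action)
    return history


def goal_sum(history: List[int], target: int) -> float:
    """Goal evaluated from outside: 1/T for every step spent on the target action."""
    return sum(1.0 / len(history) for action in history if action == target)


def aligned_reward(action: int) -> float:
    return 1.0 if action == 1 else 0.0   # rewards exactly what the goal measures


def misaligned_reward(action: int) -> float:
    return 1.0 if action == 0 else 0.5   # a poor conversion of the same goal


print(goal_sum(run_greedy_agent(aligned_reward, 2, 10), target=1))
print(goal_sum(run_greedy_agent(misaligned_reward, 2, 10), target=1))
```

The agent is identical in both runs; only the reward handed to it changes, and with it the specific intelligence measured against the unchanged goal.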

What if we want our agent to be able to perform tasks quickly? There are three approaches. The first is to train it on lots of objectives separately, then use the environment to embed some way of switching between learnt behaviours. Like in the supervised case, this leads to very poor generality. A better way would be to use a pre-training approach. Now our agent might only need a few (or even zero) training attempts to learn how to complete each goal. The third method would be to learn parameterised behaviours via RL, then specify the parameters once the goal functions are known and perform well at the goal within one episode.


Firstly, many may point out that in the methods explored here, there's an optimisation taking place at the modification (design/input) level. However, when we are performing our modifications on the agent and its environment we have to remember that it is only an optimisation in the practical sense, as in theory we would apply every possible one. That out of the way, here are two possible agent paradigms that don't use optimisation explicitly.


Consider a device whose behaviour can be changed by altering it (e.g. via switches, dials etc.). We could consider configuring it for each task as a modification, though how to quantify this is non-trivial compared to agents which can receive text/audio information directly. Taken to the extreme, this approach represents building or configuring a specific machine to solve each goal, with modification magnitude reflecting the size/complexity of the machine. These machines could be mechanical, electrical, a mix of the two, or something else entirely.

Bespoke Programs

We could create an instruction set of all possible actions. If there's no uncertainty, then which action to take at each time-step (or each point in the prime dimensions) could be encoded as a list of strings in a text/audio input. Obviously for complex tasks this could be quite large, hence large modification magnitude and a low performance value. For environments with uncertainty, conditional and branching statements in the instruction set could be used. In the extreme, this approach represents programming a specific solution for each goal, with modification magnitude reflecting the size of the program. The agent is the interpreter/compiler and executor of the program.
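A minimal sketch of this idea in Python follows, assuming a hypothetical one-dimensional world with a two-instruction set (`left`, `right`) and taking the modification magnitude to be the program's size in bytes. The function names and the instruction set are invented for illustration.

```python
def interpreter_agent(program_text: str, start: int = 0) -> int:
    """Execute a whitespace-separated program and return the final position.

    The agent itself is fixed; all task-specific behaviour comes from the program.
    """
    position = start
    for instruction in program_text.split():
        if instruction == "right":
            position += 1
        elif instruction == "left":
            position -= 1
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
    return position


def modification_magnitude(program_text: str) -> int:
    """Size of the program in bytes, used here as the modification magnitude."""
    return len(program_text.encode("utf-8"))


short_task = "right right right"
long_task = " ".join(["right"] * 30)
print(interpreter_agent(short_task), modification_magnitude(short_task))
print(interpreter_agent(long_task), modification_magnitude(long_task))
```

The same interpreter can reach any position, but the program needed grows with the distance, so harder goals cost larger modifications and earn lower performance values.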


In terms of human agents, we have to make sure the goals we test them on do not significantly conflict with their own internal human goals and thus give false impressions of capabilities. Modification can be carried out by inserting information into the environment (e.g. text/speech).


By considering agents without inherent goals that simply interact with environments, we remove any reliance on what agents are or can do. In order to measure their intelligence, we consider how their actions affect a given goal. This gives the agent's specific intelligence for that task.

In order to generalise beyond specific instances of agents, we modify the agent or environment in order to maximise the satisfaction of the goal. By considering all possible modifications, and weighting them based on how large they are, we compute a general intelligence for that agent. This serves as a measure of how sensitive an agent's capabilities are to modification.

Our definition would require us to consider all possible modifications, for all possible goals in all possible environments. This is infeasible. By limiting what we consider, we can calculate an estimate of the agent's general intelligence. In the case of modifications being information added to the environment, general intelligence becomes the agent's ability to leverage information to perform a task.


I fear that measuring modifications is like measuring a moving target. I suspect it will be very hard to consider all the modifications, and many AIs may blend into each other under large modifications. Also, it's not clear how hard some modifications will be without actually carrying them out.

Why not fix a target, and measure the inputs needed (e.g. flops, memory, time) to achieve goals?

I'm working on this topic too, I will PM you.  

Also, feel free to reach out if the topic is of interest.

Yes, it's still unclear how to measure modification magnitude in general (or if that's even possible to do in a principled way) but for modifications which are limited to text, you could use the entropy of the text and to me that seems like a fairly reasonable and somewhat fundamental measure (according to information theory). Thank you for the references in your other comment, I'll make sure to give them a read!

Other useful references:

-On the Measure of Intelligence 

-S. Legg and M. Hutter, A collection of definitions of intelligence, Frontiers in Artificial Intelligence and applications, 157 (2007), 

-S. Legg and M. Hutter, Universal intelligence: A definition of machine intelligence, Minds and Machines, 17 (2007), pp. 391-444. 

-P. Wang, On Defining Artificial Intelligence, Journal of Artificial General Intelligence, 10 (2019), pp. 1-37.

-J. Hernández-Orallo, The measure of all minds: evaluating natural and artificial intelligence, Cambridge University Press, 2017.


You might be interested by Shane Legg's thesis

Thank you, this looks very interesting

If some quantification is correct I expect it to have fewer free parameters.