Decision theory is being used as the basis for AI safety work. This currently involves maximising the expected utility of specific actions. Maximising expected utility is woefully inefficient for making the very fast-paced, individually unimportant decisions that occur frequently in computing. But in an AI these fast-paced decisions will still need to be made in a purpose-oriented way.

This article presents an argument that we should explore meta-decision theories to allow the efficient solution of these problems. Meta-decision theories are also more human-like and could have different central problems from first-order decision theories.

Decision theory tends to be the domain of economists, mathematicians, ethicists and other philosophers. However, it is now being used in arguments about artificial intelligence and how to make it safe. If we try to implement an AI that follows any specific decision theory, that theory is brought into the real-world domain of the computer. Now it has to deal with the slightly less abstract world of bits and bytes. We have to make decision theories that take notice of the costs and consequences of making a decision.

AI is often seen through a dual lens, as both an agent making reasoned decisions about goals and a computer able to process data and act at astounding speeds. These views have not been fully integrated, and they need to be for safe AI work.

AIs will still have to send data over the network and display images on a screen. These boring things are still up for optimisation; they still need to be decided upon. We will want these actions to be correctable if they are going badly wrong.

Decisions at 10Gbps

The example that will be followed through is networking. Consider a simplified network interface where the possible actions are outputting a 0 or a 1 across a wire.

We have an AI in control of this network interface, and part of its job is to send a file as quickly as possible to the other side. It has a 10 Gbps Ethernet connection. It gets the optimal utility if it can send a 10 Gb file in 1 second. It also has to pick and choose which files to send, as some are more important than others. Imagine doing an expected utility calculation for each bit. If you have a machine capable of 300 GIPS (top-of-the-range CPUs are of that order), then under ideal conditions you could use only 30 instructions for the expected utility calculation for each bit. This might work for a dead simple expected utility function that just checks whether a 0 or a 1 matches the current bit of the file being sent: if it matches, the utility is 1, and if it doesn't match, the utility is 0.
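As a rough sanity check on the 30-instruction figure, here is a minimal Python sketch of the arithmetic, together with the dead simple utility function described above. The function names are made up for illustration, and no real network card is driven bit by bit like this.

```python
# Back-of-the-envelope budget: instructions available per transmitted bit.
CPU_INSTRUCTIONS_PER_SECOND = 300e9  # 300 GIPS, the figure used in the text
LINK_BITS_PER_SECOND = 10e9          # 10 Gbps Ethernet link


def instructions_per_decision(instructions_per_second, decisions_per_second):
    """Instructions we can spend on each decision if the CPU does nothing else."""
    return instructions_per_second / decisions_per_second


def bit_utility(action, expected_bit):
    """The 'dead simple' utility: 1 if the emitted bit matches the file, else 0."""
    return 1 if action == expected_bit else 0


def choose_bit(expected_bit):
    """Pick the action (0 or 1) with the highest utility for this bit."""
    return max((0, 1), key=lambda action: bit_utility(action, expected_bit))


budget = instructions_per_decision(CPU_INSTRUCTIONS_PER_SECOND, LINK_BITS_PER_SECOND)
print(budget)         # instructions available per bit
print(choose_bit(1))  # the utility-maximising action when the file's next bit is 1
```

Even this trivial chooser evaluates both actions for every bit, so in practice it would already eat into the budget.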

However, we are expecting better things from our AI: it should be able to figure out whether it should interrupt the current file upload and start a different one because that file has become more important. 30 instructions is slim pickings for something that might want to do a complex simulation of the state of the world to figure out what it should do. One way to try to optimise things might be to batch up bits and do one utility calculation per set of bits. Batching things up into sets of 64 bits doesn't seem to gain you much, if anything: while you divide the number of decisions you need to make by 64, each decision has 2^64 possible actions instead of only 2, and you need to calculate the expected utility of each action.
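The batching trade-off can be made concrete with a small sketch. It assumes a naive maximiser that evaluates the expected utility of every possible action for every decision; batching_cost is a hypothetical helper, not part of any real system.

```python
def batching_cost(batch_bits, link_bps=10_000_000_000):
    """Count the work a naive expected utility maximiser faces per second.

    Assumes one decision per batch of `batch_bits` bits on a link running at
    `link_bps` bits per second, and one utility evaluation per possible action.
    """
    decisions_per_second = link_bps // batch_bits
    actions_per_decision = 2 ** batch_bits
    evaluations_per_second = decisions_per_second * actions_per_decision
    return decisions_per_second, actions_per_decision, evaluations_per_second


for bits in (1, 8, 64):
    decisions, actions, evaluations = batching_cost(bits)
    print(f"{bits:2d}-bit batches: {decisions} decisions/s, "
          f"{actions} actions each, {evaluations} evaluations/s")
```

Under these assumptions the number of decisions shrinks linearly with the batch size while the number of actions per decision grows exponentially, so the total number of utility evaluations only gets worse.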

No one is seriously proposing to do expected utility calculations for control of a network card, but then no one is talking about what it means to intelligently control a network interface at all.

So what do we do about it? Currently, to achieve this task we would have a dumb program that just copies bits across the network and a higher-level program that models the world and interrupts the low-level program to switch it to copying a different file if it finds that doing so would get more utility. This divorces utility from the low-level code entirely. This seems, at first glance, like an elegant solution.

However, now we add a wrinkle: we want to stream some video from a webcam for an important conference call, which gains the AI some utility. We have to decide whether streaming from the webcam, sending a file, or multiplexing the two streams gets us more utility. This brings us right back to where we started, making 10,000,000,000 decisions per second.

This would be solved in normal computing by implementing quality of service (QoS) software that makes decisions about sending the data based on its importance. Our expected utility maximiser could then reason about the expected utility of different QoS settings. However, the QoS software has some overhead: it takes more than 30 instructions per bit on average to do QoS and process the data, so it degrades the maximal throughput rate.

So now we come to another decision: should we use QoS software or not? Note that this is not about an external action at all; it is purely about an internal change to the system, namely which program has control of the network interface. To rescue the correctable aspect of this, we can compare the expected utility of having a QoS program in control of the system with the expected utility of having a simpler program with less overhead in control. If we expect lots of contention then we want QoS; if we expect to be monomaniacally doing one thing then we should have no QoS. We are making a decision about how decisions are made.
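A minimal sketch of this kind of decision about control might look like the following. The controller names and all the utility numbers are invented purely for illustration; the only point is that the expected utility is computed once per choice of controller rather than once per bit.

```python
# Hypothetical utilities for each controller: (utility per second when streams
# contend for the link, utility per second when only one stream is active).
# These numbers are made up for illustration.
CONTROLLERS = {
    "qos": (0.8, 0.7),       # handles contention well, but pays overhead always
    "raw_copy": (0.3, 1.0),  # full throughput, but no prioritisation
}


def expected_utility(p_contention, utility_contended, utility_single):
    """Expected utility of leaving one controller in charge of the interface."""
    return p_contention * utility_contended + (1 - p_contention) * utility_single


def choose_controller(p_contention, controllers=CONTROLLERS):
    """Meta-level decision: which program should control the network interface?"""
    return max(
        controllers,
        key=lambda name: expected_utility(p_contention, *controllers[name]),
    )


print(choose_controller(0.9))  # lots of contention expected
print(choose_controller(0.1))  # mostly a single stream expected
```

With these made-up numbers, high expected contention hands control to the QoS program and low expected contention hands it to the simple copier. The choice would only need revisiting when the estimate of contention changes.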

People do bypass useful networking features in the kernel to get better throughput rates (see, for example, this Cloudflare blog), so this is not an unmotivated example.

Making this choice about whether or not to have QoS software doesn't seem interesting unless you see that it is not an isolated example. The same arguments can be made for all processing, power, memory and memory bandwidth usage in the computer. You need to be making choices about which programs get allocations of all these things whenever the allocations might affect expected utility.

Enter Meta-decision theory

This view leads to an interesting side effect: our decision theories are encoded in programs. How much memory and processing power should we allocate to them? Some decision theories might lead to less utility purely because of how they function: we might not get an answer from them in our lifetimes if they are trying to construct a formal proof of something unprovable. So perhaps we should allocate control of making decisions to the decision theory that is expected to net the most utility for the system. This is a form of meta-decision-theoretic thinking as defined in Smokers, Psychos, and Decision-Theoretic Uncertainty by William MacAskill. Resource constraints give us different motivations for wanting to choose between decision theories, and we would want decision theories to have control over the system for longer periods of time, rather than for single actions. But MacAskill's work shows that there might be benefits above and beyond the resource constraint arguments. I've not yet decided if his preference for Causal Meta-decision theory carries over to the scenarios I've been talking about.

As I’m not a decision theorist by training, I stray even further from the fold: I’m interested in meta-decision theory systems that can pick non-utility-maximising rule sets for making decisions. This might allow me to program or train the computer system into having deontological ethics in control of certain types of decisions, so that the system would not wirehead, lie, murder or prevent itself from being turned off.

If we could have these rules and could make sure that they persist (by making sure they achieve the most utility) we could solve some of the longstanding problems of first order decision theories as they relate to AI.

More human decisions

Another interesting feature of meta-decision theoretic systems is that they share the stance we as humans apply to decision theories. We adopt and design different decision theories to use, and we make and keep rules in different circumstances. So too could meta-decision theoretic computer systems pick deontological rules for consequential reasons.


AI control theory should not just rely on abstract maths. Real-world computing concerns can and should inform the way that decisions are made. Meta-decision theories are currently under-explored and should be favoured when trying to build systems that work in the real world in resource-constrained situations.

There is so much work to do here. There are many open questions:

  • We need programs to make decisions about control. Where do these programs come from?
  • How often should decisions about control be made?
  • How should they be made?
8 comments

I think the problem that the idea of doing special meta-reasoning runs into is that there's no clear line between meta-level decisions and object-level decisions.

"Will changing this line of code be good or bad?" is the sort of question that can be analyzed on the object level even if that line of code is part of my source. If you make an agent that uses a different set of rules when considering changes to itself, the agent still might create successor agents or indirectly cause its code to be changed, in ways that follow the original object-level rules, because those decisions fall under object-level reasoning.

Conversely, I think that means that it's useful and important to look at object-level reasoning applied to the self, and where that ends up.

I think this line of reasoning is the distinction between the normal computer control problem and the AI control problem.

If you make an agent that uses a different set of rules when considering changes to itself, the agent still might create successor agents or indirectly cause its code to be changed, in ways that follow the original object-level rules, because those decisions fall under object-level reasoning.

Working within the context of the AI control problem:

The decision to allow the agent control of the object-level decisions was made on the meta-level, so it was the best option the meta-level could find. If the meta-level were stronger, it would only allow an object-level decider that respects its meta-level decisions. So you are making the assumption that the object-level is stronger than the meta-level in its powers of prediction.

In the context of my work on agoric computing that is probably true. But it is not true for all meta-level decision theories.

In what ways do you see meta-decision theory as different from the notion of alignment? At the current level of detail they sound to me like they are describing the same thing, although alignment is perhaps less pre-committed to decision theory as a solution, even if that's what alignment research is focused on now.

Maybe the intended interpretation is that they are functionally equivalent, but since you didn't specify I wanted to clarify.

Thanks! I should put links to the previous works in what is turning out to be a mini-sequence.

The first one defines the normal computer control problem and then decomposes it.

The short answer is that I'm trying to solve an easier, related problem (a control problem that assumes no superintelligence in the computer), in the hope that it will give me insights into both. The insight that I present here is that formal decision-theoretic decisions are too slow to realistically control things like networks, and meta-level decision theories might provide a way forward. I think this insight applies to both problems.

Aside: I think the AI alignment and AI control problem might be equivalent. But I could see arguments for AI alignment being solved by something acausal in nature, i.e. CEV is not controlled by the actions of people but is aligned to what the people want.

Claim: Meta-decision theories seem more human-like in their approach to decision theory than first order ones

Reply to this comment if you want to dispute this claim

Claim: Meta-decision theories (making decisions about how to make decisions) are a fruitful research path for AIs that consider efficiency of operation

Reply to this comment if you want to dispute this claim

Claim: What bits to put out over a network should be a valid thing to make a decision about, and should be covered by any decision theory that we use for AIs

Reply to this comment if you want to dispute this claim

Claim: Considerations of efficiency should be brought into the formulation of decision theory when discussing AIs (but not math)

Reply to this comment if you want to dispute this claim