Training Regime Day 20: OODA Loop

by Mark Xu, 24th Apr 2020



Introduction

The OODA loop is the cycle of observe -> orient -> decide -> act. This was developed by Colonel John Boyd as an abstraction of the rapid decision-making that occurs during combat. The loop contains four stages:

Observe: Take in information from the environment.

Orient: Determine the set of possible actions.

Decide: Choose which action is the best action.

Act: Do that action.
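
As a minimal sketch of the four stages chained into a loop, here is a toy agent on a number line trying to reach a target. Everything in it (the `env` dictionary, the three candidate steps, the distance-based scoring) is a hypothetical illustration, not part of Boyd's formulation.

```python
from typing import Dict, List

# Hypothetical toy environment: an agent on a number line walking to a target.
env: Dict[str, int] = {"position": 0, "target": 3}


def observe() -> Dict[str, int]:
    """Observe: take in information from the environment (a snapshot copy)."""
    return dict(env)


def orient(observation: Dict[str, int]) -> List[int]:
    """Orient: determine the set of possible actions (step left, stay, step right)."""
    return [-1, 0, 1]


def decide(observation: Dict[str, int], candidates: List[int]) -> int:
    """Decide: choose the candidate that ends up closest to the target."""
    return min(
        candidates,
        key=lambda step: abs(observation["position"] + step - observation["target"]),
    )


def act(step: int) -> None:
    """Act: do the chosen action, which changes the environment."""
    env["position"] += step


def ooda_cycle() -> int:
    """Run one observe -> orient -> decide -> act cycle and return the action taken."""
    observation = observe()
    candidates = orient(observation)
    choice = decide(observation, candidates)
    act(choice)
    return choice


# Each cycle starts from a fresh observation, so the agent stops once it arrives.
history = [ooda_cycle() for _ in range(5)]
print(history)          # [1, 1, 1, 0, 0]
print(env["position"])  # 3
```

The point of the structure is that every cycle re-observes before acting, so the agent notices when it has arrived and stops moving.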

Examples

(feel free to skip if the general concept seems to make sense to you)

These examples are going to be in rather artificial environments. In the real world, it is not so clear which stage of OODA you're in at any given moment. Nonetheless, I claim that it's a useful abstraction:

1. [Note: I suck at chess] In chess, you observe by looking at the chessboard. A naive way to orient is to consider all legal chess moves. However, if you consider too many moves, then deciding which move is best becomes intractable. An experienced chess player will thus orient strategically by only considering a small set of moves. To decide which move is best, chess players will typically run some sort of tree search. Acting in chess is trivial. In typical chess, players are not extremely constrained on time, so the OODA loop is slow.

2. In Magic: the Gathering, you observe by looking at the board state and the cards in your hand. Similarly to chess, you cannot consider all possible moves, so you must strategically orient based upon a set of moves that experience tells you are likely to be good. You decide by a set of heuristics honed by long hours of playing, possibly by conducting a small amount of search or doing explicit probability calculations. Action in Magic: the Gathering is also trivial. Magic: the Gathering is more constrained on time than chess, but the OODA loop is still relatively slow.

3. [Note: I haven't played Starcraft in many years] In Starcraft, you observe by looking at the screen/minimap/other status indicators. You consider a select set of moves governed by your knowledge of strategy, what your opponent is doing, etc. You decide what to do based almost entirely on well-trained heuristics/intuition. Acting is sometimes constrained by your ability to click properly, but generally you won't even consider actions that you know you can't execute properly. Starcraft is played in real time - the speed at which you can execute your OODA loop matters because the other player is changing the environment as you think.

4. In climbing, you observe by looking at the wall and feeling the current position of your body. You consider a select set of motions informed by your earlier observation of the wall. You decide what to do based on some strange knowledge about how bodies move and how hands work. Acting requires that you move your muscles in the ways that you imagine in your head, which sometimes doesn't work, resulting in falling. While you're on a climbing wall, your strength is constantly being used to hold you on the wall - the speed of the OODA loop matters because you are part of the environment and your strength is changing over time.

Failures

I claim that the OODA loop provides a good taxonomy of 8 possible ways that agents fail to accomplish their goals in dynamically changing environments. The 8 possible ways are failures at each of the OODA steps and failures to properly transition between the steps.
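
To make the count of 8 concrete, here is a small sketch that enumerates the taxonomy: one failure per stage, plus one per transition, where the last transition wraps around from Act back to Observe. The names `STAGES` and `failure_modes` are my own labels for illustration.

```python
from typing import List

STAGES = ["Observe", "Orient", "Decide", "Act"]


def failure_modes(stages: List[str] = STAGES) -> List[str]:
    """List failures at each stage, then failures at each transition between stages."""
    within = [f"Failure to {stage}" for stage in stages]
    # Pair each stage with the next one; the final stage wraps around to the first.
    following = stages[1:] + stages[:1]
    between = [f"Failure in {a} -> {b}" for a, b in zip(stages, following)]
    return within + between


for mode in failure_modes():
    print(mode)
```

Because the loop is a cycle, four stages give four transitions, which is where the 4 + 4 = 8 comes from.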

Failure to Observe

A failure to observe is a failure to gather accurate, relevant information from the environment.

Examples:

1. I suck at chess. I suck so much that sometimes when I look at the chessboard, I don't see one of my opponent's pieces.

2. When climbing, sometimes the hold is really dirty and I don't see it.

3. When playing MtG, sometimes I don't realize that something on the board is happening.

In general, if you have the wrong model of the environment, you will observe incorrectly.

Failure to Orient

A failure to orient is a failure to select a tractable set of high-quality actions. There is a classic bias-variance tradeoff here. If you consider too many actions, then because your decision procedure isn't noiseless, you'll have higher variance in outcomes. If you consider too few actions, then the best action in the set might not be very good.

Examples:

1. I suck at chess. Sometimes I have no idea what I'm doing, so I just try to consider all possible actions. This doesn't work because I don't know what I'm doing, so I end up taking basically a random action that isn't obviously terrible.

2. When playing MtG, if you don't know the rules of the game, sometimes you don't realize that there are certain actions that you can take.

3. When climbing, if you think you're stronger than you actually are, you'll consider actions that look good but that you won't be able to do.

In general, if your model of your own agency in a given situation is wrong, it will cause you to orient incorrectly.

Failure to Decide

A failure to decide is a failure to select the best action in the action set you oriented towards.

Examples:

1. I suck at chess. Sometimes I think that capturing a piece locally is good, but don't realize that I'll lose the game 4 turns later.

2. When playing MtG, sometimes I'll think that the rules work one way so I'll think a play is good, but then it turns out the rules work a different way and I am sad.

In general, if your model of how you'll affect the environment or your value function over the environment is flawed, you will decide incorrectly.

Failure to Act

A failure to act is a failure to do the thing that you determined was the best action among all actions you considered given what you know about the environment. This failure is maybe the most "human" failure of them all. I find this failure the most frustrating because I could have just... done it?

Examples:

1. When trying to do life properly, I observe what I want, orient towards the tasks that can get me what I want with what I have, decide on the best task to do, then spend an hour on reddit.

2. When climbing, sometimes my flesh betrays me and I cannot execute the move that I thought I could do, so I fall.

Failure in Observe -> Orient

A failure to go from observing to orienting is a failure to know when you're supposed to act. Most of the time, direct action on your part is not required, so people spend a lot of time in the observation stage. Sometimes, something special happens which requires you to act but you don't realize, so you don't act.

Examples:

1. I observe that a cup of water is about to fall. I watch passively as it falls. I could have easily caught it, but didn't realize that catching the cup was a thing that I could do until after it had already fallen.

2. I observe that some people seem to dress nicer than other people. I fail to orient towards the possible ways that I can dress nicer, even though this is one of my desires.

3. In MtG, I observe that my opponent is doing something that I don't want them to do and that I can stop it. I fail to orient towards the action of actually stopping it.

Failure in Orient -> Decide

A failure to go from orienting to deciding is a failure to realize that you have to stop searching for actions at some point.

Note: this failure only occurs if there's some sort of time pressure, which there basically always is. It also means that increased time pressure makes this failure more likely.

Examples:

1. A human is trying to optimize their life. They spend all their time searching for possible optimizations and never decide which optimization is best.

2. In climbing, sometimes I hang on the wall thinking of possible moves to get to the next hold. Then my arms get tired and I fall.

Failure in Decide -> Act

A failure to go from deciding to acting is a failure to ever stop trying to figure out the best action. I think that there is a failure mode among rationalists where they spend too much time trying to figure out what the best thing to do is and not enough time actually doing things.

Note: this failure only occurs if there's some sort of time pressure, which there basically always is. It also means that increased time pressure makes this failure more likely.

Examples:

1. In Starcraft, I'm trying to figure out what the best move to do is. I spend too long doing this and then I lose.

2. I'm torn between two potential lovers. I spend a long time trying to figure out the various benefits of each person. They both leave me for taking too long to decide.

3. https://xkcd.com/309/

Failure in Act -> Observe

A failure to go from acting to observation is a failure to realize that no plan survives contact with reality. There is a common failure mode where people only OODA once, deciding upon a course of action and then pursuing it, without realizing that the environment or their values might have changed since the decision.

Examples:

1. I once wanted to make mashed potatoes with a sous vide. If you don't know, a sous vide machine will cook potatoes without introducing extra water, so you can put in a bunch of butter and make the mashed potatoes taste super good. Our sous vide machine broke, so we had to cook the potatoes in a way that introduced water. However, we failed to re-observe, so we just put in the same amount of butter as we were planning on doing with the sous vide potatoes. Butter potato soup is not a great food.

2. In strategy games, sometimes I decide on a high-level strategy at the beginning of the game and commit to it. During the game, my opponents figure out what I'm doing and begin to counteract it. Often, I will not re-observe and choose a different strategy, instead barreling on and losing the game.

3. In MtG drafting, a common mistake is to commit to a drafting strategy and fail to re-observe when the strategy is going poorly.

Getting Inside

In combative situations, there is more than one OODA loop going on. Sometimes, one of the OODA loops is faster than the other one. We refer to this situation as "getting inside their OODA loop." Such a situation results in what I call "OODA dominance." I claim that there are qualitatively two types of OODA dominance that can occur.

Fast

In fast OODA dominance, one of the agents is able to execute an entire OODA loop before the other agent is close to completing their OODA loop. This means that the faster agent can observe what observation the other agent made, then execute an entire OODA loop so that the slower agent is acting in an environment that it hasn't even observed. Crucially, the faster agent can act in a way that makes the slower agent's action completely useless or even counter-productive.

Examples:

1. In fencing, if you're much faster than your opponent, you can win without ever losing a single point by just waiting for your opponent to do anything, then just applying the appropriate counter-moves faster than they can respond.

2. In the corporate world, if you're part of a nimble start-up, you can potentially outmaneuver larger corporations with many more resources because you can respond to the larger corporations' maneuvers very quickly.

3. In the stock market, nimble generalists might be able to act faster than comparatively slow hedge funds and make money on expectation.


Slow

In slow OODA dominance, one of the agents' OODA loops is only slightly faster than the other. This means that when the slower agent acts, their information about the environment will be slightly out of date, making their actions slightly less effective. I tentatively claim that this situation is interesting because the slower agent has the chance to shave time off their OODA loop to get on a more level playing field.

Examples:

1. In high-level Starcraft, players will execute attacks that are negative in terms of the game's resources, but cost their opponents more OODA cycles to deal with than they need to use to execute the attack.

2. If you're fencing a better player, it's often better to focus less on trying to execute fancy maneuvers and more on simple retreats and attacks, allowing you to prevent your opponent from getting inside your OODA loop.
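
One crude way to put numbers on the difference between fast and slow dominance is to count how many opponent actions land between the moment you observe and the moment you act. The sketch below is my own toy model, not anything from Boyd: it assumes both agents start at time 0, each loop takes a fixed number of milliseconds, and an agent acts exactly at the end of each of its loops. The function name `unseen_opponent_actions` is hypothetical.

```python
from typing import List


def unseen_opponent_actions(
    own_period_ms: int, opponent_period_ms: int, cycles: int
) -> List[int]:
    """For each of our loops, count opponent actions we had not observed when we acted.

    Assumes the opponent acts at opponent_period_ms, 2 * opponent_period_ms, ...
    and that we observe at the start of each loop and act at its end.
    """
    counts = []
    for i in range(cycles):
        observed_at = i * own_period_ms
        acted_at = observed_at + own_period_ms
        # Multiples of the opponent's period in the interval (observed_at, acted_at].
        counts.append(acted_at // opponent_period_ms - observed_at // opponent_period_ms)
    return counts


# Opponent is four times faster: we act on a picture that is several moves out of date.
print(unseen_opponent_actions(1000, 250, 3))  # [4, 4, 4]

# Opponent is only slightly faster: we are usually about one move behind.
print(unseen_opponent_actions(1000, 900, 3))  # [1, 1, 1]

# We are slightly faster: sometimes we act on fully fresh information.
print(unseen_opponent_actions(900, 1000, 3))  # [0, 1, 1]
```

In this model, shaving time off your own loop directly shrinks the number of unseen opponent actions, which matches the intuition that in the slow case the gap can be closed.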

Exercise

Think of 5 times when you've "failed" and try to categorize the failures in terms of OODA loops.


Note: I'm pretty sure you can get inside your own OODA loop. I think this concept might be important, but I don't have time to think it through properly.
