Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Acknowledgements:
This research began during the SERI MATS program, under the joint mentorship of John Wentworth, Nicholas Kees, and Janus. Thanks also to Davidad, Jack Sagar, and David Jaz Myers for discussion.

Abstract:
I think that there is a uniform correspondence between flavours of uncertainty and monads taking state-spaces to belief-state-spaces, for different characterisations of belief. In this essay, I describe this correspondence explicitly and list 15 diverse and well-motivated examples. I explore some applications to model-building and agent foundations. Along the way, I characterise infrabayesian uncertainty as the minimal way to encompass possibilistic uncertainty, probabilistic uncertainty, and reward.

No prerequisites are required beyond a high-school familiarity with sets, functions, real numbers, etc. Feedback welcome.

Introduction

Suppose I'm facing the following problem. There's an upcoming election between $n$ candidates, and you're uncertain who will win. How can I model both your belief about the election and the election itself in a coherent way? By "belief" here, I mean your epistemic attitude, your internal model, your opinion, judgement, prediction, etc. Think map-territory distinction: the election is the territory, your belief is the map, and I need to model both the map and the territory coherently despite the fact that the map and the territory are (typically speaking) two completely different types of thing.

Well, to model the election itself, I'll use a set $X$ with an element for each electoral candidate. To represent your belief about the election, I must find another set $M(X)$ with an element for each belief that you might have about the election. I'll call $X$ the state space and $M(X)$ the belief-state space. A solution to our problem is given by a mathematical operator $M$ sending each state-space $X$ to the matching belief-state space $M(X)$.

One may feel prompted to ask: does any operator $M$ suffice here? Can the belief-state space be anything whatsoever, or must it carry some extra structure, possibly satisfying some additional constraints? Or, stated more philosophically, can any territory serve as a map for any other? I say no. Roughly speaking, the operator $M$ must be a so-called monad, which will be the central object of this essay. But more on that later.

The first thing to note is that the appropriate operator $M$ will depend on how exactly I wish to characterise a "belief" about the election, and there are multiple options here. For example, I might choose to characterise your belief by the set of candidates that you think have a possibility of winning. In this case, $M(X) = \mathcal{P}^+(X)$, denoting the set of non-empty subsets of $X$. Alternatively, I might choose to characterise your belief by the likelihood that you give each candidate. In this case, $M(X) = \Delta(X)$, denoting the set of finite-support probability distributions over $X$, i.e. functions $\pi : X \to [0,1]$ such that $\{x \in X : \pi(x) > 0\}$ is finite and $\sum_{x \in X} \pi(x) = 1$.

In the first option, I'm characterising your belief-state by your possibilistic uncertainty, often encountered in doxastic or epistemic logic. In the second option, I'm characterising your belief-state by your probabilistic uncertainty, which is a finer-grained characterisation of belief because it differentiates between e.g. thinking a coin is fair and thinking a coin is slightly biased.

The second option has its merits. Indeed, many readers will instinctively reach for $\Delta$ as soon as they hear the word "uncertainty", and this instinct would serve them well. There's been a fruitful enterprise (in philosophy, mathematics, computer science, linguistics, etc) of replacing possibilistic uncertainty with probabilistic uncertainty in any model or concept where one finds it. But I want to note that both $\mathcal{P}^+$ and $\Delta$ would count as a solution to the problem. I'll return to these two examples throughout this essay because they are the flavours of uncertainty which will be most familiar to the reader.

| Flavour of uncertainty | Monad |
|---|---|
| Possibilistic | Nonempty-powerset monad $\mathcal{P}^+$ |
| Probabilistic | Distribution monad $\Delta$ |

As we will see, these two operators, $\mathcal{P}^+$ and $\Delta$, are both monads. The central claim of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. By "flavour of uncertainty" I mean a particular way of characterising someone's potentially uncertain belief about something. Possibilistic and probabilistic are paradigm cases, but in this essay we'll meet fifteen examples.

The forward-implication of this claim, that every flavour of uncertainty is a monad, is perhaps uncontroversial in some circles.[1] The backwards-implication, that every monad is a flavour of uncertainty, is worthy of more scepticism.

In this essay —

  • I will describe the correspondence explicitly.
  • I'll present a step-by-step method for formalising different flavours of uncertainty using monads.
  • I'll list fifteen examples of the correspondence, which I hope the reader finds well-motivated.
  • Finally, I'll discuss the relevance to agent foundations, with reference to infrabayesianism in particular.

Don't worry if you don't yet know what monads are. By the end of this essay you'll understand them as well as I do, which is enough to nod along when you hear "monad this" and "monad that".

 

The correspondence explicitly.

What's a flavour of uncertainty?

Recall from the introduction that I'm tasked with representing or modelling both the election itself and your belief about the election. The first step of this task is to settle on a particular flavour of uncertainty to characterise the belief-states — possibilistic, probabilistic, infrabayesian, etc. One might ask, of this flavour of uncertainty, the following four questions —

  1. Count?
    What counts as a distinct belief about the election? Concretely, if there are $n$ electoral candidates then how many distinct belief-states are there?
  2. Certainty?
    If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief-state?
  3. Collapse?
    Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?
  4. Combine?
    Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?

These four questions — Count? Certainty? Collapse? Combine? — are essentially epistemological questions, and they collectively pin down what I mean by a flavour of uncertainty.[2] As we will see, a monad corresponds to answers to the first three questions and a commutative monad corresponds to answers to all four questions.

Exercise 1: How would you answer these questions for possibilistic uncertainty? Or for probabilistic uncertainty?

Exercise 2: As I mentioned before, an answer to Count? is a set $M(X)$ for each set $X$. What about for Certainty? Collapse? and Combine?

 

What's a (commutative) monad?

Monads were born of category theory — a field of mathematics which many regard as arcane, mystical, or downright kabbalistic — but monads can (I think) be understood by someone lacking any acquaintance with category theory whatsoever. Indeed, my claim in this essay is that monads correspond exactly to Map-Territory-like relations, and such relations will be familiar to anyone who's both got a brain and pondered this predicament.

I'll first write down the mathematical definition of a monad, and then I'll explain how this definition mirrors the four epistemological questions.

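Definition: A monad $(M, \mathrm{return}, \mathrm{bind})$ consists of three operators[3]:

  • The construct operator $M$ which assigns a set $M(X)$ to each set $X$.
  • The return operator $\mathrm{return}$ which assigns a function $\mathrm{return}_X : X \to M(X)$ to each set $X$.
  • The bind operator $\mathrm{bind}$ which assigns a function $\mathrm{bind}_{X,Y} : (X \to M(Y)) \to (M(X) \to M(Y))$ to each pair of sets $X, Y$.

Moreover, a commutative monad $(M, \mathrm{return}, \mathrm{bind}, \mathrm{product})$ is a monad $(M, \mathrm{return}, \mathrm{bind})$ equipped with a fourth operator:

  • The product operator $\mathrm{product}$ which assigns a function $\mathrm{product}_{X,Y} : M(X) \times M(Y) \to M(X \times Y)$ to each pair of sets $X, Y$.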

These operators must also satisfy some basic algebraic laws to qualify as a (commutative) monad. See here for details.
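For concreteness, the standard monad laws are left identity, right identity, and associativity of bind. Here is a minimal Python sketch that spot-checks them for the possibilistic (nonempty powerset) monad. The names `unit` and `bind` and the test functions `f` and `g` are my own illustrative choices, and a handful of assertions is a sanity check rather than a proof.

```python
def unit(x):
    """Return operator: certainty in x is the singleton {x}."""
    return frozenset({x})

def bind(belief, f):
    """Bind operator: union of f(x) over every x considered possible."""
    return frozenset().union(*(f(x) for x in belief))

# Hypothetical test data, chosen only for illustration.
f = lambda n: frozenset({n, n + 1})
g = lambda n: frozenset({2 * n})
m = frozenset({0, 10})

assert bind(unit(3), f) == f(3)                                 # left identity
assert bind(m, unit) == m                                       # right identity
assert bind(bind(m, f), g) == bind(m, lambda x: bind(f(x), g))  # associativity
```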

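Notation: I'll use variables $x, y$ for elements of $X, Y$, and boldface variables $\mathbf{x}, \mathbf{y}$ for elements of $M(X), M(Y)$. I may talk loosely of the monad $M$ rather than $(M, \mathrm{return}, \mathrm{bind})$ or of the commutative monad $M$ rather than $(M, \mathrm{return}, \mathrm{bind}, \mathrm{product})$. I may write $\mathrm{return}_X$, or $\mathrm{bind}_{X,Y}$ for clarification. I may write $\mathbf{x} \gg= f$ instead of $\mathrm{bind}(f)(\mathbf{x})$, and $\mathbf{x} \otimes \mathbf{y}$ instead of $\mathrm{product}(\mathbf{x}, \mathbf{y})$.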

 

How do they correspond to each other?

In short, there is an exact correspondence between the operators of a (commutative) monad and the four epistemological questions. Let's go one-by-one.

1. Count?
What counts as a distinct belief about the election? Concretely, if there are $n$ electoral candidates then how many distinct belief-states are there?

An answer to this question is the construct operator, assigning a set $M(X)$ to each set $X$. If $X$ is the set of potential outcomes of an event then $M(X)$ is the set of beliefs about the event.

As we discussed before, for possibilistic uncertainty $M(X) = \mathcal{P}^+(X)$, and for probabilistic uncertainty $M(X) = \Delta(X)$.
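To make Count? concrete for the possibilistic case, here is a small sketch that enumerates every nonempty subset of a finite state-space. With $n$ candidates there are $2^n - 1$ such belief-states. (For probabilistic uncertainty there are infinitely many belief-states as soon as $n \geq 2$, so nothing comparable can be enumerated.) The candidate names are hypothetical.

```python
from itertools import combinations

def nonempty_subsets(states):
    """All possibilistic belief-states over a finite state-space."""
    states = list(states)
    return [frozenset(c)
            for r in range(1, len(states) + 1)
            for c in combinations(states, r)]

candidates = ["alice", "bob", "carol"]  # hypothetical election
beliefs = nonempty_subsets(candidates)
print(len(beliefs))  # 7, i.e. 2**3 - 1
```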

2. Certainty?
If you're certain that a particular candidate will win the election (and I know which candidate) then how should I determine your belief?

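Here, an answer will be the return operator assigning a function $\mathrm{return}_X : X \to M(X)$ to each set $X$. If you're certain that a state $x \in X$ will occur, then $\mathrm{return}_X(x) \in M(X)$ is your belief-state.

For possibilistic uncertainty, $\mathrm{return}(x) = \{x\}$, the singleton set containing $x$. And for probabilistic uncertainty, $\mathrm{return}(x) = \delta_x$, the dirac distribution at $x$ given by $\delta_x(x') = 1$ if $x' = x$ and $\delta_x(x') = 0$ otherwise.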
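As a minimal sketch of Certainty?, the two return operators can be written as follows. Representing a finite-support distribution as a dictionary from outcomes to `Fraction` weights is my own encoding choice, not something fixed by the essay.

```python
from fractions import Fraction

def return_possibilistic(x):
    """Certainty in x: the singleton set containing x."""
    return frozenset({x})

def return_probabilistic(x):
    """Certainty in x: the dirac distribution, stored as {outcome: weight}."""
    return {x: Fraction(1)}

print(return_possibilistic("alice"))
print(return_probabilistic("alice"))
```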

The function $\mathrm{return}_X : X \to M(X)$ describes how the state-space embeds in the belief-state-space. This is related, I think, to the idea that each territory can serve as its own map. (See Borges' On Exactitude in Science for an exploration of this theme.) Or in the words of Norbert Wiener, “The best model of a cat is another, or preferably the same, cat.”

3. Collapse?
Suppose a number of forecasters are speculating on the election. If I'm given the belief of each forecaster about the election, and I'm given your belief about the forecasters' beliefs, then how should I determine your belief about the election itself?

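Here, an answer will be the bind operator assigning a function $\mathrm{bind}_{Y,X} : (Y \to M(X)) \to (M(Y) \to M(X))$ to each pair of sets $X$ and $Y$. You should think of the bind operator as collapsing your second-order beliefs to your first-order beliefs, i.e. if each forecaster $y \in Y$ has a first-order belief $f(y) \in M(X)$, and $\mathbf{y} \in M(Y)$ is your second-order belief about which forecaster is correct, then $\mathbf{y} \gg= f \in M(X)$ should be your first-order belief about the election.

For possibilistic uncertainty, $\mathbf{y} \gg= f$ is the union $\bigcup_{y \in \mathbf{y}} f(y)$. And for probabilistic uncertainty, $\mathbf{y} \gg= f$ is the summation/integral $x \mapsto \sum_{y \in Y} \mathbf{y}(y) \cdot f(y)(x)$.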
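The forecaster story can be sketched directly. Below, a possibilistic belief is a `frozenset` and a probabilistic belief is a dictionary of `Fraction` weights. The forecasters "ann" and "ben" and their forecasts are hypothetical data invented for illustration.

```python
from fractions import Fraction

def bind_possibilistic(belief, f):
    """Union of f(y) over every forecaster y you consider possibly correct."""
    return frozenset().union(*(f(y) for y in belief))

def bind_probabilistic(belief, f):
    """Mixture of the distributions f(y), weighted by your credence belief[y]."""
    out = {}
    for y, p in belief.items():
        for x, q in f(y).items():
            out[x] = out.get(x, Fraction(0)) + p * q
    return out

# Hypothetical first-order beliefs of two forecasters.
forecast_sets = {"ann": frozenset({"alice"}), "ben": frozenset({"alice", "bob"})}
forecast_dists = {
    "ann": {"alice": Fraction(1)},
    "ben": {"alice": Fraction(1, 2), "bob": Fraction(1, 2)},
}

# Your second-order beliefs about which forecaster is correct.
print(sorted(bind_possibilistic(frozenset({"ann", "ben"}), forecast_sets.get)))
print(bind_probabilistic({"ann": Fraction(1, 4), "ben": Fraction(3, 4)},
                         forecast_dists.get))
```

In the probabilistic case the collapsed credence in "alice" is $\tfrac14 \cdot 1 + \tfrac34 \cdot \tfrac12 = \tfrac58$.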

This is related to the idea that a map of a map of a territory is a map of that same territory; a depiction of a depiction of a person is a depiction of that same person; a representation of a representation of an idea is a representation of that same idea; etc.

One might think of $f : Y \to M(X)$ as some parameterisation of the belief-states in $M(X)$ using some parameters $Y$. Then the bind operator gives us the function for finding your $X$-belief from your $Y$-belief. Explicitly, this function is $\mathrm{bind}(f) : M(Y) \to M(X)$.

Moreover, the bind operator doesn't just flatten one level of "meta". Often we have an entire hierarchy of state-spaces $X_0, X_1, X_2, \dots$ where beliefs about $X_i$ are parameterised by some "higher" state-space $X_{i+1}$ via a function $f_i : X_{i+1} \to M(X_i)$. Here, the state-space $X_0$ is the object-level system, the state-space $X_1$ parameterises your first-order beliefs about $X_0$, the state-space $X_2$ parameterises your second-order beliefs about $X_0$, and so on. Then the bind operator says that I can collapse your $n$th-order beliefs all the way to your first-order beliefs via the function $\mathrm{bind}(f_0) \circ \dots \circ \mathrm{bind}(f_{n-2}) : M(X_{n-1}) \to M(X_0)$.[4]

4. Combine?
Suppose there are two completely unrelated elections happening somewhere. If I'm given your belief about the first election, and your belief about the second election, then how should I determine your belief about the pair of elections?

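An answer will be the product operator $\mathrm{product}$ assigning a function $\mathrm{product}_{X,Y} : M(X) \times M(Y) \to M(X \times Y)$ to each pair of sets $X$ and $Y$. If $\mathbf{x} \in M(X)$ is your belief about the first election and $\mathbf{y} \in M(Y)$ is your belief about an unrelated second election, then $\mathbf{x} \otimes \mathbf{y} \in M(X \times Y)$ is your belief about the pair of elections.

For possibilistic uncertainty, $\mathbf{x} \otimes \mathbf{y}$ is the cartesian product $\{(x, y) : x \in \mathbf{x},\ y \in \mathbf{y}\}$. And for probabilistic uncertainty, $\mathbf{x} \otimes \mathbf{y}$ is the joint distribution $(x, y) \mapsto \mathbf{x}(x) \cdot \mathbf{y}(y)$.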
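A short sketch of Combine? for the two paradigm flavours, again encoding possibilistic beliefs as frozensets and probabilistic beliefs as dictionaries of `Fraction` weights (my own encoding). The two "elections" here are hypothetical coin-like contests.

```python
from fractions import Fraction
from itertools import product

def product_possibilistic(bx, by):
    """Every pair of individually possible outcomes is jointly possible."""
    return frozenset(product(bx, by))

def product_probabilistic(bx, by):
    """Independent joint distribution: multiply the marginal credences."""
    return {(x, y): p * q for x, p in bx.items() for y, q in by.items()}

first = {"alice": Fraction(1, 2), "bob": Fraction(1, 2)}
second = {"carol": Fraction(1, 3), "dave": Fraction(2, 3)}
joint = product_probabilistic(first, second)
print(joint[("alice", "dave")])  # 1/3
```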

Thinking of $X = X_1 \times \dots \times X_n$ as a factorisation of the state-space $X$, the product operator implies that your beliefs about each $X_i$ combine to yield your overall belief about $X$. That is, a commutative monad $M$ corresponds to a flavour of uncertainty that you can have about parts of the world, whereas a non-commutative monad $M$ corresponds to a flavour of uncertainty that you can only have about the world in its entirety.

Historical note: The central thesis of this essay is that there is a uniform correspondence between flavours of uncertainty and monads. I call this Myers' correspondence after David Jaz Myers, because I first encountered the idea in his book Categorical Systems Theory, where he devotes a chapter to using commutative monads to model various kinds of nondeterminism in automata. Nonetheless, the idea did not originate with him, he's never claimed it is true, and I don't know if he agrees with it.

 

Examples of Myers' correspondence

The correspondence between the operators of the (commutative) monad and the epistemological questions also serves as a practical recipe for formalising different flavours of uncertainty using monads. I've personally found it useful. First, think about the particular flavour of uncertainty, then answer the Four C's (Count? Certainty? Collapse? Combine?), convert those answers into mathematical operators, and voilà, you've got yourself a monad.

I'll now zoom through fifteen examples, beginning (without commentary) with the paradigm examples of $\mathcal{P}^+$ and $\Delta$.

1 - nonempty powerset monad

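| Flavour of uncertainty | Possibilistic |
|---|---|
| Monad | Nonempty powerset |
| Construct | $\mathcal{P}^+(X) = \{\mathbf{x} \subseteq X : \mathbf{x} \neq \emptyset\}$ |
| Return | $\mathrm{return}(x) = \{x\}$ |
| Bind | $\mathbf{x} \gg= f = \bigcup_{x \in \mathbf{x}} f(x)$ |
| Product | $\mathbf{x} \otimes \mathbf{y} = \{(x, y) : x \in \mathbf{x},\ y \in \mathbf{y}\}$ |
| Interpretation | $x \in \mathbf{x}$ if you consider the outcome $x$ to be possible. |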

2 - distribution monad

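| Flavour of uncertainty | Probabilistic |
|---|---|
| Monad | Distribution |
| Construct | $\Delta(X) = \{\mathbf{x} : X \to [0,1] \text{ with finite support and } \sum_{x \in X} \mathbf{x}(x) = 1\}$ |
| Return | $\mathrm{return}(x) = \delta_x$ |
| Bind | $(\mathbf{x} \gg= f)(y) = \sum_{x \in X} \mathbf{x}(x) \cdot f(x)(y)$ |
| Product | $(\mathbf{x} \otimes \mathbf{y})(x, y) = \mathbf{x}(x) \cdot \mathbf{y}(y)$ |
| Interpretation | $\mathbf{x}(x)$ is your subjective credence in the outcome $x$. |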

3 - reader monad from $A$

Okay, now let's deal with a flavour of uncertainty which is sometimes called "indeterminacy". An indeterminate belief is something like "Well, if $a_1$ is true then $x_1$, but if $a_2$ is true then $x_2$, but...", i.e. it's a belief which is uncertain because your best guess depends on some unknown variable. More formally, your belief-state is given by a particular function from $A$ (the possible values of the unknown variable) to $X$ (the state-space).

This is an ordinary usage of the word "uncertain" so, by Myers' correspondence, it must correspond to a monad, and we can discover which monad by answering the four Cs. If $X$ is the state-space then the belief-state-space is given by $X^A$, the set of functions $A \to X$. So our construct operator is $X \mapsto X^A$. If you're certain that the outcome is $x$ then your belief-state is the constant function $a \mapsto x$. The intuitive answers to Collapse? and Combine? give us our bind and product operators.

Overall, we get what's called the reader monad from $A$.

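| Flavour of uncertainty | $A$-indeterminacy |
|---|---|
| Monad | Reader monad from $A$ |
| Construct | $M(X) = X^A$, the set of functions $A \to X$ |
| Return | $\mathrm{return}(x) = (a \mapsto x)$ |
| Bind | $(\mathbf{x} \gg= f)(a) = f(\mathbf{x}(a))(a)$ |
| Product | $(\mathbf{x} \otimes \mathbf{y})(a) = (\mathbf{x}(a), \mathbf{y}(a))$ |
| Interpretation | $\mathbf{x}(a) = x$ if $x$ is your best guess about the outcome conditioned on the information $a$. |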
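Here is a minimal sketch of the reader monad, with belief-states encoded as Python functions of the unknown variable. The unknown variable (election-day weather) and the dependence of the winner on turnout are hypothetical, chosen only to give the operators something to act on.

```python
def reader_return(x):
    """Certainty in x: the constant function, ignoring the unknown variable."""
    return lambda a: x

def reader_bind(belief, f):
    """Feed the same unknown value a to both the outer and inner belief."""
    return lambda a: f(belief(a))(a)

def reader_product(bx, by):
    """Pair up the two best guesses for each value of the unknown variable."""
    return lambda a: (bx(a), by(a))

# Hypothetical unknown variable: the weather on election day.
turnout = lambda weather: "high" if weather == "sunny" else "low"
winner_given = lambda t: (lambda weather: "alice" if t == "high" else "bob")

winner = reader_bind(turnout, winner_given)
print(winner("sunny"), winner("rainy"))  # alice bob
```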

4 - writer monad to $W$

Often, people will report their uncertain beliefs like "The coin will land heads (98%)" or "AI will disempower humanity (60%)". That is, their belief is a best guess paired with their confidence, which they offer as a lower-bound on the likelihood that their guess is correct. A certain belief-state would be something like "The coin will land heads (100%)".

What monad corresponds to this flavour of uncertainty?

If $X$ is the state-space then $X \times [0,1]$ is the belief-state-space, i.e. there's a distinct belief-state for each pair $(x, p) \in X \times [0,1]$. If you're certain that the outcome is $x$ then your belief-state is $(x, 1)$. Uncertainty is collapsed by multiplying the confidences. Uncertainty is combined also by multiplying the confidences.

Ta-da! The writer to $[0,1]$ monad.

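| Flavour of uncertainty | Confidence-marked guess |
|---|---|
| Monad | Writer to $[0,1]$ monad |
| Construct | $M(X) = X \times [0,1]$ |
| Return | $\mathrm{return}(x) = (x, 1)$ |
| Bind | $\mathbf{x} \gg= f = (y, p \cdot q)$ where $\mathbf{x} = (x, p)$ and $f(x) = (y, q)$. |
| Product | $\mathbf{x} \otimes \mathbf{y} = ((x, y), p \cdot q)$ where $\mathbf{x} = (x, p)$ and $\mathbf{y} = (y, q)$. |
| Interpretation | $\mathbf{x} = (x, p)$ if $p$ is your confidence in the outcome, i.e. you think that the likelihood of $x$ is at least $p$. |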

Using the writer to $[0,1]$ monad, we've characterised a belief-state as an outcome marked with some additional metadata, namely a confidence $p \in [0,1]$. What properties of the interval $[0,1]$ did we appeal to in this definition? Well, firstly that we can multiply different elements (see bind and product operators). And secondly, that there's a fixed element such that multiplying with this element does nothing (see return operator).

Hence we can generalise: given any monoid $(W, \cdot, 1)$ we have a monad $X \mapsto X \times W$ called the writer-to-$W$ monad.[5] By using different monoids, we can model different flavours of uncertainty, but note that this is only a commutative monad when $W$ is a commutative monoid.
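The generalisation can be sketched as a small factory that takes a monoid (an associative operation plus its identity element) and returns the writer monad's operators. The factory name `make_writer` and the example data are my own; the second instance uses sets under union as the monoid, purely to show that a different monoid plugs into the same code.

```python
from fractions import Fraction

def make_writer(op, identity):
    """Writer monad for the monoid (op, identity); belief-states are (guess, tag) pairs."""
    def ret(x):
        return (x, identity)
    def bind(belief, f):
        x, w = belief
        y, v = f(x)
        return (y, op(w, v))
    def prod(bx, by):
        (x, w), (y, v) = bx, by
        return ((x, y), op(w, v))
    return ret, bind, prod

# Confidences in [0, 1] under multiplication.
conf_ret, conf_bind, conf_prod = make_writer(lambda p, q: p * q, Fraction(1))
print(conf_bind(("heads", Fraction(9, 10)), lambda c: ("i win", Fraction(1, 2))))

# Sets under union, with the empty set as identity.
set_ret, set_bind, set_prod = make_writer(frozenset.union, frozenset())
```

Note that `prod` is only well-behaved as a commutative-monad product when `op` is commutative, which matches the caveat in the text.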

There's another ordinary usage of the word "uncertainty" where an uncertain belief would be something like "AGI arrives before 2040 unless there's a nuclear war" and a certain belief would be something like "AGI will arrive before 2040", at least with regard to the binary question of whether AGI arrives before 2040. That is, an uncertain belief is one with an "unless..." clause.

Formalising this, we have a fixed set of events $\mathcal{E}$, and a belief-state is a pair $(x, E) \in X \times \mathcal{E}$. Your belief-state is $(x, E)$ when you commit to the state $x$ occurring unless the event $E$ occurs. This flavour of uncertainty corresponds to the writer monad $X \mapsto X \times \mathcal{E}$, where $\mathcal{E}$ is a monoid when equipped with union $\cup$ and the empty set $\emptyset$.

One might use this flavour of uncertainty to model various kinds of defeasible reasoning, where a belief-state $(x, E)$ is characterised by the precondition $E$ under which the belief would be defeated or disavowed.

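| Flavour of uncertainty | Unless-claused guess |
|---|---|
| Monad | Writer monad to $\mathcal{E}$ |
| Construct | $M(X) = X \times \mathcal{E}$ |
| Return | $\mathrm{return}(x) = (x, \emptyset)$ |
| Bind | $\mathbf{x} \gg= f = (y, E \cup F)$ where $\mathbf{x} = (x, E)$ and $f(x) = (y, F)$. |
| Product | $\mathbf{x} \otimes \mathbf{y} = ((x, y), E \cup F)$ where $\mathbf{x} = (x, E)$ and $\mathbf{y} = (y, F)$. |
| Interpretation | $\mathbf{x} = (x, E)$ if you think $x$ will occur unless event $E$ occurs. |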

Or maybe an uncertain belief is one full of amendments, clarifications, conditions, disclaimers, excuses, hedges, limitations, qualifications, refinements, reservations, restrictions, stipulations, temperings, etc. By contrast, a certain belief is made "with no ifs or buts", bare and direct.

Formalising this, we have a fixed set of clarifications $C$, and a belief-state is a pair $(x, c) \in X \times C^*$. Here, $C^*$ is the free monoid over the set of clarifications $C$ equipped with concatenation $\cdot$ and the empty list $[\,]$.

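| Flavour of uncertainty | Clarified guess |
|---|---|
| Monad | Writer to $C^*$ monad |
| Construct | $M(X) = X \times C^*$ |
| Return | $\mathrm{return}(x) = (x, [\,])$ |
| Bind | $\mathbf{x} \gg= f = (y, c \cdot d)$ where $\mathbf{x} = (x, c)$ and $f(x) = (y, d)$. |
| Product | N/A (See below.) |
| Interpretation | $\mathbf{x} = (x, c)$ if you think $x$ will occur and $c$ is a list of your clarifications. |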

Now, the writer to $C^*$ monad isn't a commutative monad. Or interpreted philosophically, a clarified guess isn't the kind of uncertainty you can have about parts of the world. Suppose "I think Alice is happy but I don't know her very well" is my belief-state about Alice, and "I think Bob is happy but he's difficult to read" is my belief-state about Bob. What's my belief-state about both Alice and Bob? Is it (1) "Alice and Bob are both happy, but I don't know Alice very well and Bob is difficult to read" or (2) "Alice and Bob are both happy, but Bob is difficult to read and I don't know Alice very well"? That is, in which order should we combine the clarifications?

The instinctive trick is to declare that two belief-states are equal if the lists of clarifications are equal up-to-permutation. This implies that (1) and (2) are the same belief-state, which does seem intuitive to me. If we play this trick, then the resulting flavour of uncertainty is captured by the writer-to-$\mathcal{M}(C)$ monad, where $\mathcal{M}(C)$ is the free commutative monoid over $C$. This does indeed give a commutative monad!

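| Flavour of uncertainty | Unordered clarified guess |
|---|---|
| Monad | Writer monad to $\mathcal{M}(C)$ |
| Construct | $M(X) = X \times \mathcal{M}(C)$ |
| Return | $\mathrm{return}(x) = (x, 0)$, where $0$ is the empty collection of clarifications |
| Bind | $\mathbf{x} \gg= f = (y, c + d)$ where $\mathbf{x} = (x, c)$ and $f(x) = (y, d)$. |
| Product | $\mathbf{x} \otimes \mathbf{y} = ((x, y), c + d)$ where $\mathbf{x} = (x, c)$ and $\mathbf{y} = (y, d)$. |
| Interpretation | $\mathbf{x} = (x, c)$ if you think $x$ will occur and $c$ is an unordered list of your clarifications. |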

5 — identity monad

If we're anticipating an election between $n$ candidates, then the simplest way to characterise your belief about the election is by your best guess, with no additional information about how unsure you are. If $X$ is the state-space then $X$ is also the belief-state-space, i.e. there's a distinct belief-state for each $x \in X$. The set of belief-states is therefore equal (up to bijection) to the set of outcomes itself.

I'll admit that this flavour of uncertainty is somewhat degenerate — e.g. every belief-state is a certainty in some particular state — but it's worth including nonetheless. On some readings of Wittgenstein's Tractatus, this is his model of how language represents the world, our utterances stand in direct isomorphism with the state-of-affairs.

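Anyway, answering the four Cs would give the identity monad.

| Flavour of uncertainty | Best guess |
|---|---|
| Monad | Identity monad |
| Construct | $M(X) = X$ |
| Return | $\mathrm{return}(x) = x$ |
| Bind | $x \gg= f = f(x)$ |
| Product | $x \otimes y = (x, y)$ |
| Interpretation | $\mathbf{x} = x$ if $x$ is your best guess about the outcome. |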

6 — maybe monad

The last example was a bit silly, so how about this instead..?

If we're anticipating an election between $n$ candidates, then I'll characterise your belief about the election either by your best guess (with no additional information) or an "I don't know" response. This is a very coarse-grained flavour of uncertainty: the only belief-state about the election (other than certainty in a particular candidate) is the belief-state of utter cluelessness, or shrugging one's shoulders!

Despite the coarse-grained-ness, it's pretty commonly encountered in the wild. For example, it's the typical flavour of uncertainty encountered in surveys/questionnaires, where $\bot$ is read as "no opinion/don't know". It's also encountered in voting, where $\bot$ is read as "abstention".

Formally speaking, if $X$ is the state-space then there's a distinct belief-state for each state $x \in X$ plus an additional option denoted $\bot$. The belief-state-space is therefore $X + \{\bot\}$, denoting the disjoint union of $X$ with the singleton set $\{\bot\}$. If you're certain that the outcome is $x$ then your belief-state is $x$. This flavour of uncertainty corresponds to the famous maybe monad.

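| Flavour of uncertainty | Guess-or-shrug |
|---|---|
| Monad | Maybe monad |
| Construct | $M(X) = X + \{\bot\}$ |
| Return | $\mathrm{return}(x) = x$ |
| Bind | $\mathbf{x} \gg= f = f(x)$ if $\mathbf{x} = x \in X$, and $\mathbf{x} \gg= f = \bot$ if $\mathbf{x} = \bot$ |
| Product | $\mathbf{x} \otimes \mathbf{y} = (x, y)$ if $\mathbf{x} = x \in X$ and $\mathbf{y} = y \in Y$, and $\bot$ otherwise |
| Interpretation | $\mathbf{x} = x$ if $x$ is your best guess for the outcome, and $\mathbf{x} = \bot$ if you offer no best guess. |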
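A minimal sketch of the maybe monad, using Python's `None` to play the role of the shrug. This encoding assumes that `None` is never itself one of the outcomes in the state-space; the forecaster data is hypothetical.

```python
def maybe_return(x):
    """Certainty in x is just the guess x itself."""
    return x

def maybe_bind(belief, f):
    """A shrug stays a shrug; otherwise defer to the chosen forecaster's belief."""
    return None if belief is None else f(belief)

def maybe_product(bx, by):
    """You only have a guess about the pair if you have a guess about both parts."""
    return None if bx is None or by is None else (bx, by)

# Hypothetical forecasters: ann guesses alice, ben has no opinion (None).
forecasts = {"ann": "alice", "ben": None}
print(maybe_bind("ann", forecasts.get))  # alice
print(maybe_bind("ben", forecasts.get))  # None
print(maybe_bind(None, forecasts.get))   # None
```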

7 - $\mathcal{R}$-distribution monad

You might, at this point, feel short-changed. I've discussed so far a range of flavours of uncertainty which are all coarser-grained than probabilistic knowledge, so why not stick to $\Delta$? Let's consider then a more fine-grained characterisation of belief-state, one that tracks infinitesimal differences between probability assignments.

The Levi-Civita Field $\mathcal{R}$ is an extension of the real numbers which contains infinitesimal values like $\varepsilon$ and infinite values like $1/\varepsilon$. We can replace the real-valued probabilities in the definition of $\Delta$ with values in $\mathcal{R}$ to obtain a monad $\Delta_{\mathcal{R}}$ corresponding to this flavour of uncertainty. On this account, a belief-state $\mathbf{x} \in \Delta_{\mathcal{R}}(X)$ is something which tracks the potentially infinitesimal likelihood $\mathbf{x}(x)$ of each outcome $x \in X$. This flavour of uncertainty has applications in infinite ethics and cooperation in large worlds.

For example, in a universe with infinite radius , what's your prior likelihood that you occupy the most central galaxy? Presumably, the likelihood should be , where  is the density of galaxies.

Now suppose you were offered a lottery which promises to benefit everyone by  if you indeed occupy the most central galaxy but otherwise benefits no one. What's this lottery worth? Presumably, it's worth , because the infinitary stakes  are cancelled out by the infinitesimal chance of winning .

Note that because the Levi-Civita field is totally-ordered, once we assign Levi-Civita values to different lotteries, we can perform expected utility maximisation as usual, and get sensible results. I think that infinitesimal probabilities resolve some (but not all) problems in infinite ethics. I'm particularly lured by the hope that, in an infinite cosmos, the infinitary stakes might somehow cancel out with infinitesimal probabilities to yield finite values. See Joe Carlsmith's essay On Infinite Ethics for further discussion.

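| Flavour of uncertainty | Infinitesimal probabilistic |
|---|---|
| Monad | $\mathcal{R}$-distribution monad |
| Construct | $\Delta_{\mathcal{R}}(X) = \{\mathbf{x} : X \to \mathcal{R} \text{ with finite support, non-negative values, and } \sum_{x \in X} \mathbf{x}(x) = 1\}$ |
| Return | $\mathrm{return}(x) = \delta_x$ |
| Bind | $(\mathbf{x} \gg= f)(y) = \sum_{x \in X} \mathbf{x}(x) \cdot f(x)(y)$ |
| Product | $(\mathbf{x} \otimes \mathbf{y})(x, y) = \mathbf{x}(x) \cdot \mathbf{y}(y)$ |
| Interpretation | $\mathbf{x}(x)$ is your potentially infinitesimal subjective credence in the outcome $x$. |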

How far can one generalise the kind of entity that a "probability" must be, before our definition breaks? Well, so long as we have some rig $R$, we can define a monad $\Delta_R$ by replacing the real-valued probabilities with values in $R$. A rig is a set $R$ equipped with a zero element $0 \in R$, a unit element $1 \in R$, an addition function $+ : R \times R \to R$, and a multiplication function $\cdot : R \times R \to R$, satisfying certain algebraic laws. By choosing different rigs $R$ we obtain different monads $\Delta_R$ corresponding to different flavours of uncertainty.

When $R = \mathbb{R}_{\geq 0}$ we obtain the ordinary probability distributions, and when $R = \mathbb{Q}_{\geq 0}$ we obtain the rational probability distributions, etc. Toby Fritz suggests that by using similar tricks we might obtain quantum uncertainty, fuzzy uncertainty, and Dempster–Shafer uncertainty, but I haven't checked whether this is true.

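| Flavour of uncertainty | $R$-probabilistic |
|---|---|
| Monad | $R$-distribution monad |
| Construct | $\Delta_R(X) = \{\mathbf{x} : X \to R \text{ with finite support and } \sum_{x \in X} \mathbf{x}(x) = 1\}$ |
| Return | $\mathrm{return}(x) = \delta_x$ |
| Bind | $(\mathbf{x} \gg= f)(y) = \sum_{x \in X} \mathbf{x}(x) \cdot f(x)(y)$ |
| Product | $(\mathbf{x} \otimes \mathbf{y})(x, y) = \mathbf{x}(x) \cdot \mathbf{y}(y)$ |
| Interpretation | $\mathbf{x}(x)$ is your subjective credence in the outcome $x$, where $R$ is whatever rig of exotic probabilities. |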
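The rig-generic construction can be sketched by passing the rig's zero, unit, addition and multiplication in as parameters. The factory name `make_distribution_monad` is hypothetical. The second instance, the Boolean rig with "or" as addition and "and" as multiplication, is my own choice of example rather than one from the essay: its "distributions" just record which outcomes have weight `True`.

```python
import operator
from fractions import Fraction

def make_distribution_monad(zero, one, add, mul):
    """Return and bind for finite-support distributions valued in a rig."""
    def ret(x):
        return {x: one}
    def bind(belief, f):
        out = {}
        for x, p in belief.items():
            for y, q in f(x).items():
                out[y] = add(out.get(y, zero), mul(p, q))
        return out
    return ret, bind

# Rational probabilities.
q_ret, q_bind = make_distribution_monad(Fraction(0), Fraction(1),
                                        operator.add, operator.mul)
# Boolean rig (illustrative assumption): or as addition, and as multiplication.
b_ret, b_bind = make_distribution_monad(False, True, operator.or_, operator.and_)

coin = {"h": Fraction(1, 2), "t": Fraction(1, 2)}
payoff = lambda c: ({"win": Fraction(1, 3), "lose": Fraction(2, 3)}
                    if c == "h" else {"lose": Fraction(1)})
print(q_bind(coin, payoff))  # win has weight 1/6, lose has weight 5/6
```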

8 — quantum monad

For sure, quantum mechanics is endowed with its own flavour of uncertainty, hence the term Heisenberg's Uncertainty Principle. It's not impossible to catch a physicist saying "it's uncertain whether the qubit is 0 or 1" or "it's uncertain whether the cat is alive or dead", regardless of whether they consider quantum uncertainty as strictly speaking epistemic. By Myers' correspondence, this flavour of uncertainty must correspond to a monad.

Exercise 3: Which?[6]

9 — smooth state monad

The position of the North Star in the night sky is constant, static, immutable, certain; the position of Mercury, by contrast, is variable, dynamic, mutable, uncertain. Is this not a common sense of the word? Might one not say that my belief-state about Mercury's position will forever be uncertain, no matter how accurate my telescope or exhaustive my calculations, because my belief is always revised? If so, then by Myers' correspondence this flavour of uncertainty corresponds to a monad.

To formalise this, let's fix a differentiable manifold $S$ parameterising your internal mental state as you think about a question. Note that because $S$ is a differentiable manifold, it's equipped with a tangent space $T_s S$ at every $s \in S$.

If $X$ is the state-space, then $M(X)$, the set of smooth functions $S \to X \times TS$, is your belief-state-space. In other words, we have a distinct belief-state for each smooth transition function $\mathbf{x} : S \to X \times TS$. A belief-state $\mathbf{x}$ is characterised by a pair $\mathbf{x}(s) = (x_s, v_s)$ for each $s \in S$, where $x_s \in X$ is your current guess and $v_s \in T_s S$ is the tangent vector describing how your mental state is evolving. If you're certain that the winner is $x$ then your belief-state is the static transition function $s \mapsto (x, 0_s)$ where $0_s \in T_s S$ is the zero vector.

This is the smooth state monad: a differentiable version of the discrete-time state monad, with the additional benefit that it's a commutative monad.

Flavour of uncertaintyevolving guess
Monadsmooth state monad
Construct 
Return 
Bind  where  and 
Product  where  and .
InterpretationThe transition function  describes how your internal mental state evolves over time and produces guesses.

10 — continuation monad

What are belief-states actually for anyway? What purpose do they play in rational decision-making? According to one school of thought, belief-states are simply gadgets for taking expected values, and chiefly for taking expected utility values.

Let's say $X$ is the set of candidates running in the election, and $u : X \to \mathbb{R}$ is your utility function, i.e. $u(x)$ measures how happy you'd be to hear that the candidate $x$ has won. Then your ex-ante utility is some $r \in \mathbb{R}$ measuring how happy you are now in anticipation of the outcome. Given your belief-state, I should be able to determine $r$ from $u$, which implies that I can just characterise your belief-state about the election by how $r$ is determined from $u$. Neat.

This is formalised by the so-called continuation to $\mathbb{R}$ monad. If $X$ is the state-space then $K(X)$ is the belief-state-space, where $K(X)$ is the set of functionals $(X \to \mathbb{R}) \to \mathbb{R}$. And a belief-state $\mathbf{x} \in K(X)$ is certain in the outcome $x$ if $\mathbf{x}$ determines your ex-ante utility simply by evaluating your utility function at $x$, i.e. $\mathbf{x}(u) = u(x)$.

The continuation monad encompasses both possibilistic uncertainty and probabilistic uncertainty. If the nonempty subset  models your possibilistic uncertainty then the associated functional  is given by . If the distribution  models your probabilistic uncertainty then the associated functional  is given by .

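| Flavour of uncertainty | Ex-ante utility |
|---|---|
| Monad | Continuation monad |
| Construct | $K(X) = (X \to \mathbb{R}) \to \mathbb{R}$ |
| Return | $\mathrm{return}(x) = (u \mapsto u(x))$ |
| Bind | $(\mathbf{x} \gg= f)(u) = \mathbf{x}(x \mapsto f(x)(u))$ |
| Product | Unfortunately, $K$ is not a commutative monad.[7] |
| Interpretation | If $u$ assigns your ex-post utility $u(x)$ to each outcome $x$, then $\mathbf{x}(u)$ is your ex-ante utility. |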
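The continuation monad can be sketched with belief-states encoded as Python functions that take a utility function and return a number. The embedding of distributions is the expected value. For possibilistic beliefs I assume a worst-case (minimum) reading; that choice is my assumption for illustration, not something this section confirms. Candidate names and utilities are hypothetical.

```python
from fractions import Fraction

def cont_return(x):
    """Certainty in x: ex-ante utility is just u evaluated at x."""
    return lambda u: u(x)

def cont_bind(belief, f):
    """Score each forecaster y by the ex-ante utility f(y) gives, then apply your belief."""
    return lambda u: belief(lambda y: f(y)(u))

def from_distribution(dist):
    """Probabilistic belief as a functional: expected utility."""
    return lambda u: sum(p * u(x) for x, p in dist.items())

def from_subset_worst_case(subset):
    """Possibilistic belief as a functional (assumed worst-case reading)."""
    return lambda u: min(u(x) for x in subset)

utility = {"alice": 10, "bob": 0}.get  # hypothetical ex-post utilities
belief = from_distribution({"alice": Fraction(1, 4), "bob": Fraction(3, 4)})
print(belief(utility))  # 5/2
```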

Exercise 4: (Beginner) Prove that the two maps $\mathcal{P}^+(X) \to K(X)$ and $\Delta(X) \to K(X)$ are injections. (Advanced) Prove these injections are monad transformers.[8]

11 - signature monad $T_\Sigma$

Maybe I should characterise your belief-state about something by the sentence that you'd utter about the outcome. This will result in a more syntactic or linguistic account of belief. You might imagine here a shared language, like English or Python, with which a speaker may report their beliefs to a friend. Or you might imagine a private mental language in which a brain/AI will store their knowledge about the world.

To make this rigorous, I must introduce a language containing all the sentences that you might utter about the outcome. Our language will include an atomic sentence $x$ for every outcome $x \in X$, along with certain connectives for combining sentences. For example, suppose we have a language with two symbols, a binary connective $\vee$ called disjunction and a unary connective $\neg$ called negation. If $x, y, z \in X$ are the candidates in an election, then a belief-state about the electoral outcome is a sentence like $x \vee \neg y$ or $\neg(y \vee z)$.

The logical connectives can be specified by a signature. A signature is a set $\Sigma$ equipped with a map $\mathrm{ar} : \Sigma \to \mathbb{N}$ sending each connective to its arity. So the aforementioned language has the signature $\Sigma = \{\vee, \neg\}$ with $\mathrm{ar}(\vee) = 2$ and $\mathrm{ar}(\neg) = 1$.

We denote the resulting set of sentences by $T_\Sigma(X)$. This is a set containing all the sentences freely generated from $X$ using the connectives in $\Sigma$. Explicitly, $T_\Sigma(X)$ is the smallest set such that $x \in T_\Sigma(X)$ for every $x \in X$ and $\sigma(t_1, \dots, t_n) \in T_\Sigma(X)$ for every $\sigma \in \Sigma$ with $\mathrm{ar}(\sigma) = n$, and $t_1, \dots, t_n \in T_\Sigma(X)$.

With this machinery in place, we can answer the Four C's, and thereby find the corresponding monad.

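  1. If $X$ is the state-space then there's a distinct belief-state for each sentence $\mathbf{x} \in T_\Sigma(X)$.
  2. If you're certain that the winner of the election is $x$, then your belief-state is the sentence $x$.
  3. Let $f : Y \to T_\Sigma(X)$ be the function assigning to each forecaster $y \in Y$ their belief-state $f(y)$ about the election. And let $\mathbf{y} \in T_\Sigma(Y)$ be your belief-state about the forecasters. Then your belief $\mathbf{y} \gg= f$ about the election itself is given by uniform substitution: loop through the sentence $\mathbf{y}$ and, every time you come across an atomic letter $y$, replace it with the sentence $f(y)$. This results in a sentence $\mathbf{y} \gg= f \in T_\Sigma(X)$.[9]
  4. Unfortunately, $T_\Sigma$ isn't generally a commutative monad.[10]

| Flavour of uncertainty | Utterance in a language |
|---|---|
| Monad | Signature monad $T_\Sigma$ |
| Construct | $T_\Sigma(X)$, the sentences freely generated from $X$ by the connectives in $\Sigma$ |
| Return | $\mathrm{return}(x) = x$, the atomic sentence |
| Bind | Uniform substitution of every atomic letter $x$ with the sentence $f(x)$ in the sentence $\mathbf{x}$ |
| Product | N/A |
| Interpretation | $\mathbf{x}$ is the sentence that you would utter about the outcome, in a language which contains an atomic letter for each outcome $x \in X$ and a logical connective for each $\sigma \in \Sigma$. |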
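Uniform substitution is easy to sketch if sentences are encoded as nested tuples. The encoding below (tagged tuples built by the hypothetical helpers `atom` and `op`) is my own, and for brevity it does not check that each connective is applied to the number of arguments its arity demands.

```python
def atom(x):
    """The atomic sentence for outcome x (this is the return operator)."""
    return ("atom", x)

def op(name, *args):
    """A compound sentence: connective `name` applied to sub-sentences."""
    return ("op", name, args)

def term_bind(term, f):
    """Uniform substitution: replace every atomic letter x by the sentence f(x)."""
    if term[0] == "atom":
        return f(term[1])
    _, name, args = term
    return ("op", name, tuple(term_bind(t, f) for t in args))

# Hypothetical example: your sentence about forecasters, and their sentences.
about_forecasters = op("or", atom("ann"), op("not", atom("ben")))
forecasts = {"ann": atom("alice"), "ben": op("or", atom("alice"), atom("bob"))}

about_election = term_bind(about_forecasters, forecasts.get)
print(about_election)
```

The result is the sentence "alice or not (alice or bob)", obtained by substituting each forecaster's sentence for their name.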

Many monads are equivalent to  for some signature , including many monads we've already encountered.

  • When , then  is equivalent to the identity monad. This is intuitive. If there's no connectives in the language, then every utterance is a single atomic sentence positing one of the outcomes.
  • When  consists of one constant symbol (i.e. zero-arity connective) then  will contain the atomic sentences  plus one additional sentence . So  is equivalent to the maybe monad. We encountered this before as modelling the guess-or-shrug flavour of uncertainty.
  • When  consists of many constant symbols, then  will contain atomic sentences  plus additional sentences  for every . So  is equivalent to what's called the exception monad . This is like the guess-or-shrug, except there are multiple ways to shrug one's shoulders.
  • When  consists of one unary connective, then  will contain sentences like . So  is equivalent to the writer monad to the monoid . If  consists of many unary connectives, then  is equivalent to the writer monad to . We encountered this before as modelling the clarified guess.
  • When  consists of one binary connective, then  will consist of sentences like  . So  is equivalent to the set of full binary trees over . As Vanessa Kosoy notes, "we think of such a tree as a way to select an element of  by reading a stream of bits." (See here.)

    Isn't the archetypal symbol of uncertainty... a fork in the road? Imagine a traveller facing two paths, left and right, each forking further ahead, and so on unboundedly, forming a fractal canopy of binary choices.
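The "stream of bits" reading of a full binary tree can be sketched in a few lines of Python. This is only an illustration under my own assumptions: leaves are strings, forks are pairs `(left, right)`, and a `0` bit means "go left".

```python
from typing import Iterable, Tuple, Union

# A tree is either a leaf outcome (a string) or a fork (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def select(tree: Tree, bits: Iterable[int]) -> str:
    """Walk down the tree, reading one bit per fork, until a leaf is reached."""
    stream = iter(bits)
    while isinstance(tree, tuple):
        left, right = tree
        tree = right if next(stream) else left
    return tree


# Hypothetical fork in the road: go left to "red", or right to a further fork.
road = ("red", ("blue", "green"))
```

For example, `select(road, [1, 0])` reads two bits and arrives at "blue". Bits left unread after reaching a leaf are ignored, and a stream that runs out before a leaf is reached raises `StopIteration`.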

12 — algebraic theory

There's something a bit perverse about characterising your belief-state with a single utterance about the outcome. Namely, some utterances will be logically equivalent to each other, such as  and , and therefore the belief-state in which you're willing to utter  is the exact same as the belief-state in which you're willing to utter , assuming that you're both rational and honest. Therefore, our previous characterisation was overcounting the belief-states by distinguishing logically-equivalent sentences. Bizarrely, there would be infinitely-many belief-states about a single coin flip — i.e. , and so on.

To fix this, what we need isn't just a signature , but rather a signature  paired with a set  of equational axioms, which is called an algebraic theory. An equational axiom is a pair of sentences built using the connectives in  and some placeholder sentence variables . We use  to define an equivalence relation  on  by taking the deductive closure of the axioms, and then the equivalence classes of the sentences will be our belief-states.

For example, if our signature is  and we intend to interpret the  connective as disjunction, then  should consist of three axioms:

  1. Idempotency, $x \vee x = x$
  2. Commutativity, $x \vee y = y \vee x$
  3. Associativity, $(x \vee y) \vee z = x \vee (y \vee z)$

Furnished with the concept of an algebraic theory, we can now improve our answers:

  1. If  is the state-space then there is a distinct belief-state for each equivalence class of sentences . This set is denoted .
  2. If you're certain that the winner is , then your belief-state is the sentence .
  3. Let  be the function assigning to each forecaster  their belief-state  about the election. And let  be your belief-state about the forecasters. Then your belief  about the election itself is given by uniform substitution modulo equivalence. We know  for some  and that  for some . Then  where  is the bind operator for the signature monad. This operation is well-defined because the deductive system satisfies referential transparency, i.e. if  then .
  4. Again,  isn't generally a commutative monad.
Flavour of uncertainty equivalence class of utterances
Monad utterances-modulo-equivalence
Construct 
Return 
Bind  where  and 
Product N/A
Interpretation is the set of sentences that you would assert about the outcome, in a language which contains an atomic letter for each outcome , a logical connective for each , and where  is the set of equational axioms governing the connectives of .
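As a sanity check on "uniform substitution modulo equivalence", here is a small Python sketch for the disjunction example. It assumes the three axioms of idempotency, commutativity, and associativity, under which a sentence is determined up to equivalence by the set of atomic letters it mentions. The tuple encoding and the function names are hypothetical choices of mine.

```python
from typing import Callable, FrozenSet, Tuple, Union

# A sentence is an atomic letter (a string) or a disjunction ("or", left, right).
Sentence = Union[str, Tuple[str, "Sentence", "Sentence"]]


def normal_form(s: Sentence) -> FrozenSet[str]:
    """Represent the equivalence class of s by the set of atoms occurring in it."""
    if isinstance(s, str):
        return frozenset({s})
    _, left, right = s
    return normal_form(left) | normal_form(right)


def bind_signature(s: Sentence, f: Callable[[str], Sentence]) -> Sentence:
    """Plain uniform substitution on a representative sentence."""
    if isinstance(s, str):
        return f(s)
    op, left, right = s
    return (op, bind_signature(left, f), bind_signature(right, f))


def bind_modulo(s: Sentence, f: Callable[[str], Sentence]) -> FrozenSet[str]:
    """Substitute on any representative, then pass to the equivalence class."""
    return normal_form(bind_signature(s, f))


# Two different sentences in the same equivalence class.
w1 = ("or", "a", ("or", "b", "a"))
w2 = ("or", "b", "a")
f = {"a": ("or", "x", "y"), "b": "y"}.__getitem__
```

Both `bind_modulo(w1, f)` and `bind_modulo(w2, f)` give the class represented by `{"x", "y"}`, which is the well-definedness property in miniature: the answer does not depend on which representative we substitute into.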

If a monad  is equivalent to  for some algebraic theory  then we call  a presentation of the monad.[11] A presentation of a monad is a rather nice description of a flavour of uncertainty via some operators for defining belief-states in terms of other belief-states and some rules governing those operators.

  • When  is empty, then  is obviously just the signature monad .
  • When  contains a unary connective for every  and  contains the axioms  , then  is equivalent to the writer monad to the monoid . We encountered this before as the confidence-marked guess. In general, we can give a similar presentation for the writer monad to any monoid . So the unless-claused guess has a similar presentation.
  • When  and  is idempotency, commutativity, and associativity (shown above), then there is a distinct class  for each non-empty finite subset of . So  is equivalent to the nonempty finite powerset monad . This is a finitary version of the monad  which we've encountered as modelling possibilistic uncertainty. This algebraic theory is also called the theory of semilattices.
  • Let's find a presentation for  the distribution monad. The signature  will contain a binary connective for every . Our axioms will be  (skew-idempotency),  (skew-commutativity), and  for (skew-associativity). You should think of  as  units of  and  units of , which explains the ghastly expression for skew-associativity. This algebraic theory is called the theory of convex algebras.
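The convex-algebra axioms can be checked numerically by evaluating sentences to finite-support distributions. The Python sketch below assumes the convention that a triple `(p, x, y)` means "p units of x and 1 - p units of y", and it tests one standard way of writing skew-associativity, which may differ in notation from the expression intended above.

```python
from fractions import Fraction


def dist(term):
    """Evaluate a term to a finite-support distribution {outcome: probability}.

    A term is an outcome (a string) or a triple (p, left, right), read as
    p units of `left` and 1 - p units of `right`.
    """
    if isinstance(term, str):
        return {term: Fraction(1)}
    p, left, right = term
    out = {}
    for weight, sub in ((p, left), (1 - p, right)):
        for outcome, q in dist(sub).items():
            out[outcome] = out.get(outcome, Fraction(0)) + weight * q
    # Drop zero-weight outcomes so that equal distributions compare equal.
    return {k: v for k, v in out.items() if v != 0}


p, q = Fraction(1, 2), Fraction(1, 3)
r = q * (1 - p) / (1 - p * q)  # reweighting needed when re-bracketing

lhs = dist((q, (p, "x", "y"), "z"))      # (x +_p y) +_q z
rhs = dist((p * q, "x", (r, "y", "z")))  # x +_{pq} (y +_r z)
```

With these values both sides come out as 1/6 on x, 1/6 on y, and 2/3 on z, so the two bracketings describe the same distribution.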

Exercise 5: Find a presentation for  for an arbitrary rig .

13 — convex powerset of distributions monad

As we saw before, the continuation monad  encompasses both possibilistic and probabilistic uncertainty. Unfortunately  lacks any presentation, even if we allow connectives with infinite arity![12] Fortunately, there exists a monad encompassing both possibilistic and probabilistic uncertainty which is presentable.

Recall that the nonempty finite powerset monad , which corresponds to possibilistic uncertainty, is presented by the theory of semilattices . And the distribution monad , which corresponds to probabilistic uncertainty, is presented by the theory of convex algebras . Consider the theory  where  is an additional axiom describing how the  connectives distribute over the  connective.

This new theory is a presentation of the convex powerset of distributions monad. This monad, denoted by , corresponds to a flavour of uncertainty wherein a belief-state is a convex set of distributions, e.g. "The coin lands either heads (20-30%) or tails (70-80%)." (See credal sets.)

Now, we could have defined  in an entirely non-syntactic way, i.e. " is the set of nonempty finitely-generated convex-closed sets of finite-support distributions over ." But I think the syntactic definition, in terms of the algebraic theories for  and , elucidates why  is a well-motivated unification of probabilistic and possibilistic uncertainty. We will employ a similar strategy for motivating infrabayesianism — roughly speaking, infrabayesianism is exactly what you get when you combine probabilistic and possibilistic uncertainty with reward.

Flavour of uncertainty imprecise probability
Monad convex powerset of distributions monad
Signature 
Axioms 

 is a semilattice,
i.e. 

 is a convex algebra,
i.e.  

 distributes over ,
i.e.   

Interpretation

 is certainty in an outcome .

 is possibilistic uncertainty between  and .

 is probabilistic uncertainty between  (with chance ) and   (with chance ).
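Here is a rough Python sketch of the coin example as a credal set. It assumes a belief-state is stored as a finite set of generating distributions rather than as the full convex set, so two different generator sets can describe the same belief-state. The helper names `certain`, `either`, `mix`, and `bounds` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def certain(x):
    """Certainty in outcome x: one generator, the point distribution on x."""
    return frozenset({((x, Fraction(1)),)})


def either(a, b):
    """Possibilistic uncertainty: pool the generators of a and b."""
    return a | b


def mix(p, a, b):
    """Probabilistic uncertainty: mix every generator of a with every generator of b."""
    out = set()
    for d, e in product(a, b):
        w = {}
        for weight, gen in ((p, d), (1 - p, e)):
            for outcome, q in gen:
                w[outcome] = w.get(outcome, Fraction(0)) + weight * q
        out.add(tuple(sorted((k, v) for k, v in w.items() if v != 0)))
    return frozenset(out)


def bounds(belief, outcome):
    """Lower and upper probability of an outcome over the credal set."""
    probs = [dict(d).get(outcome, Fraction(0)) for d in belief]
    return min(probs), max(probs)


heads, tails = certain("heads"), certain("tails")
coin = either(mix(Fraction(1, 5), heads, tails), mix(Fraction(3, 10), heads, tails))
```

Then `bounds(coin, "heads")` gives 20% to 30% and `bounds(coin, "tails")` gives 70% to 80%. Because a probability is linear in the distribution, taking the minimum and maximum over the generators is enough to get the bounds over the whole convex set.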

14 — free convex lattice monad

There's a common usage of the word "uncertainty", where the uncertainty is modulo strategic choice. For example, you might hear "Black is certain to win" from a chess commentator if Black can force a checkmate, or hear "the winner is still uncertain" from a poker commentator during the flop. By Myers' correspondence, this flavour of uncertainty — call it "ludic uncertainty" — must correspond to some monad, but which?

Consider the theory of convex lattices — with signature  and the following axioms:

  •  is a lattice.[13]
  •  is a convex algebra.
  •  distributes over  and , i.e.  and .

Then  is a monad corresponding, I think, to the aforementioned flavour of uncertainty. It sends a set  to the set , the free convex lattices over . An element of  should be read as a game-tree whose non-leaf nodes are either a free binary choice by White, a free binary choice by Black, or a biased coin flip. The leaf nodes may be either wins for White, wins for Black, or an element of the set .

We treat game-trees  as equivalent if the same outcome would result from  and  regardless of the players' preferences over the elements of . For example, the lattice axioms  and  will hold because no player would willingly choose to lose, and the axioms  and  establish that the players are adversarial, i.e. would never willingly empower one another.

Exercise 7: Consider the game  shown below. Which outcome is (ludically) certain?

The outcome of the game  is certain.

Note that  aren't really games in the usual sense, because leaf nodes might be elements of , and we treat these elements as pairwise incomparable to both players. So you should think of  as a set of partially-specified game trees. A fully-specified game tree would be an element of , which is a game tree where each leaf-node returns some -valued utility to Black and disutility to White. You may notice that  can itself be equipped with the structure of a convex lattice, which just means there exists a -algebra .[14] This -algebra is exactly the well-known position evaluation function used in combinatorial game theory.

Flavour of uncertainty ludic
Monad free convex lattices
Signature 
Axioms 

 is a lattice.

 is a convex algebra.

 distributes over both  and ,
i.e. 

Interpretation

 is a game which will certainly result in outcome .

 is a game where White wins and  is a game where Black wins.

 is a game where White can choose to play  or to play .

 is a game where Black can choose to play  or to play .

 is a game where  is played with chance  and  with chance .
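For fully-specified game trees, the evaluation by backward induction can be sketched in Python as follows. The node tags, the choice of plus and minus infinity for the two "win" leaves, and the convention that Black maximises while White minimises are my own assumptions for illustration, chosen to match "utility to Black and disutility to White".

```python
import math


def evaluate(game, utility):
    """Value to Black of a game tree, computed by backward induction."""
    kind = game[0]
    if kind == "leaf":          # ("leaf", x): the game ends in outcome x
        return utility[game[1]]
    if kind == "white_wins":    # worst possible result for Black
        return -math.inf
    if kind == "black_wins":    # best possible result for Black
        return math.inf
    if kind == "white":         # ("white", g, h): White picks the lower value
        return min(evaluate(game[1], utility), evaluate(game[2], utility))
    if kind == "black":         # ("black", g, h): Black picks the higher value
        return max(evaluate(game[1], utility), evaluate(game[2], utility))
    if kind == "chance":        # ("chance", p, g, h): g with chance p, else h
        p = game[1]
        return p * evaluate(game[2], utility) + (1 - p) * evaluate(game[3], utility)
    raise ValueError(f"unknown node kind: {kind}")


utility = {"a": 4.0, "b": 0.0}  # hypothetical utilities to Black
game = (
    "black",
    ("chance", 0.5, ("leaf", "a"), ("leaf", "b")),
    ("white", ("leaf", "a"), ("white_wins",)),
)
```

Here `evaluate(game, utility)` is 2.0: Black avoids the branch in which White could simply choose to win, and takes the coin flip instead. One caveat is that a chance node mixing a White win with a Black win evaluates to `nan` under this encoding, so the sketch is only meaningful when that case does not arise.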

15 — infrabayesianism

When agents have beliefs about the same environment that they're embedded in, weird things can happen. Over the past few years, Vanessa Kosoy and Alex Appel have been exploring a novel flavour of uncertainty, infrabayesian uncertainty, which they claim more fruitfully characterises the belief-states of embedded agents. In particular, it characterises belief-states concerning Newcomb-like environments, where the state of the environment is correlated with the agent's choice under consideration. Their flavour of uncertainty corresponds to the infrabayesian monad .

Roughly speaking,  is the same as  above except without the  connective. Consider  the theory of convex semilattices with top and bottom, which is a presentation of the composite monad .[15] From what I understand, this monad  is Kosoy's infrabayesian monad .[16] This justifies the claim that infrabayesianism is the flavour of uncertainty that minimally encompasses both possibilistic uncertainty (via the  monad), probabilistic uncertainty (via the  monad), and reward (via the  monad). I think that this motivates infrabayesianism as a characterisation of an agent's belief-state about their environment.

Flavour of uncertainty infrabayesian
Monad infrabayesian monad
Signature 
Axioms 

 is a semilattice with  and .

 is a convex algebra.

 distributes over ,
i.e. 

Interpretation

 is an environment which certainly results in outcome .

 is an impossible/contradictory environment where the agent achieves no disutility, called Nirvana.

 is an environment where the agent suffers maximal disutility.

 is an environment which is either like  or like , and our agent should be pessimistic here.

 is an environment which is like  with chance  and  with chance .
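One way to read the interpretation table is as a recipe for pessimistic evaluation. The Python sketch below assigns each environment its worst-case expected disutility, assuming disutility is normalised to lie between 0 and 1. The tags "nirvana" and "doom" and the function name are hypothetical, and this is only a real-valued reading of the connectives, not a construction of the monad itself.

```python
from fractions import Fraction


def worst_case(env, disutility):
    """Worst-case expected disutility of an environment term."""
    kind = env[0]
    if kind == "outcome":   # ("outcome", x): certainly results in x
        return disutility[env[1]]
    if kind == "nirvana":   # contradictory environment: no disutility
        return Fraction(0)
    if kind == "doom":      # maximal disutility
        return Fraction(1)
    if kind == "either":    # ("either", e1, e2): be pessimistic
        return max(worst_case(env[1], disutility), worst_case(env[2], disutility))
    if kind == "mix":       # ("mix", p, e1, e2): e1 with chance p, else e2
        p = env[1]
        return p * worst_case(env[2], disutility) + (1 - p) * worst_case(env[3], disutility)
    raise ValueError(f"unknown environment kind: {kind}")


d = {"a": Fraction(1, 4), "b": Fraction(1, 2), "c": Fraction(3, 4)}
a, b, c = ("outcome", "a"), ("outcome", "b"), ("outcome", "c")
half = Fraction(1, 2)
mixed_then_choice = ("mix", half, a, ("either", b, c))
choice_then_mixed = ("either", ("mix", half, a, b), ("mix", half, a, c))
```

Both `mixed_then_choice` and `choice_then_mixed` evaluate to 1/2, as the distributivity axiom requires, and adding Nirvana as an alternative never changes the value, since a pessimistic agent ignores the option with no disutility.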

Unfortunately,  isn't a commutative monad, which means it's not a flavour of uncertainty that you can have about parts of the world, but only about the world in its entirety. Put starkly, there's no way to combine my infrabayesian belief-states about two coin tosses to yield a single infrabayesian belief-state about the pair of coin tosses, even when the coin tosses are completely unrelated.[17] This, I think, limits both the theoretical appeal of infrabayesianism and its tractability.

Theoretically speaking, the fact that  isn't a commutative monad weakens the analogy between infrabayesian uncertainty and possibilistic or probabilistic uncertainty. Many concepts are built upon possibilistic or probabilistic uncertainty which appeal, in an essential way, to the product operators  or . And infrabayesianism, lacking such an operator, is not guaranteed the analogous concept.

Practically speaking, the lack of an infrabayesian product operator is an obstacle to parallelising algorithms which assume infrabayesian belief-states. There is no way to decompose the environment into separate components, discover an infrabayesian belief-state for each component, and then combine those belief-states into a single belief-state about the environment as a whole.

Implications for AI safety 

Does this essay have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short for agent foundations, and this essay is maybe agent foundations foundations or something like that. But I feel compelled to offer some practical implications for AI safety to validate my decision to write this essay and your decision to read it.

  1. One lesson is that uncertainty comes in many flavours, and formalising different flavours of uncertainty isn't mathematically challenging. Just ask yourself the Four C's (Count? Certainty? Collapse? Combine?) and you've got yourself a monad.
  2. Often, you can replace one monad in a formalism with another and everything will still type-check. For example, stochastic Markov decision processes are transition functions . One can generalise this to   for any monad  we've met so far.
  3. If you're conducting active research into agent foundations, then instead of assuming a fixed flavour of uncertainty (e.g. possibilistic, probabilistic, infrabayesian, etc), perhaps see if you can generalise the theory to an arbitrary monad, or at least an arbitrary commutative monad. I call such theories "parametric in the monad". If you're gonna do foundational work, it often pays to make it highly parametric, even if you only care about a specific case.
    1. The theory will be robust to errors about the appropriate flavour of uncertainty.
    2. If you want to account for another flavour of uncertainty, you'll have saved yourself time, effort, and ink.
    3. You've got more data points to sanity-check the theory — do you get sensible answers when you plug in different monads, e.g.  etc?
  4. If your solution to AI safety involves, at some step, building a formal model of the environment (cf. Davidad's Open Agency Architecture) or of a human (cf. imitative amplification), then this model should carry all the flavours of uncertainty that actually characterise your belief-state about the system. And you shouldn't feel compelled to shoe-horn all your uncertainties into a probability distribution. For example, unless-claused uncertainty seems pretty fundamental (we commit to our stochastic models of the environment and/or a human only within a narrow range of situations), and this flavour of uncertainty seems irreducible to probabilistic uncertainty.
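To illustrate what "parametric in the monad" might look like in code, here is a minimal Python sketch of a rollout that only ever touches the monad through its unit and bind. The class names, the `rollout` helper, and the toy transition functions are hypothetical; only the possibilistic and probabilistic cases are shown.

```python
from fractions import Fraction


class Possibilistic:
    """Nonempty finite powerset monad, with belief-states as frozensets."""

    @staticmethod
    def unit(x):
        return frozenset({x})

    @staticmethod
    def bind(m, f):
        return frozenset(y for x in m for y in f(x))


class Probabilistic:
    """Finite-support distribution monad, with belief-states as dicts."""

    @staticmethod
    def unit(x):
        return {x: Fraction(1)}

    @staticmethod
    def bind(m, f):
        out = {}
        for x, p in m.items():
            for y, q in f(x).items():
                out[y] = out.get(y, Fraction(0)) + p * q
        return out


def rollout(monad, step, state, actions):
    """Belief-state after taking `actions` from `state`, where step(x, a) is a belief-state."""
    belief = monad.unit(state)
    for a in actions:
        belief = monad.bind(belief, lambda x, a=a: step(x, a))
    return belief


# Toy dynamics: action a either moves the state by a or leaves it unchanged.
def step_poss(x, a):
    return frozenset({x, x + a})


def step_prob(x, a):
    return {x: Fraction(1, 2), x + a: Fraction(1, 2)}
```

The same `rollout` gives `{0, 1, 2}` for the possibilistic model and the distribution 1/4, 1/2, 1/4 over 0, 1, 2 for the probabilistic one after two steps of size 1. Swapping the flavour of uncertainty means swapping one argument, which is the sense in which everything "still type-checks".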

Further questions

In so far as "flavours of uncertainty" is an informal term, there's little we can do to test the correspondence other than enumerating well-known flavours of uncertainty and checking that they do in fact correspond to monads, and vice-versa, enumerating the well-known monads and giving them natural doxastic interpretations. I think my own attempt has been positive, but this result is open to revision.

Secondly, the biggest asterisk of my essay: my treatment of belief-states has been silent on their most important property, namely that they are learned. For example, a probability distribution can be conditioned on new evidence, and possibilistic uncertainty also carries an analogous notion of conditioning. Perhaps any characterisation of belief should answer additional questions about how those belief-states are revised in light of new evidence/observations/considerations, etc. Perhaps we should append to Count? Certainty? Collapse? Combine? a fifth question, Condition? I'm sympathetic to this worry.

And if indeed learning is a phenomenon which must be modelled by any characterisation of belief, then monads do not themselves carry enough structure to characterise beliefs. Rather, we would need to equip the monad  with some additional structure, perhaps a family of maps  for some space of observations , possibly satisfying some additional constraints such as  and . I'm just improvising here.

This is best left to future work, if the need arises.


Flavour of uncertainty         | Monad
possibilistic                  | nonempty powerset monad
probabilistic                  | distribution monad
indeterminate                  | reader from  monad
confidence-marked              | writer to  monad
unless-claused                 | writer to  monad
ordered clarifications         | writer to  monad
unordered clarifications       | writer to  monad
best guess                     | identity monad
guess-or-shrug                 | maybe monad
infinitesimal probabilistic    | -distribution monad
generalised probabilistic      | -distribution monad
quantum                        | quantum monad
evolving                       | smooth state monad
ex-ante utility                | continuation to  monad
utterance in a language        | signature monad
guess-or-different-shrugs      | exception monad
path through forking road      | full binary trees monad
utterance modulo equivalence   | algebraic theory
imprecise probability          | convex powerset of distributions monad
ludic                          | free convex lattice monad
infrabayesian                  | infrabayesian monad

  1. ^

    In particular, I'm thinking of the applied category theory community.

  2. ^

    Traditionally, the field of analytic epistemology has been concerned with defining epistemological concepts — i.e. constructing definitions for the concepts of knowledge, belief, evidence, learning, testimony, justification, etc. However, in recent years analytic epistemology has reorientated itself, chiefly under the influence of Timothy Williamson, towards modelling epistemological phenomena — i.e. constructing mathematical models for phenomena relating knowledge, belief, evidence, learning, testimony, justification, etc. This reorientation in epistemology, from concept-defining to model-building, was inspired by the natural sciences.

  3. ^

    An operator  assigns, to every set , another set/function .

    For example,  is the powerset operator, which assigns to every set  another set . You can informally think of an operator as a function — but strictly speaking, an operator can't be a function because its domain would be the "set of all sets" (which doesn't exist).

    Formally, the domain of an operator is something called a category. Categories can be larger than sets; in particular, there is a category containing all the sets and the functions between them. For pedagogical purposes, I've framed everything in this article in terms of sets and functions, but most of the content of this article can be applied to any category with enough structure.

  4. ^

    And I suppose, by "generalising backwards", that my zeroth-order belief about the coin toss is the actual result of the coin toss..?

  5. ^

     is a monoid if  and .

    A monoid is like a group except the elements might not have inverses, e.g.  is a group but  is only a monoid.

     is a commutative monoid if also .

    The writer monad for  is given by the data ,, and  where  and .

  6. ^

    Solution: I think  is the -dimensional Hilbert space, but this isn't my expertise.

  7. ^

    Suppose  has two distinct elements  and . Let  and . Then there are two ways to combine  and  into a single belief in , i.e.  and . But these differ so  is not a commutative monad for .

  8. ^

    In fact,  encompasses every other monad  such that  is a -algebra. This explains why  encompasses both possibilistic and probabilistic uncertainty — specifically, it's because  is a -algebra and  is a -algebra.

    Moreover,  is the smallest monad with this property, because there's a bijection between -algebras  and monad morphisms . See here for details.

    That being said,  isn't the smallest monad encompassing both  and  in particular. If you only need to encompass  and  then Vanessa Kosoy's infrabayesian monad  will suffice, but  is strictly contained within .

  9. ^

    For example, suppose  and  satisfies  and . Then we find  via uniform substitution.

    In pythonese, S_string = ''.join(t if t in Sigma else f(t) for t in W_string)

    Equivalently, we can define the bind operator recursively on the depth of . For atomic sentences, , and for compound sentences, .

  10. ^

    In particular, suppose  contains two unary connectives. Suppose  is my belief-state about  and  is my belief-state about . Then there are two ways to combine these two beliefs into a single belief in , i.e.  and . But these differ so  is not a commutative monad.

  11. ^

    Note that a monad might have many distinct presentations, and this non-uniqueness is rather distasteful. The more elegant treatment of monads is with Lawvere theories, where both atomic connectives and compound connectives are treated on par.

  12. ^

    For any cardinality , we say that a monad has rank  if it has a presentation with operations of arity at most . The continuation monad has no rank (not even an infinitary one) which is a somewhat perverse property for a monad. A rankless monad isn't generated by any algebraic theory, even if we allow infinitary operators.

    We can see that  is a rankless monad because it contains  as a submonad for every , but  is a monad without rank.

  13. ^

    The lattice axioms for the signature  consist of the semilattice axioms for , the semilattice axioms for , the boundary axioms  and , and the absorption laws  and .

  14. ^

    The position evaluation function  is defined inductively:





  15. ^

    That is, the signature  consists of the connectives , and  contains the axioms: .

    Strictly speaking, it's improper to speak of composing monads  and  unless you provide a distributive law of  over , i.e. . But  yields a monad  given by the convex powerset of distributions monad, and the exception monad  distributes over any monad, so no worries here.

  16. ^

    A technical caveat:

    Kosoy's infrabayesian monad  is actually given by  rather than  (that is,  contains sets of distributions with arbitrary cardinality). At least, this is my reading from Diffractor's Infra-Miscellanea Section 2.

    Unfortunately,  is a rankless monad, i.e. it isn't generated by any algebraic theory even if we allow infinitary operators.

    Fortunately, we may approximate  with a monad of rank  for any cardinality . Let's define , where  is the set of non-empty subsets of  of cardinality no greater than . Algebraically, we obtain  by adding the -ary disjunction connective  to the signature for .

    This leaves the open question, for which cardinality  is  an adequate and tractable approximation, if indeed any? I suspect  suffices for all theoretical purposes, and that  suffices for all practical purposes.

  17. ^

    This also applies to imprecise probability  and to strategic uncertainty .

    For example, given a series of two-player games , there's no natural way to combine them into a single two-player game  because  isn't a commutative monad.

    More generally, there's no commutative monad which contains both a  operator and a  operator without conflating them. See here for details.

DragonGod (3mo)

i.e. if each forecaster  has an first-order belief , and  is your second-order belief about which forecaster is correct, then  should be your first-order belief about the election.

I think there might be a typo here. Did you instead mean to write: "" for the second order beliefs about the forecasters?

Kosoy's infrabayesian monad  is given by 

There are a few different varieties of infrabayesian belief-state, but I currently favour the one which is called "homogeneous ultracontributions", which is "non-empty topologically-closed ⊥–closed convex sets of subdistributions", thus almost exactly the same as Mio-Sarkis-Vignudelli's "non-empty finitely-generated ⊥–closed convex sets of subdistributions monad" (Definition 36 of this paper), with the difference being essentially that it's presentable, but it's much more like  than .

I am not at all convinced by the interpretation of  here as terminating a game with a reward for the adversary or the agent. My interpretation of the distinguished element  in  is not that it represents a special state in which the game is over, but rather a special state in which there is a contradiction between some of one's assumptions/observations. This is very useful for modelling Bayesian updates (Evidential Decision Theory via Partial Markov Categories, sections 3.5-3.6), in which some variable  is observed to satisfy a certain predicate : this can be modelled by applying the predicate in the form  where  means the predicate is false, and   means it is true. But I don't think there is a dual to logical inconsistency, other than the full set of all possible subdistributions on the state space. It is certainly not the same type of "failure" as losing a game.

For the sake of potential readers, a (full) distribution over  is some  with finite support and , whereas a subdistribution over  is some  with finite support and . Note that a subdistribution  over  is equivalent to a full distribution over , where  is the disjoint union of  with some additional element, so the subdistribution monad can be written .

I am not at all convinced by the interpretation of  here as terminating a game with a reward for the adversary or the agent. My interpretation of the distinguished element  in  is not that it represents a special state in which the game is over, but rather a special state in which there is a contradiction between some of one's assumptions/observations.

Doesn't the Nirvana Trick basically say that these two interpretations are equivalent?

Let  be  and let  be . We can interpret  as possibility,  as a hypothesis consistent with no observations, and  as a hypothesis consistent with all observations.

Alternatively, we can interpret  as the free choice made by an adversary,  as "the game terminates and our agent receives minimal disutility", and  as "the game terminates and our agent receives maximal disutility". These two interpretations are algebraically equivalent, i.e.  is a topped and bottomed semilattice.

Unless I'm mistaken, both  and  demand that the agent may have the hypothesis "I am certain that I will receive minimal disutility", which is necessary for the Nirvana Trick. But  also demands that the agent may have the hypothesis "I am certain that I will receive maximal disutility". The first gives bounded infrabayesian monad and the second gives unbounded infrabayesian monad. Note that Diffractor uses  in Infra-Miscellanea Section 2.

I agree that each of and has two algebraically equivalent interpretations, as you say, where one is about inconsistency and the other is about inferiority for the adversary. (I hadn’t noticed that).

The variant still seems somewhat irregular to me; even though Diffractor does use it in Infra-Miscellanea Section 2, I wouldn’t select it as “the” infrabayesian monad. I’m also confused about which one you’re calling unbounded. It seems to me like the variant is bounded (on both sides) whereas the variant is bounded on one side, and neither is really unbounded. (Being bounded on at least one side is of course necessary for being consistent with infinite ethics.)

Does this article have any practical significance, or is it all just abstract nonsense? How does this help us solve the Big Problem? To be perfectly frank, I have no idea. Timelines are probably too short agent foundations, and this article is maybe agent foundations foundations...

I do think this is highly practically relevant, not least of which because using an infrabayesian monad instead of the distribution monad can provide the necessary kind of epistemic conservatism for practical safety verification in complex cyber-physical systems like the biosphere being protected and the cybersphere being monitored. It also helps remove instrumentally convergent perverse incentives to control everything.

Meyer's

If this is David Jaz Myers, it should be "Myers' thesis", here and elsewhere