Math prerequisites for this post: A Whole Lot of topology, some poset/lattice knowledge, have read "Basic Inframeasure Theory", just enough category theory to be familiar with functors, and other minor bits and pieces.
Domain theory is the branch of math that's like "so, what sort of topological space is a Haskell type, anyways? How do we provide a semantics for computation?"
And Inframeasure theory is about a generalization of measures which seems to hold promise for addressing several alignment issues.
Accordingly, it would be nifty if inframeasures were the sort of thing that could be worked with in Haskell. I don't know Haskell. But I do know a decent chunk of domain theory, so I can at least check how well inframeasures interface with that branch of math, as a proxy for actual computer implementation.
The major implications from this post are:
1: Yes, inframeasures do interface with domain theory quite nicely, and land in the part of it that might be feasible to implement on a computer (the ω-BC domains)
2: You can have inframeasure-like uncertainty over any Haskell type. This took some substantial generalization work, because Haskell types, as topological spaces, are very very different from the sorts of topological spaces we were working with before. Perfect for representing uncertainty about computations!
3: "take the space of inframeasures over this other type" is a well-behaved type constructor, so whatever crazy self-referential type signature you want to whip up with it (and the other usual type constructors), it'll work perfectly well. Oh, it's also a monad.
This post was cranked out in minimal time without editing, feel free to get in contact with me if there's something that needs clarification (there are probably things which need clarification).
Notation Conventions:
This section is best to refer back to if you're like "I know that symbol was defined somewhere, but it's hard to find where". This mostly just covers the stuff introduced at one point in this post and repeatedly reused. If a symbol gets introduced for a paragraph and doesn't get reused, it won't show up here.
For ordering, I reserve ≥ and ≤ for comparing numbers in R (or [0,1] or [−∞,∞] or (−∞,∞]).
For orderings on posets which are not the set of numbers, I use ⊑. ⊓ is infinima, ⊔ is suprema. ⨆↑ is supremum of a directed set. As is usual, ⊤ and ⊥ are the top and bottom elements of a poset if they exist.
≪ is used for the approximation relation on a poset. x≪y if all directed sets with a supremum of y or more contain an element that lies above x. More on this concept later. It's like a more restrictive version of ⊑.
D,E are used for domains or posets, while X and Y are used for generic topological spaces that may not be equipped with a partial order.
For type signatures, f:X→Y is a function from the space X to the space Y, assumed continuous by default unless we specifically say otherwise. If we say "f is a measurable function X→Y" or something like that, it might not be continuous, but if it's not otherwise specified, always assume continuity.
(X→Y) is the space of continuous functions from X to Y. This is often used for something like (X→R) and (X→[0,1]), where we're equipping R and [0,1] with the usual topology.
The notation [D→E], or [X→[0,1]], or stuff like that, has the brackets meaning we're treating the function space in a specifically domain-theoretic way. The target space E or [0,1] or (−∞,∞] or whatever must be equipped with a partial order. There's a topology on posets called the Scott-topology, the target space is equipped with it. The Scott-topology on [0,1] and the ordinary topology on [0,1] are different (same goes for R and other intervals like that) so elements of [X→R] may not be elements of (X→R), because R is equipped with different topologies in the two cases, altering the notion of "continuous function". Also, the function space is equipped with a partial order. Since the target space E or [0,1] or whatever is a poset, the functions get the ordering f′⊒f iff ∀x:f′(x)⊒f(x).
So, to recap, the brackets mean we're equipping the target space with a partial order and the Scott-topology, and equipping the space of continuous functions with a partial order and the Scott-topology as well.
For numbers, p is generally used for numbers in [0,1], q is generally used for an arbitrary real number, and a is used for nonnegative real numbers.
cq is the constant function which maps every input to the number q.
There's a part where we use P and P. The slightly fancier P is a subset of a domain D, the simpler P is the subset equipped with the order inherited from D to make it into its own distinct poset.
ψ is used for inframeasures, interpreted as a function. They've got type signatures like (X→R)→R, or (X→[0,1])→[0,1] (usual topology on the numbers), or [X→(−∞,∞]]→[−∞,∞], or [X→[0,1]]→[0,1] (Scott-topology on the numbers), varying by context. They've also got some other properties besides that.
So what is an Inframeasure?
Most of the motivation for why these things are important is contained in the Introduction to this Sequence. So we'll just focus on what sort of creature an inframeasure is, mathematically, trying to emphasize the aspects of it which generalize to our new setting.
For probability distributions μ over some space X, the expectation of a function f:X→R is defined as:
Eμ[f]:=∫fdμ
There's an alternate way of thinking of probability distributions where the most important attribute of a probability distribution is that it's the sort of thingy that you can take expectations of functions with. Under this view, a probability distribution is just a functional of type (X→R)→R, mapping a function to its expectation. Said functional must fulfill the following properties, and then it can be called a probability distribution.
1: It must be monotone. Ie, larger functions must be mapped to higher expected values. The order on functions is that f′ is larger than f iff ∀x:f′(x)≥f(x).
2: It must be linear.
3: It must map the constant-1 function to 1.
And, as it turns out, monotone linear functionals which map the constant-1 function to 1 are exactly those which can be written as taking an expectation w.r.t. a probability distribution. If you wanted to generalize to measures instead of probability distributions, just drop that third condition and keep the first two.
Now, in order to build up to inframeasures, we'll need to generalize to a-measures (short for "affine measures"). If you don't do this, you lose the ability to deal with updates in the infra-Bayes setting, which critically rely on this. An a-measure is a pair of a measure and a nonnegative number, (m,b). That b term is like "guaranteed utility". Again, we can take expectations w.r.t. them, defined as
E(m,b)[f]:=∫fdm+b
Said expectation functional (X→R)→R has the properties:
1: It's monotone.
2: It's affine. For p∈[0,1], E(m,b)[pf+(1−p)f′]=pE(m,b)[f]+(1−p)E(m,b)[f′] All affine functions to R look like a linear function to R plus a constant.
3: It must map the constant-0 function to 0 or more.
And, lo and behold, all expectation functionals of this form (skipping over some fiddly issues) can be identified with an a-measure.
Inframeasures are sets of a-measures. Sets of probability distributions (which, in our terminology, are called "crisp infradistributions") are already extensively studied in the field of Imprecise Probability, but generalizing to sets of a-measures is necessary for maximum generality and properly capturing updates. We can define expectations w.r.t these sets of a-measures. You don't know what a-measure (or probability distribution, if you prefer to think of sets of probability distributions) reality is going to pick from its set of available options Ψ, and you want to be prepared for the worst case, so expectations w.r.t. a set of a-measures are defined as:
EΨ[f]:=inf(m,b)∈Ψ∫fdm+b
Just take the worst-case expectation w.r.t. the available set of a-measures.
Taking the view where the expectation functional of type (X→R)→R is the most fundamental object, several different sets of a-measures might have the exact same expectations, so you get an equivalence relation on sets of a-measures. Usually, a set of a-measures must fulfill certain properties like closure and convexity to be called an "inframeasure", but the bulk of those properties are actually the condition for a set of a-measures to be the largest set in its equivalence class, the "canonical representative" of the equivalence class.
So, for thinking of an inframeasure, you can think of it as a equivalence class of sets of a-measures. Or you can think of it as the "canonical representative", the largest set of a-measures in its equivalence class. Or you can completely neglect the view of it as a set of a-measures and just consider the expectation functional, of type (X→R)→R. That last view, where we just view an inframeasure as a special sort of function of that type signature, will be used through the rest of this post as it generalizes the best. The expectation functional for an inframeasure, denoted by ψ, is:
1: Monotone (as usual)
2: Concave. If p∈[0,1], then pψ(f)+(1−p)ψ(f′)≤ψ(pf+(1−p)f′)
3: Lipschitz. There's some finite constant λ⊙ s.t, for all f,f′, |ψ(f)−ψ(f′)|supx|f(x)−f′(x)|≤λ⊙
4: The constant-0 function is mapped to 0 or higher.
And this is an inframeasure. Any functional of type (X→R)→R, or (X→[0,1])→[0,1], fulfilling those four properties. If you want to add extra properties to this list to get a more restricted class of these things, you can. They seem to be highly useful for dealing with nonrealizable environments, getting nice decision-theoretic behavior for free, and a few other things.
Well, it's a little more complicated than that. Some complications are:
So far, we've only been able to define them for spaces X which are Polish (a space is Polish if it can be equipped with a metric, is closed under taking limits, and there's a dense countable subset).
They can only take the expectations of continuous bounded functions X→R, unlike probability distributions which can handle measurable functions in general.
There's a complicated fifth condition related to compactness which shows up if your space X isn't compact.
We'll be neglecting this more complex stuff for now.
So what about Domain Theory?
Domain theory focuses on partially ordered sets with the ordering ⊑ being related to information content somehow, which is usually tied to computation in lazy programming languages or partial functions or stuff like that. The least informative points go on the bottom, the most informative stuff go on the top. If your poset fulfills enough nice properties, it's called a domain. Haskell types are domains.
If you're wondering about the properties to call something a domain, there's three of them, by my count. Different authors disagree on what the word "domain" refers to, but I'll go with the three most basic conditions which kept showing up over and over and over again in my domain theory textbook, and which were almost always assumed.
Condition 1: Suprema exist for all directed sets.
A directed set is a generalization of a chain. Using D for our poset, and A for a subset of it, A is a directed set when, for any two points in A, there's an upper bound of those two points which is also in A. If you want, you can just think "chain" whenever you see "directed set", the intuition won't lead you astray. So, this is essentially saying that if you've got a sequence of more-and-more-informative points, there must be a completion of that sequence. A most informative point that contains all the information in said sequence, and nothing else. Supremum for directed sets is denoted ⨆↑ instead of ⨆, because it's handy to notate when you're taking a suprema which is actually guaranteed to exist.
Condition 2: There must be a bottom point below everything.
Compare this with how, in Haskell, all types contain ⊥. It's usually used to represent nonterminating computations or undefinedness or the least informative thing. It's also handy for defining partial functions. Just map every input without a well-defined output to ⊥.
Condition 3: The poset must be continuous.
This one is rather complicated. Along with the usual information ordering, x⊑y, there's also an approximation relation, x≪y (x approximates y), which is stricter than ⊑. x≪y iff all directed sets with a supremum of y or higher contain an element z s.t x⊑z. Put another way, x≪y if x is an irreducible component of the information in y. If x≪y, then it's impossible to make a sequence of more and more information with y (or something more informative) as a limit, without having that sequence pass by/get more informative than x at some point.
A poset D is continuous if, for all y, y is the supremum of the approximants to y.
∀y∈D:y=⨆↑{x∈D|x≪y}
Pretty much, all pieces of information y should be capable of being created by combining the "substantially less informative" pieces of information which are guaranteed to show up in any attempt to build y from below (the approximants to y). You need this for the topology on your poset to behave nicely.
So, if your poset has got suprema of directed sets (chains), a bottom point, and you can build every point with approximants from below, it's a domain.
Exercise: Why is the following poset not continuous?
Haskell Example
For Haskell semantics, points near the bottom of the domain/type are computations with a lot of unevaluated thunks, basically parts of the computation that haven't finished computing yet. Maximal points in the domain are completed computations, fully specified pieces of data. If you evaluate a thunk, finding out some more about what your piece of data is, you get a higher point in the information ordering. And there's always a bottom.
As an example, the type/domain of infinite bitstrings, under the information ordering, looks like an infinite binary tree.
There's ⊥ as the root node, representing a nonterminating computation where you can't get even a single digit out of it, complete ignorance of what the bitstring is. There's a bunch of branch nodes that are of the form "string of finitely many digits, then a thunk". These elements are like "ok, I can tell you the first six digits of the binary string, but hang on I'm still computing the rest...". And then there's all the leaf nodes, fully computed infinite bitstrings.
Functions in Domain Theory
For a function from a domain to a domain to be continuous, it needs two properties. First, it needs to be monotone. x⊑y→f(x)⊑f(y). More informative inputs lead to more informative outputs.
Second, it must preserve suprema of directed sets/chains. f(⨆↑A)=⨆↑a∈Af(a) for all directed sets A. See the picture for a non-continuous function.
Btw, terminology note for later. [D→E] or [X→[0,1]] or something like that signifies the space of continuous functions from one space to another, but treated in a domain-theory way. The brackets specify we're doing this. More precisely, the target space (E or [0,1] or whatever) must be a poset, and equipped with the Scott-topology (a topology you can put on posets). This tells you what the continuous functions are, and then you equip the continuous functions with the ordering f′⊒f iff ∀x∈D:f′(x)⊒f(x).
This is relevant because the space of continuous functions (X→[0,1]) is not the space of continuous functions [X→[0,1]]. In the former, we're equipping [0,1] with its usual topology, determining what the continuous functions are. In the latter, we're equipping [0,1] with the Scott-topology, so different functions count as continuous now. And we're also going on to equip the function space with a partial order.
Type Constructor Closure
One of the most striking features of domains is that certain subcategories of domains are very good at being closed under type constructors. Some type constructors are +,×, and function space, letting you paste together existing spaces into more complex spaces. Pretty much any topological property worth its salt will be closed under finite products and sums and some other stuff, but function space is way harder to get.
Given spaces X and Y with some nice topological property, the space of continuous functions (X→Y)very rarely has said nice property. If X and Y are compact, you can't guarantee the function space is compact. If X and Y are Polish, you can't guarantee that the function space is Polish. But, given domains D and E in some "nice" category of domains, the domain of continuous functions [D→E] is going to be just as "nice".
The most important operation that the "nice" categories of domain are closed under is the bilimit. In domain theory, the category-theoretic notion of limit and colimit coincide (well, kind of, it's complicated), and the resulting thing is called the bilimit. It's how you take a sequence of ever-more-complicated domains and complete the sequence, in a sense. And it's the construction underlying Haskell's ability to define a type in terms of itself. It's very special because it means that, with the tools of domain theory, it's possible to cook up whatever sort of wacky fixtype you want.
We'll work through a very simple example of how this works. The construction process to solve D≅D+D. Stage 1 is starting with a single point. Addition of two domains/posets is "add a single new bottom element below both existing posets", so stage 2 would be 1+1, the poset that looks like a V, the domain of booleans. Stage 3 would be V+V, or a 2-level binary tree with 4 leaves. Stage 4 would be a 3-level binary tree with 8 leaves (two copies of stage 3, with a new root node). And if you take the bilimit of that sequence, you'd end up with the infinite binary tree, the domain of infinite bitstrings. And clearly, if you take two copies of an infinite binary tree, and add a new root node below both of them, you get an infinite binary tree again, so it's isomorphic to adding two copies of itself together.
The general way that domain theory makes fixtypes is starting with a single point at stage 1. Then, substitute the single point into the defining equation, to get a more complicated domain at stage 2. Then substitute the stage 2 domain into the defining equation, to get a more complicated domain at stage 3. There's a sequence of embeddings of each stage into the next, more complex stage, so you can take a bilimit of the sequence, and bam, you have your fixtype.
Exercise: Sum, D+E, is "add a new bottom point below both posets". Coalesced sum, D⊕E, is "glue both posets together at their bottom point". Lifting, D⊥, is "introduce one new bottom point". 1 is the domain of a single point, and 2 is the domain of two points, a top and a bottom. Using these type constructors, how would you define the following domains in terms of themselves in such a way that the fixtype construction process makes the indicated poset? Like how the domain of infinite bitstrings/the infinite binary tree is the solution to D≅D+D.
We aren't just limited to the sum types, there's also products and function spaces. For instance, the solution to D≅[D→D]⊥ (take the function space from itself to itself, add a new bottom point) is the space of terms in the lazy lambda calculus.
If, say, you wanted to whip up a fixtype for
D≅[D→D]+E
E≅[E×E→D]×2
Or even something much crazier than that, domain theory handles it with no problem. Start with D,E being single points, substitute them in to get stage 2 domains for both of them, working in this way you can build a sequence of more-and-more-detailed D's and E's, take the bilimit of both sequences, and you have a simplest solution!
Putting It Together
Inframeasures also have something that could be described as an information ordering. Restricting to just crisp infradistributions/closed convex sets of probability distributions for the time being, the maximal elements with the most information would be individual probability distributions. Elements further down are larger sets of probability distributions, representing more and more uncertainty about what probability distribution accurately captures events. And the bottom node would be complete uncertainty about what probability distribution describes reality, ie, the set of all probability distributions over your space.
So, that's the first inkling that something's up. Inframeasures naturally make a poset, unlike probability distributions. Suprema (for crisp infradistributions) would be intersection of the sets of probability distributions. Infinima would be closed convex hull of the union of the sets of probability distributions. Using ψ for the expectation functionals, we have
ψ′⊒ψ↔∀f:ψ′(f)≥ψ(f)
(⊒ is used for the ordering on domains, ≥ is used specifically when we're dealing with numbers)
This would line up with the type signature for inframeasures (in domain theory) being something like the space [[X→R]→R]. Though this particular type signature won't work, as we'll see later. The monotonicity property (bigger functions get bigger expectations) on inframeasures would fall out for free because continuous functions from a poset to a poset in domain theory are always monotone. And suprema/infinima of inframeasures, from Less Basic Inframeasure Theory, seemed to behave suspiciously like suprema and infinima in domain theory.
There is a very important note here. The information ordering on inframeasures is flipped from the usual ordering inframeasures are equipped with. The usual ordering has an equivalence between: Infinimum (∧), logical and (also denoted by ∧), and set intersection ∩ (of the corresponding sets of probability distributions). And there's an equivalence between supremum (∨), logical or (also ∨), and convex hull of set union ∪.
But, under the information ordering, set intersection would be supremum. The sets get smaller and thus have less uncertainty and more information. Accordingly, the notation ⊑ will be reserved for the information ordering, and ⊔,⊓ for domain-theory suprema and infinima (intersection and union), respectively. This is so ⪯,∧ and ∨ can be reserved for the usual ordering, avoiding the notational nightmare of using the same symbol for both the ordering and the reverse ordering.
It's very boring to just go "ooh, infradistributions make a domain!", though. You can't get much out of that. The ultimate goal would be to figure out how to take a domain D, and make a domain of inframeasures over the space D.
If you did this cleanly enough, then "map a type D to the type of inframeasures over D" would end up being a nicely-behaved type constructor in domain theory, and you could whip up whatever sort of weird fixtypes you want with it, and probably cram it all into Haskell somehow, and have inframeasure-y uncertainty over any Haskell type.
This is a big deal because it's a surprisingly big pain in the ass to make the "map a type D to the type of probability distributions over D" type constructor behave sensibly.
There is a domain-theory construction which builds the space of probability distributions over another domain. It's called the probabilistic powerdomain. The issue is, probabilistic powerdomain is an operation which leads you outside of the most commonly used cartesian-closed categories of domains (which you need to stay within to cook up the fixtype solutions), and the cartesian-closed categories of domains are exhaustively classified. The problem of showing whether or not there's some lesser-used category of domains that's closed under probabilistic powerdomain is called the Jung-Tix problem, after the authors of the original paper.
There's a generalization of domains called QCB spaces, and apparently a sub-category of them is closed under probabilistic powerdomain and has strong enough closure properties to support the fixtype construction process. So domain theory (broadly construed) has pretty much figured out how to handle probabilistic choice in computations since the original Jung-Tix paper was written. But you have to go rather far afield to do it.
The key problem with probabilistic powerdomain is that it destroys all the lattice-like structure present in a domain, and the best-behaved categories of domains with the strongest connections to computation are very lattice-like. Fortunately, the space of inframeasures does have a lot of lattice-like structure which is lacked by the space of measures. So, one might hope that one of the nice and commonly used categories of domain is closed under the "space of inframeasures" type constructor, for practical implementation.
As it turns out, yes, this can be done. There's problems to overcome along the way, though.
Problems and Plans
Problem 1: The type signature of an inframeasure over X, in a domain-theory sense, would be something like an element of the space [[X→[0,1]]→[0,1]] or [[X→R]→R]. Up till now, inframeasures could only take expectations of bounded continuous functions X→R (or X→[0,1]), where R or [0,1] is equipped with the usual topology. Domain theory would equip [0,1] or R with the Scott-topology when making the function space, thus changing the notion of "continuous function". And also, there's no good support for the notion of "bounded function" in domain theory. And also, R isn't a legit domain. Closure under suprema of chains demands that ∞ be added, and the existence of a bottom element demands that −∞ be added. We need to sort out fiddly issues of what sorts of functions we can even take the expectation of in the first place.
Problem 2: Inframeasures have only been defined on Polish spaces up till now. Polish spaces are the usual setting for probability theory. And domains are very very not-Polish. They're T0 spaces, about as badly behaved as a topological space can be, while Polish spaces are T6, about as nicely behaved as a topological space can be. This is going to take some serious generalization work.
Problem 3: Not all the elements of that function space are suitable to be an inframeasure. Inframeasures are functionals which have some additional properties like concavity, which wouldn't show up for an arbitrary continuous function. If we wanted to build a space of inframeasures and show it was nicely behaved from a domain-theory standpoint, we'd have to craft it as some sort of subset or quotient of the full function space, which is more finicky to pull off than it appears at first.
Problem 4: There's a lot of extra properties inframeasures can have, if you want to restrict to particularly nicely-behaved subsets of them. We could restrict our attention to infradistributions (which must map the constant-0 function to 0 and constant-1 function to 1). And there's also properties you can stack on top of that like 1-Lipschitzness, homogenity, C-additivity, cohomogenity, crispness, and sharpness (definitions not relevant here, they're present in Less Basic Inframeasure Theory). We could just as well imagine building domains corresponding to those, to make type constructors like "build the type of crisp infradistributions over this other type". Plus, there's the issue of whether we're using the [[X→[0,1]]→[0,1]] space, or the [[X→R]→R] space. We really don't want to have to duplicate all the work ten times over in proving that these more restricted processes all make nice domains.
The game plan here is:
Step 1: Inframeasures take as input continuous bounded functions in X→[0,1] (or R) where [0,1] has the usual topology. The domain-theory space of continuous functions [X→[0,1]] (or R), has [0,1] or R equipped with the Scott-topology, and doesn't talk about boundedness. This is quite different! Sort that part out somehow, to address problem 1.
Step 2: Conjecture what sort of domain a domain of inframeasures would be, so we know which category of domains we're headed for. It should be one of the smaller categories of domains that's more closely linked to what computers work on. Let's call these "nice" domains, as a placeholder.
Step 3: Come up with some sort of criteria for a "good" topological space to define inframeasures over that manages to include both the usual sorts of Polish spaces we were looking at before, and "nice" domains, so you can apply inframeasures to domains.
Step 4: Verify we solved steps 2 and 3 correctly by showing that if X is "good", then [X→[0,1]] is a "nice" domain. Then, since, by step 3, a "nice" domain should be a "good" space, we can apply our result a second time to conclude that [[X→[0,1]]→[0,1]] is a "nice" domain. Where "nice" and "good" are placeholders for something more formal. Also, generalize this to the R type signature. Steps 3 and 4 together would address problem 2, thus extending the notion of "inframeasure" to the sorts of objects studied in domain theory.
Step 5: Now that [[X→[0,1]]→[0,1]] (or the same thing with R) is a "nice" domain, see if we can get some sort of master theorem for precisely when a subset of a "nice" domain is also a "nice" domain. All the various sorts of inframeasures and infradistributions are subsets of that whole function space, so it'd be nice to only check the conditions of the master theorem to go "and this subset of it is also a "nice" domain" instead of having to do all the work from scratch each time. This would address problem 4 by making one single theorem we can reuse.
Step 6: Show that for inframeasures, and all the more specialized types of them, they all fulfill the conditions from the master theorem in step 5, so they all make "nice" domains. This would solve problem 3, the last of them.
By now, steps 1-6 show that the category of "nice" domains is closed under the "take the space of inframeasures" type constructor, and all the more restrictive variants of it like "take the space of crisp infradistributions". This is because, by step 3, all "nice" domains should be "good" spaces. By step 4, the function space [[D→[0,1]]→[0,1]] (or the R variant) will also be "nice". And then, by steps 5 and 6, the various subsets of the full function space fulfilling various desired properties are also "nice". So we're just on detail cleanup from here on.
Step 7: Do a quick routine verification that "take the space of inframeasures" is a locally continuous functor from the category of "nice" domains to itself, as that's the property you have to check to freely use it as a type constructor in fixtype shenanigans with no worries. Also, annoyingly enough, my domain theory textbook doesn't verify closure of the various categories of domains under all the other usual type constructors that made an appearance there (lifting, sum, coalesced sum, product, smash product, function space, strict function space, bilimit), only some of them, so get those squared away.
And then we'd be done! By step 7 (the category of "nice" domains is closed under all these type constructors), we can just throw chapter 5 of the Abramsky textbook at it to get that we can build whatever weird type signature we want from these basic building blocks and it'll work out.
Of course, not all these steps will interest the reader, and we'll be going into an awful lot of detail on how each one works, so if you've made it this far, feel free to skip straight to whichever one piques your interest.
Step 1: Fix function problems
Inframeasures are defined over Polish spaces with no issues, so let's start there. Our goal is to extend inframeasures over Polish spaces X from taking expectations of bounded continuous functions X→[0,1] (or R) to taking expectations of Scott-continuous functions X→[0,1] (or R).
The Scott-topology opens on [0,1] are (besides the whole space and the empty set) all of the form (q,1] for q∈[0,1), the sets of the form "everything strictly above this number". Similar thing for R. This is fewer opens than the usual topology on [0,1] (or R), so it's easier for a function X→[0,1] to count as continuous, since the criteria "every preimage of an open is open" is easier to fulfill when you've got less opens in the target space.
Going to probability distributions as a toy example to play with, if you had previously invented probability distributions in such a way that you could only take expectations of bounded continuous functions with them, and were trying to extend the expectations to more general functions than that, how could it be done?
Well, for probability distributions, there's a nifty result called Lebesgue's Dominated Convergence Theorem which (skipping over technical requirements) says that if a sequence of measurable functions fn:X→R limits pointwise to f∗:X→R, then
limn→∞Eμ(fn)=Eμ(f)
So, if you could prove a result like the Dominated Convergence Theorem, then you could go "hey, we can get sensible expectations for all functions which are pointwise limits of continuous functions, since no matter what sequence of functions you pick to get there, the limits of the expectation values will agree!". And then you'd have expectations for the Baire class 1 functions (pointwise limits of continous functions). Apply the theorem again and you get Baire class 2 functions (pointwise limits of Baire class 1 functions). Keep going, take unions when you get to limit ordinals, and by the time you've hit the ordinal ω1, you're done, you've figured out how to take expectations of any measurable function.
Can we do something like this for inframeasures? No. There's no analogue of Lebesgue's Dominated Convergence Theorem. In fact, it explicitly fails, and not way off in the upper realms of the ordinals, it fails down at the start.
Proposition 1:There is an infradistribution Ψ on the space [0,1] and a sequence of continuous functions fn:[0,1]→[0,1] which limit pointwise to a function f∗, yet limn→∞EΨ[fn]=0 and EΨ[f∗]=1.
Proof: Let Ψ be the set of all probability distributions on [0,1]. The expectation function is f↦minx∈[0,1]f(x), because you can always consider the probability distribution which has all its measure on the point where the function is lowest.
Function fn is defined to dip down in a V pattern, from 1 at 0, to 0 at 2−n, back up to 1 at 2−n+1, and is 1 for all larger numbers. This sequence limits pointwise to the constant-1 function. Since all the fn have a minimum value of 0, they have expectation values of 0, and the constant-1 function has a minimum value of 1, so it has an expectation value of 1.
The Monotone Convergence Theorem says that if you've got a sequence of lower-bounded measurable functions fn:X→(−∞,∞] s.t. fn+1≥fn, then the pointwise limit of them, f∗:X→(−∞,∞], has
limn→∞Eμ[fn]=Eμ[f∗]
There's a bit of a roadblock though. If you start off with bounded continuous functions, then taking pointwise limits of ascending sequences like that gets you... all the lower-bounded lower-semicontinuous functions. And then, taking pointwise limits of sequences like that, you can get... lower-bounded lower-semicontinouous functions. That's as far as we can go with this result!
Also, more excitingly, we can prove another theorem that all lower-bounded lower-semicontinuous functions f∗:X→
Math prerequisites for this post: A Whole Lot of topology, some poset/lattice knowledge, have read "Basic Inframeasure Theory", just enough category theory to be familiar with functors, and other minor bits and pieces.
Domain theory is the branch of math that's like "so, what sort of topological space is a Haskell type, anyways? How do we provide a semantics for computation?"
And Inframeasure theory is about a generalization of measures which seems to hold promise for addressing several alignment issues.
Accordingly, it would be nifty if inframeasures were the sort of thing that could be worked with in Haskell. I don't know Haskell. But I do know a decent chunk of domain theory, so I can at least check how well inframeasures interface with that branch of math, as a proxy for actual computer implementation.
The major implications from this post are:
1: Yes, inframeasures do interface with domain theory quite nicely, and land in the part of it that might be feasible to implement on a computer (the ω-BC domains)
2: You can have inframeasure-like uncertainty over any Haskell type. This took some substantial generalization work, because Haskell types, as topological spaces, are very very different from the sorts of topological spaces we were working with before. Perfect for representing uncertainty about computations!
3: "take the space of inframeasures over this other type" is a well-behaved type constructor, so whatever crazy self-referential type signature you want to whip up with it (and the other usual type constructors), it'll work perfectly well. Oh, it's also a monad.
This post was cranked out in minimal time without editing, feel free to get in contact with me if there's something that needs clarification (there are probably things which need clarification).
Notation Conventions:
This section is best to refer back to if you're like "I know that symbol was defined somewhere, but it's hard to find where". This mostly just covers the stuff introduced at one point in this post and repeatedly reused. If a symbol gets introduced for a paragraph and doesn't get reused, it won't show up here.
For ordering, I reserve ≥ and ≤ for comparing numbers in R (or [0,1] or [−∞,∞] or (−∞,∞]).
For orderings on posets which are not the set of numbers, I use ⊑. ⊓ is infinima, ⊔ is suprema. ⨆↑ is supremum of a directed set. As is usual, ⊤ and ⊥ are the top and bottom elements of a poset if they exist.
≪ is used for the approximation relation on a poset. x≪y if all directed sets with a supremum of y or more contain an element that lies above x. More on this concept later. It's like a more restrictive version of ⊑.
D,E are used for domains or posets, while X and Y are used for generic topological spaces that may not be equipped with a partial order.
For type signatures, f:X→Y is a function from the space X to the space Y, assumed continuous by default unless we specifically say otherwise. If we say "f is a measurable function X→Y" or something like that, it might not be continuous, but if it's not otherwise specified, always assume continuity.
(X→Y) is the space of continuous functions from X to Y. This is often used for something like (X→R) and (X→[0,1]), where we're equipping R and [0,1] with the usual topology.
The notation [D→E], or [X→[0,1]], or stuff like that, has the brackets meaning we're treating the function space in a specifically domain-theoretic way. The target space E or [0,1] or (−∞,∞] or whatever must be equipped with a partial order. There's a topology on posets called the Scott-topology, the target space is equipped with it. The Scott-topology on [0,1] and the ordinary topology on [0,1] are different (same goes for R and other intervals like that) so elements of [X→R] may not be elements of (X→R), because R is equipped with different topologies in the two cases, altering the notion of "continuous function". Also, the function space is equipped with a partial order. Since the target space E or [0,1] or whatever is a poset, the functions get the ordering f′⊒f iff ∀x:f′(x)⊒f(x).
So, to recap, the brackets mean we're equipping the target space with a partial order and the Scott-topology, and equipping the space of continuous functions with a partial order and the Scott-topology as well.
For numbers, p is generally used for numbers in [0,1], q is generally used for an arbitrary real number, and a is used for nonnegative real numbers.
cq is the constant function which maps every input to the number q.
There's a part where we use P and P. The slightly fancier P is a subset of a domain D, the simpler P is the subset equipped with the order inherited from D to make it into its own distinct poset.
ψ is used for inframeasures, interpreted as a function. They've got type signatures like (X→R)→R, or (X→[0,1])→[0,1] (usual topology on the numbers), or [X→(−∞,∞]]→[−∞,∞], or [X→[0,1]]→[0,1] (Scott-topology on the numbers), varying by context. They've also got some other properties besides that.
So what is an Inframeasure?
Most of the motivation for why these things are important is contained in the Introduction to this Sequence. So we'll just focus on what sort of creature an inframeasure is, mathematically, trying to emphasize the aspects of it which generalize to our new setting.
For probability distributions μ over some space X, the expectation of a function f:X→R is defined as:
Eμ[f]:=∫fdμ
There's an alternate way of thinking of probability distributions where the most important attribute of a probability distribution is that it's the sort of thingy that you can take expectations of functions with. Under this view, a probability distribution is just a functional of type (X→R)→R, mapping a function to its expectation. Said functional must fulfill the following properties, and then it can be called a probability distribution.
1: It must be monotone. Ie, larger functions must be mapped to higher expected values. The order on functions is that f′ is larger than f iff ∀x:f′(x)≥f(x).
2: It must be linear.
3: It must map the constant-1 function to 1.
And, as it turns out, monotone linear functionals which map the constant-1 function to 1 are exactly those which can be written as taking an expectation w.r.t. a probability distribution. If you wanted to generalize to measures instead of probability distributions, just drop that third condition and keep the first two.
Now, in order to build up to inframeasures, we'll need to generalize to a-measures (short for "affine measures"). If you don't do this, you lose the ability to deal with updates in the infra-Bayes setting, which critically rely on this. An a-measure is a pair of a measure and a nonnegative number, (m,b). That b term is like "guaranteed utility". Again, we can take expectations w.r.t. them, defined as
E(m,b)[f]:=∫fdm+b
Said expectation functional (X→R)→R has the properties:
1: It's monotone.
2: It's affine. For p∈[0,1], E(m,b)[pf+(1−p)f′]=pE(m,b)[f]+(1−p)E(m,b)[f′]
All affine functions to R look like a linear function to R plus a constant.
3: It must map the constant-0 function to 0 or more.
And, lo and behold, all expectation functionals of this form (skipping over some fiddly issues) can be identified with an a-measure.
Inframeasures are sets of a-measures. Sets of probability distributions (which, in our terminology, are called "crisp infradistributions") are already extensively studied in the field of Imprecise Probability, but generalizing to sets of a-measures is necessary for maximum generality and properly capturing updates. We can define expectations w.r.t these sets of a-measures. You don't know what a-measure (or probability distribution, if you prefer to think of sets of probability distributions) reality is going to pick from its set of available options Ψ, and you want to be prepared for the worst case, so expectations w.r.t. a set of a-measures are defined as:
EΨ[f]:=inf(m,b)∈Ψ∫fdm+b
Just take the worst-case expectation w.r.t. the available set of a-measures.
Taking the view where the expectation functional of type (X→R)→R is the most fundamental object, several different sets of a-measures might have the exact same expectations, so you get an equivalence relation on sets of a-measures. Usually, a set of a-measures must fulfill certain properties like closure and convexity to be called an "inframeasure", but the bulk of those properties are actually the condition for a set of a-measures to be the largest set in its equivalence class, the "canonical representative" of the equivalence class.
So, for thinking of an inframeasure, you can think of it as a equivalence class of sets of a-measures. Or you can think of it as the "canonical representative", the largest set of a-measures in its equivalence class. Or you can completely neglect the view of it as a set of a-measures and just consider the expectation functional, of type (X→R)→R. That last view, where we just view an inframeasure as a special sort of function of that type signature, will be used through the rest of this post as it generalizes the best. The expectation functional for an inframeasure, denoted by ψ, is:
1: Monotone (as usual)
2: Concave. If p∈[0,1], then pψ(f)+(1−p)ψ(f′)≤ψ(pf+(1−p)f′)
3: Lipschitz. There's some finite constant λ⊙ s.t, for all f,f′, |ψ(f)−ψ(f′)|supx|f(x)−f′(x)|≤λ⊙
4: The constant-0 function is mapped to 0 or higher.
And this is an inframeasure. Any functional of type (X→R)→R, or (X→[0,1])→[0,1], fulfilling those four properties. If you want to add extra properties to this list to get a more restricted class of these things, you can. They seem to be highly useful for dealing with nonrealizable environments, getting nice decision-theoretic behavior for free, and a few other things.
Well, it's a little more complicated than that. Some complications are:
So far, we've only been able to define them for spaces X which are Polish (a space is Polish if it can be equipped with a metric, is closed under taking limits, and there's a dense countable subset).
They can only take the expectations of continuous bounded functions X→R, unlike probability distributions which can handle measurable functions in general.
There's a complicated fifth condition related to compactness which shows up if your space X isn't compact.
We'll be neglecting this more complex stuff for now.
So what about Domain Theory?
Domain theory focuses on partially ordered sets with the ordering ⊑ being related to information content somehow, which is usually tied to computation in lazy programming languages or partial functions or stuff like that. The least informative points go on the bottom, the most informative stuff go on the top. If your poset fulfills enough nice properties, it's called a domain. Haskell types are domains.
If you're wondering about the properties to call something a domain, there's three of them, by my count. Different authors disagree on what the word "domain" refers to, but I'll go with the three most basic conditions which kept showing up over and over and over again in my domain theory textbook, and which were almost always assumed.
Condition 1: Suprema exist for all directed sets.
A directed set is a generalization of a chain. Using D for our poset, and A for a subset of it, A is a directed set when, for any two points in A, there's an upper bound of those two points which is also in A. If you want, you can just think "chain" whenever you see "directed set", the intuition won't lead you astray. So, this is essentially saying that if you've got a sequence of more-and-more-informative points, there must be a completion of that sequence. A most informative point that contains all the information in said sequence, and nothing else. Supremum for directed sets is denoted ⨆↑ instead of ⨆, because it's handy to notate when you're taking a suprema which is actually guaranteed to exist.
Condition 2: There must be a bottom point below everything.
Compare this with how, in Haskell, all types contain ⊥. It's usually used to represent nonterminating computations or undefinedness or the least informative thing. It's also handy for defining partial functions. Just map every input without a well-defined output to ⊥.
Condition 3: The poset must be continuous.
This one is rather complicated. Along with the usual information ordering, x⊑y, there's also an approximation relation, x≪y (x approximates y), which is stricter than ⊑. x≪y iff all directed sets with a supremum of y or higher contain an element z s.t x⊑z. Put another way, x≪y if x is an irreducible component of the information in y. If x≪y, then it's impossible to make a sequence of more and more information with y (or something more informative) as a limit, without having that sequence pass by/get more informative than x at some point.
A poset D is continuous if, for all y, y is the supremum of the approximants to y.
∀y∈D:y=⨆↑{x∈D|x≪y}
Pretty much, all pieces of information y should be capable of being created by combining the "substantially less informative" pieces of information which are guaranteed to show up in any attempt to build y from below (the approximants to y). You need this for the topology on your poset to behave nicely.
So, if your poset has got suprema of directed sets (chains), a bottom point, and you can build every point with approximants from below, it's a domain.
Exercise: Why is the following poset not continuous?
Haskell Example
For Haskell semantics, points near the bottom of the domain/type are computations with a lot of unevaluated thunks, basically parts of the computation that haven't finished computing yet. Maximal points in the domain are completed computations, fully specified pieces of data. If you evaluate a thunk, finding out some more about what your piece of data is, you get a higher point in the information ordering. And there's always a bottom.
As an example, the type/domain of infinite bitstrings, under the information ordering, looks like an infinite binary tree.
There's ⊥ as the root node, representing a nonterminating computation where you can't get even a single digit out of it, complete ignorance of what the bitstring is. There's a bunch of branch nodes that are of the form "string of finitely many digits, then a thunk". These elements are like "ok, I can tell you the first six digits of the binary string, but hang on I'm still computing the rest...". And then there's all the leaf nodes, fully computed infinite bitstrings.
Functions in Domain Theory
For a function from a domain to a domain to be continuous, it needs two properties. First, it needs to be monotone. x⊑y→f(x)⊑f(y). More informative inputs lead to more informative outputs.
Second, it must preserve suprema of directed sets/chains. f(⨆↑A)=⨆↑a∈Af(a) for all directed sets A. See the picture for a non-continuous function.
Btw, terminology note for later. [D→E] or [X→[0,1]] or something like that signifies the space of continuous functions from one space to another, but treated in a domain-theory way. The brackets specify we're doing this. More precisely, the target space (E or [0,1] or whatever) must be a poset, and equipped with the Scott-topology (a topology you can put on posets). This tells you what the continuous functions are, and then you equip the continuous functions with the ordering f′⊒f iff ∀x∈D:f′(x)⊒f(x).
This is relevant because the space of continuous functions (X→[0,1]) is not the space of continuous functions [X→[0,1]]. In the former, we're equipping [0,1] with its usual topology, determining what the continuous functions are. In the latter, we're equipping [0,1] with the Scott-topology, so different functions count as continuous now. And we're also going on to equip the function space with a partial order.
Type Constructor Closure
One of the most striking features of domains is that certain subcategories of domains are very good at being closed under type constructors. Some type constructors are +,×, and function space, letting you paste together existing spaces into more complex spaces. Pretty much any topological property worth its salt will be closed under finite products and sums and some other stuff, but function space is way harder to get.
Given spaces X and Y with some nice topological property, the space of continuous functions (X→Y) very rarely has said nice property. If X and Y are compact, you can't guarantee the function space is compact. If X and Y are Polish, you can't guarantee that the function space is Polish. But, given domains D and E in some "nice" category of domains, the domain of continuous functions [D→E] is going to be just as "nice".
The most important operation that the "nice" categories of domain are closed under is the bilimit. In domain theory, the category-theoretic notion of limit and colimit coincide (well, kind of, it's complicated), and the resulting thing is called the bilimit. It's how you take a sequence of ever-more-complicated domains and complete the sequence, in a sense. And it's the construction underlying Haskell's ability to define a type in terms of itself. It's very special because it means that, with the tools of domain theory, it's possible to cook up whatever sort of wacky fixtype you want.
We'll work through a very simple example of how this works. The construction process to solve D≅D+D. Stage 1 is starting with a single point. Addition of two domains/posets is "add a single new bottom element below both existing posets", so stage 2 would be 1+1, the poset that looks like a V, the domain of booleans. Stage 3 would be V+V, or a 2-level binary tree with 4 leaves. Stage 4 would be a 3-level binary tree with 8 leaves (two copies of stage 3, with a new root node). And if you take the bilimit of that sequence, you'd end up with the infinite binary tree, the domain of infinite bitstrings. And clearly, if you take two copies of an infinite binary tree, and add a new root node below both of them, you get an infinite binary tree again, so it's isomorphic to adding two copies of itself together.
The general way that domain theory makes fixtypes is starting with a single point at stage 1. Then, substitute the single point into the defining equation, to get a more complicated domain at stage 2. Then substitute the stage 2 domain into the defining equation, to get a more complicated domain at stage 3. There's a sequence of embeddings of each stage into the next, more complex stage, so you can take a bilimit of the sequence, and bam, you have your fixtype.
Exercise: Sum, D+E, is "add a new bottom point below both posets". Coalesced sum, D⊕E, is "glue both posets together at their bottom point". Lifting, D⊥, is "introduce one new bottom point". 1 is the domain of a single point, and 2 is the domain of two points, a top and a bottom. Using these type constructors, how would you define the following domains in terms of themselves in such a way that the fixtype construction process makes the indicated poset? Like how the domain of infinite bitstrings/the infinite binary tree is the solution to D≅D+D.
We aren't just limited to the sum types, there's also products and function spaces. For instance, the solution to D≅[D→D]⊥ (take the function space from itself to itself, add a new bottom point) is the space of terms in the lazy lambda calculus.
If, say, you wanted to whip up a fixtype for
D≅[D→D]+E
E≅[E×E→D]×2
Or even something much crazier than that, domain theory handles it with no problem. Start with D,E being single points, substitute them in to get stage 2 domains for both of them, working in this way you can build a sequence of more-and-more-detailed D's and E's, take the bilimit of both sequences, and you have a simplest solution!
Putting It Together
Inframeasures also have something that could be described as an information ordering. Restricting to just crisp infradistributions/closed convex sets of probability distributions for the time being, the maximal elements with the most information would be individual probability distributions. Elements further down are larger sets of probability distributions, representing more and more uncertainty about what probability distribution accurately captures events. And the bottom node would be complete uncertainty about what probability distribution describes reality, ie, the set of all probability distributions over your space.
So, that's the first inkling that something's up. Inframeasures naturally make a poset, unlike probability distributions. Suprema (for crisp infradistributions) would be intersection of the sets of probability distributions. Infinima would be closed convex hull of the union of the sets of probability distributions. Using ψ for the expectation functionals, we have
ψ′⊒ψ↔∀f:ψ′(f)≥ψ(f)
(⊒ is used for the ordering on domains, ≥ is used specifically when we're dealing with numbers)
This would line up with the type signature for inframeasures (in domain theory) being something like the space [[X→R]→R]. Though this particular type signature won't work, as we'll see later. The monotonicity property (bigger functions get bigger expectations) on inframeasures would fall out for free because continuous functions from a poset to a poset in domain theory are always monotone. And suprema/infinima of inframeasures, from Less Basic Inframeasure Theory, seemed to behave suspiciously like suprema and infinima in domain theory.
There is a very important note here. The information ordering on inframeasures is flipped from the usual ordering inframeasures are equipped with. The usual ordering has an equivalence between: Infinimum (∧), logical and (also denoted by ∧), and set intersection ∩ (of the corresponding sets of probability distributions). And there's an equivalence between supremum (∨), logical or (also ∨), and convex hull of set union ∪.
But, under the information ordering, set intersection would be supremum. The sets get smaller and thus have less uncertainty and more information. Accordingly, the notation ⊑ will be reserved for the information ordering, and ⊔,⊓ for domain-theory suprema and infinima (intersection and union), respectively. This is so ⪯,∧ and ∨ can be reserved for the usual ordering, avoiding the notational nightmare of using the same symbol for both the ordering and the reverse ordering.
It's very boring to just go "ooh, infradistributions make a domain!", though. You can't get much out of that. The ultimate goal would be to figure out how to take a domain D, and make a domain of inframeasures over the space D.
If you did this cleanly enough, then "map a type D to the type of inframeasures over D" would end up being a nicely-behaved type constructor in domain theory, and you could whip up whatever sort of weird fixtypes you want with it, and probably cram it all into Haskell somehow, and have inframeasure-y uncertainty over any Haskell type.
This is a big deal because it's a surprisingly big pain in the ass to make the "map a type D to the type of probability distributions over D" type constructor behave sensibly.
There is a domain-theory construction which builds the space of probability distributions over another domain. It's called the probabilistic powerdomain. The issue is, probabilistic powerdomain is an operation which leads you outside of the most commonly used cartesian-closed categories of domains (which you need to stay within to cook up the fixtype solutions), and the cartesian-closed categories of domains are exhaustively classified. The problem of showing whether or not there's some lesser-used category of domains that's closed under probabilistic powerdomain is called the Jung-Tix problem, after the authors of the original paper.
There's a generalization of domains called QCB spaces, and apparently a sub-category of them is closed under probabilistic powerdomain and has strong enough closure properties to support the fixtype construction process. So domain theory (broadly construed) has pretty much figured out how to handle probabilistic choice in computations since the original Jung-Tix paper was written. But you have to go rather far afield to do it.
The key problem with probabilistic powerdomain is that it destroys all the lattice-like structure present in a domain, and the best-behaved categories of domains with the strongest connections to computation are very lattice-like. Fortunately, the space of inframeasures does have a lot of lattice-like structure which is lacked by the space of measures. So, one might hope that one of the nice and commonly used categories of domain is closed under the "space of inframeasures" type constructor, for practical implementation.
As it turns out, yes, this can be done. There's problems to overcome along the way, though.
Problems and Plans
Problem 1: The type signature of an inframeasure over X, in a domain-theory sense, would be something like an element of the space [[X→[0,1]]→[0,1]] or [[X→R]→R]. Up till now, inframeasures could only take expectations of bounded continuous functions X→R (or X→[0,1]), where R or [0,1] is equipped with the usual topology. Domain theory would equip [0,1] or R with the Scott-topology when making the function space, thus changing the notion of "continuous function". And also, there's no good support for the notion of "bounded function" in domain theory. And also, R isn't a legit domain. Closure under suprema of chains demands that ∞ be added, and the existence of a bottom element demands that −∞ be added. We need to sort out fiddly issues of what sorts of functions we can even take the expectation of in the first place.
Problem 2: Inframeasures have only been defined on Polish spaces up till now. Polish spaces are the usual setting for probability theory. And domains are very very not-Polish. They're T0 spaces, about as badly behaved as a topological space can be, while Polish spaces are T6, about as nicely behaved as a topological space can be. This is going to take some serious generalization work.
Problem 3: Not all the elements of that function space are suitable to be an inframeasure. Inframeasures are functionals which have some additional properties like concavity, which wouldn't show up for an arbitrary continuous function. If we wanted to build a space of inframeasures and show it was nicely behaved from a domain-theory standpoint, we'd have to craft it as some sort of subset or quotient of the full function space, which is more finicky to pull off than it appears at first.
Problem 4: There's a lot of extra properties inframeasures can have, if you want to restrict to particularly nicely-behaved subsets of them. We could restrict our attention to infradistributions (which must map the constant-0 function to 0 and constant-1 function to 1). And there's also properties you can stack on top of that like 1-Lipschitzness, homogenity, C-additivity, cohomogenity, crispness, and sharpness (definitions not relevant here, they're present in Less Basic Inframeasure Theory). We could just as well imagine building domains corresponding to those, to make type constructors like "build the type of crisp infradistributions over this other type". Plus, there's the issue of whether we're using the [[X→[0,1]]→[0,1]] space, or the [[X→R]→R] space. We really don't want to have to duplicate all the work ten times over in proving that these more restricted processes all make nice domains.
The game plan here is:
Step 1: Inframeasures take as input continuous bounded functions in X→[0,1] (or R) where [0,1] has the usual topology. The domain-theory space of continuous functions [X→[0,1]] (or R), has [0,1] or R equipped with the Scott-topology, and doesn't talk about boundedness. This is quite different! Sort that part out somehow, to address problem 1.
Step 2: Conjecture what sort of domain a domain of inframeasures would be, so we know which category of domains we're headed for. It should be one of the smaller categories of domains that's more closely linked to what computers work on. Let's call these "nice" domains, as a placeholder.
Step 3: Come up with some sort of criteria for a "good" topological space to define inframeasures over that manages to include both the usual sorts of Polish spaces we were looking at before, and "nice" domains, so you can apply inframeasures to domains.
Step 4: Verify we solved steps 2 and 3 correctly by showing that if X is "good", then [X→[0,1]] is a "nice" domain. Then, since, by step 3, a "nice" domain should be a "good" space, we can apply our result a second time to conclude that [[X→[0,1]]→[0,1]] is a "nice" domain. Where "nice" and "good" are placeholders for something more formal. Also, generalize this to the R type signature. Steps 3 and 4 together would address problem 2, thus extending the notion of "inframeasure" to the sorts of objects studied in domain theory.
Step 5: Now that [[X→[0,1]]→[0,1]] (or the same thing with R) is a "nice" domain, see if we can get some sort of master theorem for precisely when a subset of a "nice" domain is also a "nice" domain. All the various sorts of inframeasures and infradistributions are subsets of that whole function space, so it'd be nice to only check the conditions of the master theorem to go "and this subset of it is also a "nice" domain" instead of having to do all the work from scratch each time. This would address problem 4 by making one single theorem we can reuse.
Step 6: Show that for inframeasures, and all the more specialized types of them, they all fulfill the conditions from the master theorem in step 5, so they all make "nice" domains. This would solve problem 3, the last of them.
By now, steps 1-6 show that the category of "nice" domains is closed under the "take the space of inframeasures" type constructor, and all the more restrictive variants of it like "take the space of crisp infradistributions". This is because, by step 3, all "nice" domains should be "good" spaces. By step 4, the function space [[D→[0,1]]→[0,1]] (or the R variant) will also be "nice". And then, by steps 5 and 6, the various subsets of the full function space fulfilling various desired properties are also "nice". So we're just on detail cleanup from here on.
Step 7: Do a quick routine verification that "take the space of inframeasures" is a locally continuous functor from the category of "nice" domains to itself, as that's the property you have to check to freely use it as a type constructor in fixtype shenanigans with no worries. Also, annoyingly enough, my domain theory textbook doesn't verify closure of the various categories of domains under all the other usual type constructors that made an appearance there (lifting, sum, coalesced sum, product, smash product, function space, strict function space, bilimit), only some of them, so get those squared away.
And then we'd be done! By step 7 (the category of "nice" domains is closed under all these type constructors), we can just throw chapter 5 of the Abramsky textbook at it to get that we can build whatever weird type signature we want from these basic building blocks and it'll work out.
Of course, not all these steps will interest the reader, and we'll be going into an awful lot of detail on how each one works, so if you've made it this far, feel free to skip straight to whichever one piques your interest.
Step 1: Fix function problems
Inframeasures are defined over Polish spaces with no issues, so let's start there. Our goal is to extend inframeasures over Polish spaces X from taking expectations of bounded continuous functions X→[0,1] (or R) to taking expectations of Scott-continuous functions X→[0,1] (or R).
The Scott-topology opens on [0,1] are (besides the whole space and the empty set) all of the form (q,1] for q∈[0,1), the sets of the form "everything strictly above this number". Similar thing for R. This is fewer opens than the usual topology on [0,1] (or R), so it's easier for a function X→[0,1] to count as continuous, since the criteria "every preimage of an open is open" is easier to fulfill when you've got less opens in the target space.
Going to probability distributions as a toy example to play with, if you had previously invented probability distributions in such a way that you could only take expectations of bounded continuous functions with them, and were trying to extend the expectations to more general functions than that, how could it be done?
Well, for probability distributions, there's a nifty result called Lebesgue's Dominated Convergence Theorem which (skipping over technical requirements) says that if a sequence of measurable functions fn:X→R limits pointwise to f∗:X→R, then
limn→∞Eμ(fn)=Eμ(f)
So, if you could prove a result like the Dominated Convergence Theorem, then you could go "hey, we can get sensible expectations for all functions which are pointwise limits of continuous functions, since no matter what sequence of functions you pick to get there, the limits of the expectation values will agree!". And then you'd have expectations for the Baire class 1 functions (pointwise limits of continous functions). Apply the theorem again and you get Baire class 2 functions (pointwise limits of Baire class 1 functions). Keep going, take unions when you get to limit ordinals, and by the time you've hit the ordinal ω1, you're done, you've figured out how to take expectations of any measurable function.
Can we do something like this for inframeasures? No. There's no analogue of Lebesgue's Dominated Convergence Theorem. In fact, it explicitly fails, and not way off in the upper realms of the ordinals, it fails down at the start.
Proposition 1: There is an infradistribution Ψ on the space [0,1] and a sequence of continuous functions fn:[0,1]→[0,1] which limit pointwise to a function f∗, yet limn→∞EΨ[fn]=0 and EΨ[f∗]=1.
Proof: Let Ψ be the set of all probability distributions on [0,1]. The expectation function is f↦minx∈[0,1]f(x), because you can always consider the probability distribution which has all its measure on the point where the function is lowest.
Function fn is defined to dip down in a V pattern, from 1 at 0, to 0 at 2−n, back up to 1 at 2−n+1, and is 1 for all larger numbers. This sequence limits pointwise to the constant-1 function. Since all the fn have a minimum value of 0, they have expectation values of 0, and the constant-1 function has a minimum value of 1, so it has an expectation value of 1.
But we can get an analogue of Beppo Levi's Monotone Convergence Theorem, for inframeasures!
The Monotone Convergence Theorem says that if you've got a sequence of lower-bounded measurable functions fn:X→(−∞,∞] s.t. fn+1≥fn, then the pointwise limit of them, f∗:X→(−∞,∞], has
limn→∞Eμ[fn]=Eμ[f∗]
There's a bit of a roadblock though. If you start off with bounded continuous functions, then taking pointwise limits of ascending sequences like that gets you... all the lower-bounded lower-semicontinuous functions. And then, taking pointwise limits of sequences like that, you can get... lower-bounded lower-semicontinouous functions. That's as far as we can go with this result!
Also, more excitingly, we can prove another theorem that all lower-bounded lower-semicontinuous functions f∗:X→