Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

This research was produced during SERI MATS. Thanks to Vanessa Kosoy for providing ideas and reading drafts.

Introduction

In this article, we investigate "infra-Bayesian logic" (IBL for short), which is an idea introduced by Vanessa Kosoy. The original idea is outlined in Vanessa Kosoy's Shortform. Here, we expand with some details and examples. We also try to fix the mentioned issues (e.g. lack of continuity for certain maps) by investigating several candidate categories for semantics. However, we fail to find something that works.

As a theory of beliefs and reasoning, infra-Bayesianism gives rise to semantics for a certain logic, which we might call "infra-Bayesian logic" (IBL for short). Formalizing the syntax for this logic has the advantage of providing a notion of description complexity for infra-POMDPs (partially observable Markov decision processes) expressed via this syntax, and hence could serve as a foundation for an infra-Bayesian analogue of AIXI. Infra-Bayesian logic is not the only way to define an infra-Bayesian analog of AIXI. Such an alternative could be to use oracle machines (with an extra input tape for a source of randomness): For a probabilistic oracle machine, the corresponding IB hypothesis is the closed convex hull of all stochastic environments that result from using the oracle machine with a particular oracle, and then one could use the description complexity of the oracle machine (which can be defined by choosing a universal oracle machine) as a basis for an infra-Bayesian AIXI. The analogue of oracle machines for (non-infra-Bayesian) AIXI would be (probabilistic) Turing machines.

However, infra-Bayesian logic might be an interesting alternative to describing infra-Bayesian hypotheses, and this approach would be specific to infra-Bayesianism (there is, as far as we know, no good analogue of infra-Bayesian logic in (ordinary) Bayesianism).

Note that we are currently unaware of a working semantics for the higher order logic. We discuss some candidates considered in Candidates for the Semantics, and the ways in which they fail. It's unclear whether these issues with the semantics are mostly technical in nature, or whether they point to some deeper phenomena. We can, however, restrict to a first order IBL on finite sets, which, albeit less powerful, is still expressive enough for many applications (in particular covers all the examples in Examples).

See also the paragraph on IBL in the overview of the learning theoretic Agenda.

Outline and Reading Guide

You can skim/skip the Notation section and use it as a reference instead. The main description of IBL is given in Infra-Bayesian Logic. To get an initial and informal idea what IBL looks like, read Less Formal Description of Infra-Bayesian Logic. For a technical description and a complete list of properties, read Formal Description of Infra-Bayesian Logic. Note that the latter description requires some familiarity with category theory.

In Constructions in IBL we list how we can combine basic elements of IBL to express more complicated things.

In Infra-POMDPs we talk about infra-POMDPs as an application of IBL. Some knowledge of infra-Bayesianism is beneficial here, but not required. We highlight the section on Examples, where we describe the IBL terms for concrete Infra-POMDPs.

In Candidates for the Semantics we discuss the various (failed) candidates we tried for the higher-order logic version of IBL. A typical reader is not expected to read this section.

The Appendix is for mostly technical proofs and counterexamples. We recommend to only look these up when you are interested in the technical details. For example, if you wish to understand why conjunction is not continuous in IBL, you can find the details there.

Notation

Let us describe notation needed for using concepts from infra-Bayesianism.

This notation section is partially copied from the Infra-Bayesian Physicalism article, but extended from finite sets to compact Polish spaces, wich requires a bit more measure theory. Also, we use the symbol for homogeneus ultracontributions (instead of  as in the IBP article).

The notation from the IBP post is a bit different from some of the other posts on infra-Bayesianism in that it uses ultradistributions/ultracontributions rather than infradistributions.

We denote . We will work a lot with compact Polish spaces.

Compact Polish spaces are spaces that are topologically equivalent to some compact metric space (the metric is not fixed, however, only the topology). An important special case of compact Polish spaces are finite (discrete) sets.

Definition 1. Given a compact Polish space , a contribution on  is a (Borel) measure  on  with . Given a measurable function  and , we denote . The space of contributions is denoted , equipped with the weak- topology (i.e. the weakest topology such that the functions  are continuous for all continuous functions ).

Naturally, any (probability) distribution is in particular a contribution, so . There is a natural partial order on contributions:  when  for all measurable .

Definition 2. A homogeneous ultracontribution (HUC) on  is non-empty closed convex  which is downward closed w.r.t. the partial order on . The space of HUCs on  is denoted .

If a HUC  has the property  s.t. , we call it a homogeneuous ultradistribution.

Sometimes we need a topology on the space of HUCs . In this case we will use the topology induced by the Hausdorff metric, (where we use a metric on  that fits with the weak- topology. The topology induced by the Hausdorff metric is also equivalent to the so-called Vietoris topology, which is a topology that only depends on the underlying toplogy on , see here. See also Propositions 42 and 50 in Less Basic Inframeasure Theorey for the same result for infra-distributioins). One can show that  is a compact Polish space with this topology when  is a compact Polish space, see Lemma 10. Let us introduce more notation.

For a set  denotes its closure.

Given another compact Polish space  and a measurable function  is the pushforward by 

 where  for all measurable . We can omit the  in the definition if  is continuous.

A map  is sometimes referred to as an infrakernel.

Given an infrakernel  is the pushforward by 

 where (for measurable 

For a measurable function  We also define the pullback operator  via 

 We can omit  here if  is continuous.

 is the projection mapping,  is the inclusion map.

Given  and an infrakernel  is the semidirect product: 

 where the measure  is defined by 

 for all measurable , which can be extended to arbitrary measurable sets  in the usual measure-theoretic way.

We will also use the notation  for the same HUC with  and  flipped. And, for  is the semidirect product of  with the constant ultrakernel whose value is .

For a closed set , we define  (we'll call these sharp ultradistributions). We also define  via .

Infra-Bayesian Logic

The formal description in this section assumes some familiarity with type theory and categorical logic. For an informal overview of the syntax, we point the reader to Less Formal Description of Infra-Bayesian Logic.

The infra-Bayesian notion of beliefs gives rise to a fairly weak form of logic, in the sense that the spaces of predicates in infra-Bayesian logic don't form Heyting algebras, and are generally non-distributive and even non-modular as lattices. Also, due to the lack of function types, we cannot use the full syntax of type theory.

We will therefore work with a simplified syntax, still maintaining versions of many of the usual constructions appearing in higher order logic. This language can be used to describe infra-POMDPs (see Infra-POMDPs), which in turn is a useful beginning to something like an infra-AIXI (see Application: an Infra-Bayesian AIXI Analogue).

Less Formal Description of Infra-Bayesian Logic

Infra-Bayesian logic is a logical language. The language has types. Types can come from a set of initial types , or can be combined from other types using sums and products. Also,  and  are special types (where  contains no element, and  contains exactly one element, so to speak). There is also a more complicated construction of additional types: If  is a type, then we also consider predicates on  a type, which we denote by .

Infra-Bayesian logic also has maps between types. The set of maps between types  and  is denoted by . We will not list every kind of map that we require for infra-Bayesian logic in this informal description. But some examples are  and , which correspond to a meaning of "and" and "or".

A map of type  is basically the same as something of type , but it allows us to compose this with other maps. Examples are  and  which correspond to "true" and "false". There are also maps for equality and predicate evaluation. All kinds of different maps can be combined and composed to yield expressions in infra-Bayesian logic.

How does this connect to infra-Bayesianism? A collection of sentences in infra-Bayesian logic can have a model . There are lots of rules for a model to ensure that the symbols of infra-bayesian logic correspond to their intended meaning. A model maps types into topological spaces. In order to make sure the above-mentioned types behave reasonably under the model, we require that  is the empty topological space,  is the -point topological space, products and sums work as expected ( and ). A special case is the topological space for  (the type of predicates on ): We require that  is the topological space of homogeneous ultra-contributions over , i.e. . We will also explore alternatives to HUCs over topological spaces, but it should be some general notion of infra-distribution. The model of a map  is a map between the topological spaces . We will have conditions that the model of a map is consistent with the meaning. For example, we require that the model of "" maps two HUCs to their intersection.

We will later investigate the precise conditions on toplogical spaces and maps that we would like to have for infra-Bayesian logic.

Formal Description of Infra-Bayesian Logic

Syntax

Given a set  of primitive types, we can require that the types form a category , which should be the "freest" category such that:

 has finite products and coproducts

 supports infra-Bayesian logic (Definition 3).

We won't construct the category  here (uniqueness is clear). We'll use the shorthand  for the set of morphisms, and write  (where  is the terminal object).

Definition 3. We say that a category  with finite products and coproducts supports infra-Bayesian logic, if it satisfies the following requirements:

There is a functor  (recall that  is the opposite category of ). For  is the object intended to correspond to predicates on . For , we write  instead of  to denote pullback of predicates.

There are morphisms . We require that these operations turn  into a bounded lattice, and we require the functor  to induce a functor into the category of lattices. In particular, we have .

We require that  have natural left and right adjoints with respect to the poset structure (of the lattice) above. We also require these adjoints to come from morphisms in , denoted by  respectively.

We require the existence of an equality predicate , which is the minimal predicate such that  (cf. A constant function with output top).

We require the existence of  (predicate evaluation). Given , we can pull back  via  to get . Note however that we cannot require universality here (i.e. that every predicate in  arise this way), due to the failure of the disintegration theorem in infra-Bayesianism. It's not entirely clear currently what the appropriate condition to require from  should be on the syntax side, even though the intended semantics can be described (cf. Item 5 in the semantics).

For  we use  as a syntactic constant symbol corresponding to a coin toss with probability .

Note that in Item 6 we could require the following more general set of symbols, however, the relevant ones can likely be approximated via the existing syntax. For , let  (treating  as an -point discrete space) be the subset of "describable" ultracontributions under some such notion. We would then introduce constant symbols  for each .

Remark 4. In order to construct the syntax for the first order theory, we can instead consider two categories  (base types) and  (predicate types), with the functor . In this context the predicate functor  cannot be iterated, hence we are left with a first order theory. The requirements in Definition 3 remain the same, except for Item 5 being removed. The construction of a pushforward via a kernel in Pushforward via a kernel no longer makes sense in that generality, but we explain how to recover it for finite sets in the first order theory in Pushforward via kernel for finite sets, which is then used when unrolling the time evolution of an infra-POMDP in Process.

Semantics

We require a model to be a functor , which preserves finite products and coproducts.

Moreover, we require  to support infra-Bayesian logic (cf. Definition 3), and the model functor  to respect the logical structure.

In practice, we will want  to be some subcategory of . The motivating candidate is to take  to be the category of compact Polish spaces with continuous maps, and the predicate functor on  to be . This choice however doesn't work for higher order logic.

We nevertheless spell out what the logical semantics translate to, using  as a generic symbol denoting some appropriate notion of "ultracontributions".

 (cf. Lemma 10), and  is the pullback (see Continuity Counterexamples about issues with continuity in the case of HUCs).

We require  to induce a lattice homomorphism , where

 is given by convex hull of the union

 is given by intersection (see Continuity Counterexamples about issues with continuity in the case of HUCs)

Predicates. Let , and  be projection onto the second factor. The following follow from the adjunction

 is the pushforward

For , we have 

 (see Continuity Counterexamples about issues with continuity in the case of HUCs)

 (as a sharp ultradistribution)

If , then , where , and  is considered as an infrakernel.

 is the crisp ultradistribution corresponding to a coin toss with probability  (here  is the one point space).

Note in the more general setting of describable ultracontributions, we would require .

Definition 5. A subset of  (i.e. a set of sentences) is called a theory. We say that  models the theory  if  for all .

Remark 6. Finding an appropriate category  turns out to be harder than first expected. We discuss various candidates considered in Candidates for the Semantics.

In general, we note that for infra-POMDPs with finite state spaces, the transition (infra-)kernels are always continuous, and some of the issues with continuity mentioned in Candidates for the Semantics are less of a problem.

Constructions in IBL

In the following we construct various useful components of IBL using the syntax outlined above. These constructions will also be used to build examples of infra-POMDPs in Infra-POMDPs.

Pushforward

Given , we can construct the pushforward  as follows. First, consider the following two maps into :

 (this represents the graph of )

.

Composing the product of these two with , we get , and finally post-composing with , we get .

Pushforward via a kernel

Given , we can construct the pushforward  as well. This is done similarly to Pushforward, except the element in  is replaced with 

 where  is predicate evaluation.

Pushforward via kernel for finite sets

If  is a finite set, then we can think of a map  as a collection of points , which is meaningful in the first order syntax too, as long as the predicate category has a final object . In this case we can construct the pushforward along the kernel  in the first order theory as follows. For each  we have the composite 

 where  is the inclusion of , and  is the regular pushforward from Pushforward. Now, taking the product of these over , and taking (iterated) disjunction, we have 

 which is the "graph of ", using which we construct the pushforward  analogously to Pushforward and Pushforward via a kernel.

Equality of functions

Given , we want to construct a sentence . This can be done as follows. First, take the composite with the diagonal on  

 then we have 

 so  is the desired sentence.

Mixing with probability 1/2

Assume that we have two terms . Then we have . Applying a pushforward yields . Then we can compose with , which assigns a fair coin distribution on 

A constant function with output bottom

We can construct a constant function term  whose model  is a constant function that maps everything to  as follows. Let  be the terminal map. Then we can define the term  via 

 This does the right thing: By factoring through  the function  has to be constant, and by using , the output has to be .

A constant function with output top

This works the same way, just using the symbol  instead of 

Components of infrakernels for sum types

For a function , how do we get access to its components, so that we get a term ? We do this by composing with  and , i.e. 

 For the other components this would work essentially in the same way, which would give us functions .

Infra-POMDPs

Setup

We can describe infra-POMDPs in the language of infra-Bayesian logic in the following way. Infra-POMDPs consist of the following.

Finite sets of actions  and observations 

For each observation , a type  of states producing the given observation. Let  be the type of all states.

An initial state 

For each , a transition kernel .

Process

Given a state at time , and an action , we can apply the kernel pushforward , to end up in a new state . Once receiving an observation , we can do an update on the observed state. In general, this update depends on the loss function, but we can obtain a "naive" update via pull back and push forward through  to get the observed state .

Laws

In infra-Bayesianism, laws (also known as belief functions in earlier posts on infra-Bayesianism) are functions , where  is the space of possible (deterministic) policies of the IB agent (a technical note: the space  is a Polish space, when equipped with the topology generated by cylinder sets). Such a law describes infra-Bayesian beliefs how a policy will lead to certain histories.

Under some conditions, one can convert an infra-POMDP into a corresponding law. We will not explain exactly how, and instead refer to Theorem 4 in "The many faces of infra-beliefs".

Examples

Let us give an example of an infra-POMDP. The first example is basically just and ordinary MDP, rewritten to fit into the above notation for infra-POMDPs.

Example 7. We set . This example has (functionally) no relevant hidden states, i.e. states are equivalent to observations. Taking action  will lead to the same observation as the previous observation. Taking action  will lead to the other observation as the previous observation. As for the initial conditions, we start out with probability  for each observation/corresponding state.

Let us explain how Example 7 can be described using an IBL theory . We have types  and  for the states corresponding to the two possible observations. First, we introduce non-logical symbols . Then, using the construction in Mixing with probability 1/2, we can define a term in  that assigns probability  to each of , as desired. Using function equality from Equality of functions, we can equate the initial state  with this mixture. We add the equality sentence to our theory . This ensures that every model of the theory  satisfies .

Next, we will add sentences that describe the transition kernels. Note that we have two transition kernels , one for each possible action. Then, for each transition kernel, we can decomposition it, like in Components of infrakernels for sum types. This gives us  possibilities. For example,  describes our beliefs that we will be in state  after we take action  and are in state . For all the  possible combinations, the resulting function should be a constant function, with values  or . Again, we can use the equality of functions as described in Equality of functions to ensure that these components of the transition kernels have the correct values, in accordance with the transition kernel as described in the example. We will add these sentences to the theory , so that  now has 9 elements.

Example 8. The following is a "fully infra" POMDP.

  • , single observation
  • , two states
  • , single action.

The credal set corresponding to

  •  is the interval 
  •  is the interval .
Credal sets K(A) and K(B)

We can take downward closure to end up with a homogeneous ultradistribution.

So a single step in the Markov process swaps  and , but with some Knightean uncertainty of landing up to  away from the target. As we iterate through multiple timesteps, we gain no information through observation (since ), so the intervals of Knightean uncertainty widen (the portion outside the Knightean interval shrinks as ).

We can construct the above infra-POMDP via IBL as follows. Let 

 be coin tosses with probability of heads equal to  and  respectively. We could in principle approximate these with iterated fair coin tosses, but we assume them as given for convenience. Let 

 be the two states. Then we have 

 and similarly for . We can construct the intervals of length  from  and 

 

 Finally, we construct the transition kernel .

Application: an Infra-Bayesian AIXI Analogue

A potential application for infra-Bayesian logic is to describe the hypotheses of an AIXI-like infra-Bayesian agent, by using IBL to describe infra-POMDPs. Let us expand a bit on this idea.

Let's assume an infra-Bayesian agent knows that the environment is an infra-POMDP and also happens to know the finitely many actions  the agent can take each time step, and the finitely many possible observations  the agent could be receiving. However, it does not not know how many states there are. (We will leave rewards out of the picture here. We could include rewards by encoding it in the observations, e.g., by adding a reward  for each observation. But for now let's focus only on the agent's model of the environment.)

An infra-Bayesian agent would have a set of hypotheses of how the environment could be. In our setting, each hypothesis has the form of an infra-POMDP. Like an AIXI agent, we would like to have a variant of Occam's razor: simpler hypotheses should receive a higher weight.

There is a potential approach to use a finite theory in IBL to describe a hypothesis for our infra-POMDP, and then the weight of this hypothesis in the prior is based on the length of the description of the IBL theory, e.g. proportional to  when  is the description length. Note that some IBL theories might have no valid model, so these theories should be discarded from the hypotheses; and there might be other reason why an IBL theory might be excluded from the space of hypotheses, see below.

Each theory in infra-Bayesian logic can have (possibly none or multiple) models. If we use types  and symbols  as in Setup, then each model  describes an infra-POMDP: The initial state can be described by , and for each  is the transition kernel.

The question arises what do we do if there are multiple models for the same IBL term. Let us describe two approaches, which both rely on converting the infra-POMDP into a corresponding law, see Laws.

A first approach could be to only allow theories in IBL that produce models which are unique up to equivalence, where we say that two models are equivalent if they produce infra-POMDPs that produce the same laws.

A second approach could be to consider all possible models in a first step, then convert these models into corresponding laws, and then take the disjunction (i.e. closed convex hull of all outputs of the laws) to turn it into the final law.

Candidates for the Semantics

Compact Polish Spaces with Continuous Maps

A first candidate for the category  would be compact Polish spaces with continuous maps, and '' being the space of HUCs as defined in Notation.

However, we run into issues with continuity in this category. Namely, the maps  are not always continuous, see Continuity Counterexamples.

Allowing Discontinuity

Another approach could be to bite the bullet on discontinuity and broaden the set of allowed morphisms in the semantics category. The problem with these approaches generally is that pullbacks really only behave nicely with respect to continuous maps, we typically lose functoriality when allowing discontinuity. That is, the desired property  can fail to hold for maps . This is mostly because for discontinuous functions we have to take the closure in the definition of the pullback (see Notation).

Upper Semicontinuity

This approach is inspired by the fact that while intersection is not continuous on , it is upper semicontinuous (wrt to the set inclusion order) on .

For this approach, we would use the category of compact polish spaces with a closed partial order, and for the maps we require that they are monotone increasing and upper semicontinuous. Here, a function  is upper semicontinuous when 

 holds for all sequences  and points . As for the partial order on , one could consider the partial order based on set inclusions, or additionally combine this with the stochastic order mentioned in Definition 2.1 in IBP by requiring that elements in  are also downwards closed with respect to the stochastic order.

In both cases, the approach with upper semicontinuous maps, runs into the functoriality issues mentioned above, see Semi-continuous Pullback is not Compositional.

Measurable Maps

We could try to broaden the set of morphisms even further, but naturally the issues with functoriality remain.

A Different Notion of HUCs

A possible remedy for the continuity issues could be to use an alternative definition of HUCs, in which the problematic maps become continuous. Roughly speaking, the high level idea here is to require elements of the modified HUCs to "widen" closer to , thus avoiding the discontinuities.

Unfortunately, we were not able to find a working semantics of this type, despite trying different approaches, as described below.

We can require HUCs to be closed under the finer partial order  in Definition 9, and modify Definition 2 accordingly. Let  denote the space of these alternative HUCs. One downside of this approach is that now the space  depends on the metric on , while previously it only depended on the underlying topology.

Definition 9 (downward closure related to 1-Lipschitz functions). Let  be a compact metric space. For  we define the relation 

 The space  is defined as the set of closed convex subsets of  that are downward closed under the partial order .

To see that it achieves some of the desired behavior, we show that  is continuous in Lemma 14. This approach also requires that we work with compact metric spaces, instead of merely with compact polish spaces.

We give some technical details in Standard setting for the alternative HUC case. Later we will look at alternatives, mostly to justify the choices made in Standard setting for the alternative HUC case.

Standard setting for the alternative HUC case

The Category of metric spaces is typically equipped with maps that are -Lipschitz continuous, and the metric on a product  of metric spaces  is given by 

 We do make some further modifications. First, we only consider compact metric spaces whose metric  only has values in . This allows us to consider sums/coproducts of metric spaces (usually the category of metric spaces does not have coproducts). For the metric on a sum (disjoint union) of two metric spaces, we set the distance of two points from different components to .

Let us define the metric we will use on . As a metric on  we will use the Kantorovich-Rubenstein metric 

 which is a metric that is compatible with the weak- topology on . Then we use the Hausdorff metric  on . It can be seen that this Hausdorff metric has values in  again.

A problem with  is that it is not always preserved under the pullback, i.e.  can happen for some . Thus, we will redefine the pullback in this context. For a function between metric spaces  we define the modified pullback  via . We will use this alternative pullback for the case of .

Under this setting, we run into the issue that  is not -Lipschitz continuous, see Example 16. This is despite the fact that  is continuous (and probably even Lipschitz continuous).

Allow higher Lipschitz constants for maps

One might wonder what happens, when we pick a higher Lipschitz constant  for maps in our category (instead of ). If we have two functions with a Lipschitz constant , then we can only guarantee a Lipschitz constant of  for the composition (e.g. with the function ). If we repeat this argument, we cannot guarantee a finite Lipschitz constant, because  as .

So, what happens if we have allow maps with Lipschitz constant ? It turns out that then we run into issues with functoriality, as in Allowing Discontinuity. A concrete counterexample is described in Example 15.

Another metric on products

Instead of the choice for  in Standard setting for the alternative HUC case, we could consider using the metric 

 A first problem with this choice is that this is incompatible with any finite bound on , as is the case in Standard setting for the alternative HUC case. This issue can be avoided by considering extended metric spaces, which are metric spaces whose metric can take values in  (in that case the distance between disjoint components in a sum is set to ).

A more difficult problem with this approach would be that the diagonal map  is not -Lipschitz continuous for other natural choices of metric on products. We do need the diagonal map, see, e.g., Equality of functions.

Modifying the definition of alternative HUCs

We mention that as an extension, the definition for  in Definition 9 could be modified to include a constant  and use . For smaller , this would make these alternative HUCs sharper. The version with  has some drawbacks as a lot of probability mass gets assigned to points nearby: For , we have , so an alternative HUC which contains  also has to contain . In the context of infra-Bayesianism, this means we can use these kind of HUCs only in combination together with a smaller class of loss functions (in this case, loss functions that are of type  for a nonnegative Lipschitz continuous ).

One might wonder whether increasing  might help us decrease the Lipschitz constant for  or even choose a  such that  is -Lipschitz. While the Lipschitz constant does get improved, calculating through Example 16 would show that the Lipschitz constant  cannot reach  even if we use these modified versions of .

Appendix

Lemma 10. If  is a compact Polish space then so is .

Proof. It is known that the space of nonempty compact subsets of  is compact in the Hausdorff metric. One can also show that  is a closed subset of the space of nonempty compact subsets of .

For the related concepts (but not HUCs) in IB theory, this result can also be found in Proposition 47 in "Less Basic Inframeasure Theory" ◻

Issues with Continuity

Using the topology induced by the Hausdorff metric on  (see below Definition 2), we find that in the 'logical' semantics (inverse) limit type constructions (in the category theoretical sense) are generally not continuous (see Continuity Counterexamples for details). This includes the semantics for , and the pullback .

If  itself is finite, then  is nice enough that we don't run into these issues. This observation is sufficient for example in the construction of the infra-POMDPs in Examples when the state spaces are finite. However, we quickly run into issues if we're not careful with iterating the  construction, since  is not finite even if  is.

Below is some discussion on a possible approach to trying to fix these issues (we haven't found an entirely satisfactory fix though).

Continuity Counterexamples

Lemma 11. The function , given by  is not always continuous.

Proof. Consider the case where  is the space  with standard topology. This is a Polish space.

We consider sequences with  and . We define . Then  and  are in . (For a point , I understand this notation to be , where  is the measure that puts all weight on the single point ).

One can calculate that .

Because of the Hausdorff metric in  we have  and .

To put it together, we have 

 Thus,  is not continuous. ◻

Lemma 12. The function  is not always continuous.

Proof. Consider the case that . We use the notation . We choose the sequence , where  denotes the line between points. Then one can show that .

However, we also have . One can show that  holds. Thus, we have 

 i.e.  is not continuous. ◻

Lemma 13. There is a continuous function  such that  is not continuous.

Proof. Pick . Then , but for the limit we have , and not . ◻

Semi-continuous Pullback is not Compositional

Let  be given by 

 And 

 Both  and  are monotone increasing, upper semi-continuous. Let . Then , the closure . Hence , while , so 

We can construct a corresponding counterexample where  by considering , defined as follows. Let  be the sharp HUC on . Then 

 so 

 Then 

 but 

 which is different.

Stricter Homogeneity

We collect some technical results for A Different Notion of HUCs

With Definition 9 we can prove continuity for , which does not hold for ordinary HUCs.

Lemma 14. The function , given by  is continuous.

Proof. We set . As a metric on  we will use the Kantorovich-Rubenstein metric 

Let  be sequences in  with . We want to show that . We have 

 For the first term, let  be such that  is maximized. By compactness, there exists a convergent subsequence of . Wlog let . We have  for all , and taking limits implies . The same argument also implies . Thus, we have 

 For the second term, let  be given. Due to , there exists a sequence  with  and . Similarly, there exists a sequence  with  and . The idea is now to construct a measure (i.e. a point in  that is close to  and  but in .

Wlog we assume  and  for all . We define  via 

 where  is the diameter of . Clearly, we have . We also define 

 We want to show  and . By symmetry we only need to show . For that purpose, let  be an arbitrary -Lipschitz continuous function. We define . We have to show .

We can decompose , where  is a constant, and  is a nonnegative function that satisfies  for some . Since  is -Lipschitz continuous, this implies that  for all . Thus,  is a function that is -Lipschitz continuous and bounded by , i.e. admissible in the definition Eqn. (1). We can use this to obtain 

 On the other hand, we have 

 Adding these yields 

 If we rearrange this inequality, we get . Dividing by  yields , Since  and  was an arbitrary -Lipschitz function this proves . Since  and  is downwards closed wrt , this implies . Similarly, we can obtain  and . From the definition of  we also obtain  (due to  and ). Thus we have  as . Since  was arbitrary, this implies that the right-hand side in Eqn. (2) converges to . This implies  in the Hausdorff metric, which completes the proof. ◻

In the next we will use the notation  for the modified pullback, see Standard setting for the alternative HUC case.

Example 15. Consider  as a compact metric space with the usual metric. Further, let  be given by  for all . Then , i.e. functoriality does not hold.

Proof. Clearly, we have  for all .

We define 

 We claim that  holds. Due to  for all  this simplifies to 

We start by observing  and .

We also claim . Indeed, for -Lipschitz continuous  we have 

Due to  this implies . Due to  we have . This implies .

However, we can also show that . Indeed, lets consider the -Lipschitz continuous function . Then  Thus . This completes the proof of Eqn. (3) and hence completes the proof of the lemma. ◻

Example 16. Let  be the compact metric space where the metric satisfies . We use the standard setting from Standard setting for the alternative HUC case otherwise. Then the function , given by  is not -Lipschitz continuous.

Proof. Consider the measures .

Let  be the smallest set in  that contains , for . Then it can be calculated that 

 We want to calculate . Due to , we have . Since the supremum of a convex function is attained on extremal points, we only need to consider  for the case that  is an extremal point of . The extremal points of  are . Let us calculate the distances (or bounds on distances) of these points to . We start with . We observe . This implies 

 On the other hand, we have  and  for all . Since  is admissible in Eqn. (1), this implies  for all , and therefore . Thus, .

Next, we calculate a bound on . As we have , we obtain 

 Similarly, for a bound on , we observe  and calculate 

 Finally,  holds due to . In summary, we have shown for all extremal points of  that the distances to  do not exceed  (and obtain the value  in at least one case). This implies .

By symmetry we also have .

Now, let us consider the intersection . Using the above descriptions for  and  we can obtain the subset relation 

 This implies  for all . Recall that we have  and . By using  in Eqn. (1), it follows that  for all . Thus, 

 Combining all these results shows 

 which shows that the Lipschitz constant has to be at least . ◻

Measurability and Continuity Proofs

Let us collect proofs that some maps between compact Polish spaces that are mentioned in Semantics are indeed measurable, as required. This concerns .

Lemma 17. The function  is continuous and measurable.

Proof. Let  and . Recall that . Let  be sequences with  and .

We need to show  for continuity. We do this by showing two things: First, every limmit  of a convergent sequence  with  satisfies . Second, for every point  there exists a sequence  with .

So, first, let  be a sequence with  and . We have , and thus there exists  with . Wlog we have convergences . This implies .

For the other part, let  be given. Thus, there exist  with . Due to  and  there exist sequences with  with , and . Then, we have . Taking limits implies . This completes the proof of continuity of . ◻

Lemma 18. If  is a continuous function between compact Polish spaces, then  is continuous.

In particular, the function  is continuous and measurable.

Proof. Let  be a sequence in  with . We want to show . As before, we do this by showing two things. First, every convergent sequence  with  implies . Second, for every point  we can find a sequence  with  and .

Note that  is continuous if  is continuous.

Let  be a convergent sequence with . Then there exists  with . Wlog  for some  due to compactness. This implies  and . Due to uniqueness of the limit, we have  and therefore the desired .

For the remaining part, let  be given. Then there exists  with . Since , there exists a sequence  with  and . We define . Then we have  and . This completes the proof. ◻

Lemma 19. The function , given by  is measurable.

Proof. We use the abbreviation , and use the notation  for the metric on  and  for the induced Hausdorff metric on . To show measurability, it suffices to show that the preimages of open balls are measurable. Thus, we need to show that  is measurable for all  and .

First, we show an auxiliary claim.

Claim 1: Let  be a compact subset. Then  is closed and measurable.

Let us show that the claim is true. Let  be a sequence in the set with . Then there exists  with  and . Wlog  for some  (because of convergent subsequences). Then  and thus . Similar, . Thus, the limit  of  is in the set . Thus, the set is closed and measurable.

We have .

Let us show measurability of . We have 

 For each , the set  is closed due to claim 1, because  is a compact set. Recall that intersections of closed sets are closed. Thus, the set in question is the countable union of closed sets, and therefore measurable.

Now, let us show the measurability of . We have 

 Again, since the set  is compact, it follows that the above set is measurable by claim 1.

This completes the proof that  is measurable. ◻

Lemma 20. Let  be compact metric spaces,  and  be given. Then there exists  with  and , where  is the Wasserstein metric.

Proof. We set . In the following, we will also use the notation  for .

Then, due to the definition of , there exists a measure  with  (due to compactness arguments, the infimum is attained in the definition of ). We choose a measure  via 

 where  denotes a density function that satisfies  for all measurable , and  is the same for the "mirrored" measure  of , i.e.  for all measurable . These density functions exist due to Radon-Nikodym. The measure  exists due to Caratheodory's theorem and is unique.

The measure  has the properties 

 where  is map that projects on components  and , and  is the diagonal map.

We now choose . Indeed, this satisfies the desired property . As for the Wasserstein metric,  is an upper bound on . Let us calculate . We have 

 This implies , which completes the proof. ◻

Lemma 21. The function  is measurable.

Proof. We abbreviate . Again, we use the notation  for a metric on  or  and  for the induced Hausdorff metric.

Again, we start with an auxiliary claim.

Claim 2: Let  be a compact subset. Then  is closed and measurable.

Let us show that the claim is true. Note, that the set is the same as . Let  be a sequence in the set with . That means, there is a sequence  with  and . Since  is compact, we can wlog assume that  (by selecting a convergent subsequence). Let  be given with . We need to show that  holds (this would show that  is in the set mentioned in the claim, i.e. it would prove the claim). Wlog  (otherwise  follows from definition). This implies , which allows us to wlog assume that  for all . Let us define the probability measures .

We can apply Lemma 20 to these probability measures, which implies that there exists  with  and .

It is known that the Wasserstein metric  is compatible with the weak- topology on  and . Thus, we obtain  from . This, in turn, implies .

Next, we define . This satisfies  and . Due to the properties of  and , this implies . Now the desired property  follows by taking limits. This completes the proof of the claim.

To show measurability of , it suffices to show that the preimages of open balls are measurable. Thus, we need to show that  is measurable for all  and . We proceed similarly to Lemma 19. We have 

 Let us start by showing measurability of the second set, i.e. . Since  is compact, there exists a countable dense set . We have 

 Applying claim 2 for the compact set . Thus, we have a countable union of an intersection of closed sets. This is measurable.

Now, let us show the measurability of the other set . We have 

 Again, since the set  is compact, claim 2 implies that the above set is measurable. This completes the proof that  is measurable. ◻

As for the measurability of , we were not able to prove it for measurable . There might be some relation to the problem that the image of a measurable set under a measurable map is not measurable.

New Comment
2 comments, sorted by Click to highlight new comments since: Today at 5:33 AM
[-]Quinn10mo10

 is the inclusion map.

what makes a coproduct an "inclusion mapping"? I haven't seen this convention/synonym anywhere before. 

[-]harfe10mo10

"inclusion map" refers to the map , not the coproduct . The map is a coprojection (these are sometimes called "inclusions", see https://ncatlab.org/nlab/show/coproduct).

A simple example in sets: We have two sets , , and their disjoint union . Then the inclusion map is the map that maps (as an element of ) to (as an element of ).