This post is the first in a sequence of posts meant to be an introduction to pragmatic type theory. “Type theory” here refers to Martin-Löf dependent type theory (MLTT), as opposed to theories like the simply-typed lambda calculus; the main difference being that in MLTT, functions can also act on types and not just values, which makes it a more powerful theory.

I was initially motivated to study type theory because of a remark in the Alignment Research Field Guide about how defining terms and adding type annotations to them can make the topic of alignment research discussions more precise, as an intermediate step between vague idea and fully formalized proof.

You might also be aware that type theory can serve as an alternative foundation for mathematics, in place of set theory. And some people think type theory is the more elegant choice of the two; this article, for example, argues for that position. Another reason to learn type theory is that it serves as the basis of most theorem provers.

I’m not aware of any short-ish but still pretty comprehensive introductions to MLTT, so that's why I wrote this sequence of posts. The content is mostly based on the book Homotopy Type Theory, with the main difference that the book tries to convince you to do constructive mathematics, whereas I won’t. (I also won’t talk about any homotopies at all.) This means though that if you’re interested in constructive mathematics (or homotopies), you should probably just read the book instead.

I will sometimes compare type concepts to concepts found in programming languages, but if you don’t have any programming experience, you can just ignore that. What you should have, though, is a good understanding of set theory, because I will reference it frequently.

With that said, let’s get into it!

Type judgments

A simple type annotation looks like this:

$$a : A$$

where $A$ is a type and $a$ is an element of this type. This looks very similar to the statement “$a \in A$” where $A$ is a set, and it is quite similar, but there are some important differences.

For one, types are usually stricter than sets when it comes to membership. In set theory, an object can easily be a member of multiple sets. In type theory, an object usually only belongs to a single type (though maybe that's not quite the case when union types are involved).

The other difference is that “$a \in A$” can be used as a proposition within a logic formula like this:

$$(x \in A) \Rightarrow (x \in B)$$

but you cannot do that in type theory. For example, something like “$(x : A) \Rightarrow (x : B)$” doesn’t make any sense. Type judgments (i.e., the “$a : A$” part) can only be applied from the outside to a full statement. For example, you could have some formula $\varphi$, describing the result of some computation, and then you can claim “$\varphi : A$” (formula $\varphi$ is of type $A$) and that is then either true (more precisely, “well-typed”) or false (“not well-typed”) as seen from the outside. But type judgments cannot be used inside a statement as a true-or-false-valued term.

It’s like if you had a programming language that lacked any kind of type introspection, such that at runtime you couldn’t write an if-statement that branches based on the type of some object. But external programs like the compiler and static analysis tools identify the type, and tell us if we wrote our program correctly; for example, whether the values we are passing to functions have the appropriate type.

There is a caveat to the rule that something like “$a : A$” may not appear within formulas, and it concerns universal and existential quantification. Later, we will define these two quantifiers to write formulas like

$$\forall (x : A).\ P(x)$$

but even here, “$x : A$” is not something that can be true or false; it’s just an indicator for the “domain” of the quantifier.

So, membership in a type cannot be used as a proposition in type theory, but then how do we express things like “$x \in S$” where $S$ is, for example, some “subset” of the integers? The solution we’ll use is to define “subsets” as predicates (or characteristic functions) on some base type. So, you could, for example, define a set in $\mathbb{Z}$ via a predicate function, i.e., a function that takes in an integer $x$ and returns true or false. And then you could ask questions about membership in this “set” by just applying the predicate function. We will later see how this works in detail.
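
To make this concrete, here is a minimal sketch in Lean 4 (a theorem prover built on dependent type theory); the name `isEven` and the example are my own choices:

```lean
-- A "subset" of the integers, represented as a predicate (characteristic function):
-- a function that takes an integer and returns true or false.
def isEven (x : Int) : Bool :=
  x % 2 == 0

-- "Membership" in this subset is then just function application.
#eval isEven 4   -- true
#eval isEven 7   -- false
```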

Product types

Speaking of functions, how do we type-annotate those? Before we get there, let’s talk about something simpler first: pairs of values. If we have two values: $a : A$ and $b : B$, then we can form a pair $(a, b)$ that has the type

$$A \times B$$

In contrast to (the usual definition of) set theory, type theory has ordered pairs as a native building block. The definition of ordered pairs goes along with a definition of projection functions (going from $(a, b)$ to, e.g., just $a$) but we’ll get there later.

For $n$-tuples, we can pair repeatedly:

$$A \times (B \times (C \times D))$$

However, we usually leave out the extra parentheses:

$$A \times B \times C \times D$$

while always implying that parentheses open after any “$\times$” and close at the very end of the expression.
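
For readers who like code, here is how pairs and tuples look in Lean 4 (a small sketch; the concrete values are arbitrary examples of mine):

```lean
-- An ordered pair and its (product) type.
def p : Nat × String := (3, "three")

-- An n-tuple as repeated pairing; Lean's × also associates to the right,
-- so this type means Nat × (String × Bool).
def t : Nat × String × Bool := (3, "three", true)
```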

Now, back to functions. If we have a function $f$, one thing we can annotate is its return type. We could maybe do it like this:

$$f : B$$

However, we would also like to know the type of the argument $x$, so that we can know what values we can pass to this function. Say that our function $f$ maps values from $A$ to $B$. Type theorists have developed a brilliant notation for this:

$$f : \prod_{x : A} B$$

Now, you might be saying: what the heck is this? And I’d agree with you, but there is some justification for writing a function type like this. You see, a function could be seen as a very long ordered tuple, where each entry of the tuple is associated with a particular input value. The entries of the tuple all have type $B$, so the type of the tuple is kind of

$$B \times B \times B \times \dots$$

with one entry for every value $x : A$. Hence, we write it as a product: $\prod_{x : A} B$. In set theory, the set of all functions from $A$ to $B$ is written $B^A$ for a similar reason. (Why are we not writing it like that in type theory? We’ll see later how the $\prod$ is useful.)

In contrast to set theory, functions are a fundamental concept in type theory, and are not defined in terms of other things. There are three basic building blocks in type theory: values, functions, and types. (And then you can additionally construct ordered pairs out of any combination of these.) Values could also be seen as constant functions that don’t accept any arguments, so, in a sense there are only functions and types, but we will talk about “values” as separate things nevertheless.

The basic theory of functions that type theory is built on is lambda calculus. I will just quickly summarize it here. Feel free to skip this next section if you are already familiar.

Aside on Lambda Calculus

Say you have a function like:

$$f(x) = x + 1$$

If you want to apply this function now to the number 4, then the rules of lambda calculus tell you to replace all $x$’s with “4”:

$$f(4) = 4 + 1 = 5$$

 which probably doesn’t come as a surprise.

Now, in this example, $f$ was a named function (its name is “f”), but often it’s inconvenient to give every function a name, so we need a notation for anonymous functions. This is most useful when some function takes another function as an argument.

For example, the Fourier transform transforms functions and if we let the transform be denoted by $\mathcal{F}$ then applying it to a function $f$ looks like this: $\mathcal{F}(f)$. However, we might not always want to give the function passed to $\mathcal{F}$ a name. In this case we can pass it an anonymous function. My preferred notation for anonymous functions is something like

$$x \mapsto x^2$$

where $x$ is mapped to $x^2$. So that you can write $\mathcal{F}(x \mapsto x^2)$ for its Fourier transform. But the notation used in lambda calculus is this:

$$\lambda x.\ x^2$$

(and so we’ll be using this notation). The lambda ($\lambda$) is not meant to be a variable here, but a marker that marks the beginning of a function. The variable that comes after the $\lambda$ is the input argument and what comes after the dot (.) is the function body. Applying the Fourier transform to it looks like this: $\mathcal{F}(\lambda x.\ x^2)$.
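
As a small illustration in Lean 4 (the squaring example is mine): anonymous functions are written with `fun`, which plays the role of the $\lambda$.

```lean
-- A named function ...
def square (x : Nat) : Nat := x * x

-- ... and the anonymous version, applied directly to an argument.
#eval (fun (x : Nat) => x * x) 5   -- 25

-- Passing an anonymous function to another function without naming it first.
#eval List.map (fun x => x * x) [1, 2, 3]   -- [1, 4, 9]
```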

Anonymous functions (and named functions too, actually) may only take a single argument, but as we’ll later see, we can get around this restriction by passing in $n$-tuples as arguments or by using nested functions. I know this notation takes some getting used to, but it won’t feel strange after a while. As before, we’ll have the convention that the function body extends until the end of the expression (if there are no parentheses).

The following two are completely equivalent then:

$$\lambda x.\ x + 1 \qquad\text{and}\qquad \lambda x.\ (x + 1)$$

(However, whenever possible, we will prefer the first notation.) Applying anonymous functions works exactly like named functions; all occurrences of the input argument are replaced and we can drop the part before the dot:

$$(\lambda x.\ x + 1)(4) = 4 + 1 = 5$$

The variable following the $\lambda$ is a temporary variable and the concrete letter used doesn’t matter. So the following two functions are completely identical:

$$\lambda x.\ x + 1 \qquad\text{and}\qquad \lambda y.\ y + 1$$

 Sometimes you have to be a bit careful in function application when there is a risk of a naming clash. This can happen when you have nested functions.[1]

Function types

The identity function on the integers, $\mathrm{id}(x) = x$, has the following type:

$$\mathrm{id} : \prod_{x : \mathbb{Z}} \mathbb{Z}$$

 Notice that functions are instances of some function type; a single function type can be associated with many different concrete functions.

Sometimes we also explicitly annotate the type of the input argument:

$$\mathrm{id} = \lambda (x : \mathbb{Z}).\ x$$

 However, we usually only do this if the type is not fixed, as in the following example.
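
In Lean 4, the two variants look like this (a minimal sketch; Lean writes this non-dependent function type as `Int → Int` rather than with a product sign):

```lean
-- The identity function on the integers, written with a lambda.
def idInt : Int → Int := fun x => x

-- The same function, with the type of the input argument annotated explicitly.
def idInt' : Int → Int := fun (x : Int) => x

#eval idInt 4   -- 4
```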

As it is, our identity function only works for one type: $\mathbb{Z}$. But it would be nice if it worked for any type. We can achieve this by constructing a function that takes in a type as an additional argument, which in programming languages is known as a generic function. To do this, we need to talk about the type of types, so that we can specify the type of that additional argument.

The type of a type is called a universe $\mathcal{U}$. However, there can’t be a single universe that contains all types because then it would have to contain itself, which leads to problems like Russell’s paradox. So, instead, there will be an infinite hierarchy of type universes where each universe is contained in the next universe in the hierarchy: $\mathcal{U}_0 : \mathcal{U}_1 : \mathcal{U}_2 : \dots$. Additionally, we’ll say this hierarchy is cumulative, so if $A : \mathcal{U}_i$ then also $A : \mathcal{U}_{i+1}$. These details don’t really matter to us though. What’s important to know is that for any type $A$, there is some universe $\mathcal{U}_i$ in the hierarchy of universes that contains this type: $A : \mathcal{U}_i$. As the exact universe is usually not important, we’ll usually drop the index $i$.

With this knowledge, we can now define the generic identity function:

$$\mathrm{id} = \lambda (A : \mathcal{U}).\ \lambda (x : A).\ x$$

Let’s take a moment to understand this. We have nested functions: the outer function takes a type parameter $A$ and then returns the inner function, which is just the identity function, but on type $A$ (which we annotated explicitly). To get the identity function on the integers, you would do: $\mathrm{id}(\mathbb{Z})$, and then $\mathrm{id}(\mathbb{Z})(4) = 4$. For notational convenience, people often write the first type argument as a subscript:

$$\mathrm{id}_{\mathbb{Z}}(4) = 4$$
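
Here is a sketch of the generic identity function in Lean 4, where `Type` plays the role of the universe $\mathcal{U}$ (the name `myId` is mine, to avoid clashing with Lean’s built-in `id`):

```lean
-- The outer function takes a type A, the inner function is the identity on A.
def myId := fun (A : Type) => fun (x : A) => x

#check myId             -- myId : (A : Type) → A → A
#eval myId Int 4        -- 4
#eval myId String "hi"  -- "hi"
```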

In general, functions with more than one argument can be realized in two ways: either by modeling them as nested functions that each only take one argument (also called currying), or by modeling them as a single function that takes an ordered pair or $n$-tuple as argument. However, we could not have realized the generic identity function using the latter method, because the type of the second argument depended on the value of the first argument. Currying is strictly more flexible than using tuples. Additionally, you always need projection functions when working with tuples. Speaking of which, here they are for an ordered pair:

$$\mathrm{pr}_1((a, b)) = a \qquad\qquad \mathrm{pr}_2((a, b)) = b$$

The way we have defined these functions here is called function definition by pattern matching, because we have written the input argument in an unpacked form. The non-pattern-matching notation would have been $\mathrm{pr}_1(p)$, where $p$ is an ordered pair. However, with that notation we cannot express what this function does because… we haven’t defined any way to unpack a tuple yet; that’s what we’re trying to do right now! So, pattern matching is a very convenient way to describe new functionality that our theory didn’t have yet, even though it’s not technically legal in lambda calculus. (Don’t worry though, pattern matching can be defined rigorously; we just won’t do that here.) We will use pattern matching for later function definitions as well.
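
In Lean 4, the projection functions can be defined by exactly this kind of pattern matching (a sketch; the names `fst'` and `snd'` are mine, since Lean already ships `Prod.fst` and `Prod.snd`):

```lean
-- Projections for an ordered pair, defined by writing the input in unpacked form.
def fst' {A B : Type} : A × B → A
  | (a, _) => a

def snd' {A B : Type} : A × B → B
  | (_, b) => b

#eval fst' (3, "three")   -- 3
#eval snd' (3, "three")   -- "three"
```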

Just as a further example, here is the type of the add function on the reals, using currying:

$$\mathrm{add} : \prod_{x : \mathbb{R}} \prod_{y : \mathbb{R}} \mathbb{R}$$

We left out the parentheses in the type, as is customary. And here it is using pairs:

$$\mathrm{add} : \prod_{p : \mathbb{R} \times \mathbb{R}} \mathbb{R}$$

 Multiplication has the same type, of course.
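
A sketch of the two styles in Lean 4, using natural numbers instead of the reals (Lean’s core library has no built-in real number type):

```lean
-- Addition in curried form: a function that takes x and returns another function.
def addCurried : Nat → Nat → Nat :=
  fun x => fun y => x + y

-- Addition taking a single ordered pair as its argument, defined by pattern matching.
def addPair : Nat × Nat → Nat
  | (x, y) => x + y

#eval addCurried 2 3    -- 5
#eval addPair (2, 3)    -- 5
```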

Sum types

In the beginning I said that type theory lacks the type introspection that most programming languages have. But actually, through some cleverness, we can get something very similar. The setup is that we have some value $x$, but we want to allow for it to be either of type $A$ or of type $B$. The problem is that nothing within our theory can query the type. But we can solve this through tagged unions. Instead of a bare value, we’ll have a pair where the first element is a tag and the type of the second element depends on the value of the tag. This way, we can find out the type of the value simply by checking the value of the tag! (Assuming of course that we tagged everything correctly.)

You might think we could just use the pair type for this – a pair type like $\mathit{Tag} \times \mathit{Value}$ – but that’s not possible because we need the type of the second entry to depend on the value of the first entry. This can’t be done with what we already have, so we need to introduce a new type formation rule. And that is the sum type, also called dependent pair type (because it’s a pair where the type of the second entry is dependent on the first entry):

$$\sum_{x : A} B(x)$$

This means that the first entry $x$ is of type $A$, and the type of the second entry is $B(x)$, where $B$ is a function that takes in an $x : A$ and returns a type (so, it’s a type-valued function). Now, why is this written as a sum? It makes sense in a certain way: As we learned, the type of a pair is $A \times B$, which is kind of like a multiplication. But a multiplication is just repeated addition, so we could say

$$A \times B = \underbrace{B + B + \dots + B}_{\text{one } B \text{ for every element of } A}$$

Thus, if we have an ordered pair where the second element depends on the value of the first, then the sum $\sum_{x : A} B(x)$ is a somewhat reasonable way to write this. The sum can also be seen as a disjoint union over the family of “sets” $B(x)$ that is indexed by $x : A$, where the tag ensures that the “sets” are definitely disjoint. The resulting type can have elements from all the individual types $B(x)$.
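
Here is a minimal sketch of a tagged union as a dependent pair in Lean 4. The type family `Payload` and the example values are hypothetical names of mine; Lean writes the sum type $\sum_{b : \mathrm{Bool}} \mathrm{Payload}(b)$ as `(b : Bool) × Payload b`.

```lean
-- A type-valued function: the type of the payload depends on the value of the tag.
def Payload : Bool → Type
  | true  => Nat
  | false => String

-- Dependent pairs: the type of the second entry is determined by the first entry.
def taggedNat    : (b : Bool) × Payload b := ⟨true, (42 : Nat)⟩
def taggedString : (b : Bool) × Payload b := ⟨false, ("hello" : String)⟩
```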

In order to do anything with an instance of this tagged union, we need to handle all possible types that the second element of the pair $(x, y)$ can have. This means we can’t easily write down a projection function for the second element, because we don’t know what the return type will be.

So, instead of defining a projection function, we’ll define how some function $f$ could consume a tagged union. $f$ will need to accept two arguments: the tag $x$ and the actual value $y$. The type has to be something like this:

$$f : \prod_{x : A} \prod_{y : B(x)} C$$

where $C$ is simply an arbitrary return type of $f$ (which could be another product or sum type). We can now define the function $\mathrm{rec}$, which every sum type has its own version of; it takes in a return type $C$, a function $f$, and an element of the sum type:

$$\mathrm{rec}(C)(f)((x, y)) = f(x)(y)$$

Note that we had to make $\mathrm{rec}$ generic in $C$ in order to make sure that the overall return type matches the return type of $f$. Note also that this is another instance of function definition by pattern matching; we’re unpacking the $(x, y)$ pair in the input argument definition, which isn’t normally a legal operation.
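
A sketch of this recursor in Lean 4, reusing the hypothetical `Payload` family from the earlier sketch (`sigmaRec` and `handle` are names of mine):

```lean
-- The rec function for a dependent pair type, defined by pattern matching:
-- it takes the return type C, a handler f for every tag/value combination,
-- and an element of the sum type, and unpacks the pair.
def sigmaRec {A : Type} {B : A → Type}
    (C : Type) (f : (x : A) → B x → C) : ((x : A) × B x) → C
  | ⟨x, y⟩ => f x y

-- A handler that accepts the tag and the payload and always returns a Nat.
def handle : (b : Bool) → Payload b → Nat
  | true,  n => n
  | false, s => s.length

#eval sigmaRec Nat handle taggedString   -- 5
```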

If you only have a handful of types that you want to take the union over, then this whole $\sum$ setup can be a bit overkill. For that case, there is a simpler notation that doesn’t mention the tags explicitly. If $A$ and $B$ are types, then the following is also a type:

$$A + B$$

 It’s called a binary sum type and the notation follows straightforwardly from the idea that we were taking a sum over types.

Even though you can’t see any explicit tags in this definition, they are still kind of there, because for example if $a : A$ then we do not have $a : A + B$. The elements of $A + B$ are different from the elements of $A$ and $B$. In order to turn an $a : A$ into an element of $A + B$, we have to use a special function, one of which is associated with every binary sum type, and there is of course also one for $B$. These “tagging functions” will get the names $\mathrm{inl}$ and $\mathrm{inr}$:

$$\mathrm{inl} : \prod_{a : A} A + B \qquad\qquad \mathrm{inr} : \prod_{b : B} A + B$$

You can see how they can turn an element of $A$ or $B$ into an element of $A + B$. In order to use one of the values of $A + B$, you again need a way to handle both possibilities. So, say you have functions $f$ and $g$ which take an argument of type $A$ and $B$ respectively, but both return something of type $C$:

$$f : \prod_{a : A} C \qquad\qquad g : \prod_{b : B} C$$

Then we can use a function called $\mathrm{rec}$ to map an element of $A + B$ to the right function:

$$\mathrm{rec} : \prod_{C : \mathcal{U}}\ \prod_{f : \prod_{a : A} C}\ \prod_{g : \prod_{b : B} C}\ \prod_{s : A + B} C$$

It’s a bit tricky to give the implementation of $\mathrm{rec}$, because we don’t have explicit tags – we only have the tagging functions. This means we can only describe from the outside how $\mathrm{rec}$ will work: if the element of $A + B$ was tagged with $\mathrm{inl}$, then we execute $f$, and if it was tagged with $\mathrm{inr}$, then we execute $g$:

$$\mathrm{rec}(C)(f)(g)(\mathrm{inl}(a)) = f(a) \qquad\qquad \mathrm{rec}(C)(f)(g)(\mathrm{inr}(b)) = g(b)$$

The knowledge of whether an element of $A + B$ comes from $A$ or $B$ is never encoded anywhere within the system, but from the outside, we can choose the correct function definition to execute. We have to assert that this function exists, because it cannot be defined with the other building blocks that we have. This is again an instance of a function definition by pattern matching; the suitable definition can be found by pattern matching any value against $\mathrm{inl}(a)$ and $\mathrm{inr}(b)$.
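
Here is a sketch of the binary case in Lean 4, where the binary sum type is called `Sum` and the tagging functions are called `Sum.inl` and `Sum.inr` (the `describe` function is my stand-in for the rec function):

```lean
-- Tagging elements of Nat and String into the binary sum type Sum Nat String.
def fromNat    : Sum Nat String := Sum.inl 3
def fromString : Sum Nat String := Sum.inr "three"

-- Case analysis by pattern matching against the tagging functions;
-- both branches return a String.
def describe : Sum Nat String → String
  | Sum.inl n => s!"a number: {n}"
  | Sum.inr s => s!"a string: {s}"

#eval describe fromNat      -- "a number: 3"
#eval describe fromString   -- "a string: three"
```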

It’s important to state here that in practice, you would likely not use the $\mathrm{rec}$ function explicitly. Rather, you can just allude to the fact that it’s possible to do two different things depending on which original type an element of a union had. Like, you would say “let $s : A + B$; if $s$ came from $A$, do this; otherwise do that.”

Examples

An example of a function that can be modeled as returning a union is division on the rational numbers, including 0. As we know, division by 0 is not defined, but you could return a special symbol when dividing by 0. So, let’s say $\mathbf{1}$ is the type that only contains one element, for which we could use the symbol $\star$, and we return that symbol when dividing by 0. Then the function would have this type:

$$\mathrm{div} : \prod_{x : \mathbb{Q}} \prod_{y : \mathbb{Q}} \mathbb{Q} + \mathbf{1}$$

When dealing with the output of this function in your derivation, you could (sloppily) say: “If the output is some $q : \mathbb{Q}$, then we do X next; otherwise we handle it with Y.”
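
A sketch of such a division function in Lean 4, using natural numbers instead of the rationals (core Lean has no built-in rational type) and `Unit` as the one-element type; `safeDiv` and `showResult` are illustrative names of mine:

```lean
-- Division that returns either a result or the single element of Unit (written ())
-- when dividing by zero.
def safeDiv (x y : Nat) : Sum Nat Unit :=
  if y == 0 then Sum.inr () else Sum.inl (x / y)

-- Handling the output: "if the output is some number, do X; otherwise do Y".
def showResult (r : Sum Nat Unit) : String :=
  match r with
  | Sum.inl q => s!"result: {q}"
  | Sum.inr _ => "undefined (division by zero)"

#eval showResult (safeDiv 6 3)   -- "result: 2"
#eval showResult (safeDiv 6 0)   -- "undefined (division by zero)"
```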

An example of a use case for non-binary sum types is the definition of mathematical “data structures” like a ring or a group. These data structures are usually defined as a tuple, where the first element is a set, and the other elements are functions or other structures based on that set. In type theory, the set would be a type, obviously, but when we try to just define the data structure as a tuple, we will encounter the problem that the type of later tuple entries depends on the value of the first tuple entry.

To make this concrete, let’s consider groups. A group is a set with a binary operation, $\cdot$, and a neutral element, $e$. So, groups should have this type:

$$\mathcal{U} \times \Big(\prod_{x : A} \prod_{y : A} A\Big) \times A$$

A tuple of a type, a binary operation and an element of the type. However, how do we ensure that $A$ is actually the type given in the first entry of the tuple? Clearly, we need a sum type here; after all, a sum type is just a pair where the type of the second element depends on the value of the first element:

$$\mathrm{Group} = \sum_{A : \mathcal{U}} \Big(\prod_{x : A} \prod_{y : A} A\Big) \times A$$

 As usual, we left out the parentheses around the last two terms. This type of all groups is a union over all pairs of a binary operation and a neutral element, tagged with the type that they’re operating in. With this type, we could now write a function that takes in an arbitrary group, and it would know exactly what to do with it, because the “tag” (the first element of this 3-tuple) would tell the function what the underlying type is and what to expect with the binary operation and the neutral element.
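
A minimal sketch of this in Lean 4 (ignoring the group laws, which a real definition would also have to carry; `RawGroup` is my illustrative name):

```lean
-- A "group" as a dependent tuple: a carrier type, a binary operation on it,
-- and a neutral element of it. The types of the last two components depend on
-- the first component, so we need a dependent pair, not a plain product.
def RawGroup : Type 1 := (A : Type) × (A → A → A) × A

-- The integers with addition and neutral element 0.
def intAddGroup : RawGroup := ⟨Int, fun x y => x + y, 0⟩
```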

Similarly, in set theory, an ordered set is technically a tuple of a set and an order relation. In type theory, we can analogously have a dependent pair (aka a sum type) of a type and an order relation on that type: $\sum_{A : \mathcal{U}} ({\leq_A})$ where “$\leq_A$” is a relation on $A$. However, we haven’t discussed how to define relations yet.


This concludes the first part of this series. In the next post, we’ll learn how to define functions that return truth values, which includes predicate functions and relations.

 

Thanks to Simon Fischer for comments on a draft of this post.

  1. ^

    For example, the following function returns another function that always returns a constant, which is set by the outer function:

    $$f = \lambda x.\ \lambda y.\ x$$

    So, for example, this returns a function that always returns 4:

    $$f(4) = \lambda y.\ 4$$

    It might be a little clearer if we write this with the other notation:

    $$f(4) = (y \mapsto 4)$$

    Now, the problem appears if we re-use one of the inner argument variables outside:

    $$f(y)$$

    If we were just blindly following the substitution rule, we would get this:

    $$f(y) = \lambda y.\ y$$

    which is of course not correct, because the result of a function application shouldn’t change just because we renamed some variables. To get the correct result, we need to avoid the naming clash by renaming the inner argument variable:

    $$f(y) = \lambda z.\ y$$

Comments

This introduction is fantastic. I don't think I've seen something as comprehensive, clear and short as this before.

Edit: 
That said, I would have loved it if you introduced a few exercises throughout, or perhaps spoilered the types of some objects so people who are new to the area could guess things in advance. I remember a lot of things clicked for me when I set about trying to "discover" the type theory formalism myself.

I am also very happy about this; I learned something new, and the explanations felt perfectly clear.

Especially I liked the explanations "from outside perspective", like lambda notation vs arrow notation, or why the product symbol is used. These things are typically not mentioned in textbooks, because they teach the official notation, but this is exactly what you need as a beginner, to connect what you already know with the new things.

Cool! (Nitpick: You should probably mention that you are deviating from the naming in the HoTT book. AFAIK usually $\prod$ and $\sum$ types are called Pi and Sigma types respectively, while the words "product" and "sum" (or "coproduct" in the HoTT book) are reserved for $\times$ and $+$.)

I am especially looking forward to discussion on how MLTT relates to alignment research and how it can be used for informal reasoning as Alignment Research Field Guide mentions.

I always get confused when the term "type signature" is used in text unrelated to type theory. Like what do people mean when they say things like "What’s the type signature of an agent?" or "the type of agency is "?

Thanks for these posts.  I've just started working through them, but great so far.

I became curious recently if some concept of homogenous sets should be serving as a foundation for applied mathematics, and that's led me to a deeper curiosity about type theory.  Types have already been a core concept in programming and databases for decades.  Working on sympy, it was distressing how much work had to go on under the hood to make sure objects were really of the right type to do a calculation, and how disorganized, bug-prone, and inefficient this became.  How many engineering mistakes could be avoided if physics and chemistry variables used in computations had types that accounted for their units automatically, or mismatched units could raise a warning?  Or the conceptual difference between a date and a duration?  It seems like something that should be a core part of university math sometimes.

But type theory has always felt like a rabbit hole when it comes to doing stuff.  Your specification above reminded me of this.

>There are three basic building  blocks in type theory: values, functions, and types. (And then you  can additionally construct ordered pairs out of any combination of these.) Values could also be seen as constant functions that don’t accept any arguments, so, in a sense there are only functions and types, but we will talk about “values” as separate things nevertheless.

If we define everything in terms of functions, then we don't need values.  For the same reason, it's just useful to think of many undergraduate calculations as taking place over ℂ rather than ℚ or ℝ.  But that means you need to understand ℂ before you should really be doing anything.  But then we've got vectors, and matrices, and tensors ...  With python's numpy, it's extra cognitive load to have to worry about whether a returned value's type is a float or a 1x1 array, or maybe even some other equivalent type.

And that's where I start to get turned off from type theory.  Instead of giving me the foundation to express the ideas I'm really interested in, it is getting in the way.


 

Please don't defend the "∏" notation.

It's nonsensical. It implies that 

has type " ∞! × 0"!

How so? It looks like you have confused with . In this example we have , so the type of is .

Pardon me. I guess its type is .

I have recently read The Little Typer by Friedman and Christiansen. I suspect that this book can serve as an introduction similarly to this (planned, so far) sequence of posts. However, the book is not concise at all.