Mental Context for Model Theory


As someone who is going to be TA-ing a math course next year, I appreciate hearing your perspective in "The Right to use a name" section. That's not something I can recall stumbling on, but I can easily see it affect people and will try to keep it in mind in the future both for myself and for anyone I'm teaching. Certainly the approach of "let f(x) = blah" shortly followed by "and now we will show that f is well-defined" is of a similar vein and is rather oblique when first encountered - how could it not be well defined, you just defined it! And I could see now that there would be a good argument pedagogically to switch the order of exposition that mathematicians probably avoid because of either the extra writing involved or just following long-standing conventions.

I'd recommend something like

We want something that corresponds to the intuitive idea of order. Let's unpack this intuition. Now,

given that some relation R has those properties we are then justified in using the symbol ≤.

In other words, you don't need to hide your destination -- you just need to make it clear that intuitive labels are a privilege entitled to objects that have demonstrated good behavior.

Certainly the approach of "let f(x) = blah" shortly followed by "and now we will show that f is well-defined" is of a similar vein and is rather oblique when first encountered - how could it not be well defined, you just defined it!

It's an important lesson, going beyond mathematics, that defining a concept does not guarantee that there is anything that satisfies it. The concept may turn out to be empty, confused, or contradictory, however clear an idea it seemed that you had at the time.

I thought about this some more and want to elaborate on what we're talking about for those who haven't encountered the question of being "well-defined" in math and might not know what exactly it is we mean.

**Example**: A definition that implicitly assumes existence of something.

If we have a collection of (real) numbers X, we might want to know what the largest number in that collection is. So let's define max(X) to mean the largest number in X. Is this well-defined? Sure, I just defined it! But then what is max(X) when X is, say, all positive integers? No positive integer is larger than all the others, so there isn't a largest number in X as every number in X is smaller than some other one.

**Example**: A definition that involves an implicit choice.

If we have a (real) number n and write the set of integers as Z, then we might write n+Z to mean all the numbers that may be written as a sum n+k for some integer k. We call n+Z a *coset* of Z. Note that we are definitely allowing n to be a non-integer value, such as n=1/2. Nothing is wrong with this definition.

But we can add integers, so *can we add cosets*? Well, let's try defining what it would mean to add two cosets n+Z and m+Z together. Define n+Z + m+Z = (n+m)+Z, which seems to be the most natural thing to try. But are we done - is this actually well-defined?

Not really, since although we wrote n+Z down that way, we can see that there are other ways to write it, too. We get exactly the same set of numbers if we had instead used (n+1)+Z, so n+Z = (n+1)+Z. So our definition of how to add implicitly assumes that we have a 'chosen' way to write down the coset. But luckily, the way we defined it doesn't actually depend on how we wrote down the coset! For example, (n+1)+Z + m+Z = (n+m+1)+Z = (n+m)+Z. In essence, even though there are multiple ways to write down the cosets n+Z and m+Z, there are also multiple ways to write down (n+m)+Z and the different ways for the first two just give different ways to write the second one. So this can be shown to be well-defined even though it involved an implicit choice.
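To make the coset example concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not a proof: it names each coset by its representative in [0, 1), tries a handful of different ways of writing 1/2+Z and 1/3+Z, and checks that the sum always lands in the same coset. The function names and the multiplication rule used for contrast are hypothetical additions that do not appear in the comment above.

```python
from fractions import Fraction
from itertools import product

def canonical(n):
    """Name the coset n+Z by its unique representative in [0, 1)."""
    return n % 1

def add_cosets(n, m):
    """The rule from the comment: (n+Z) + (m+Z) = (n+m)+Z."""
    return canonical(n + m)

def multiply_cosets(n, m):
    """Hypothetical rule, for contrast only: (n+Z) * (m+Z) = (n*m)+Z."""
    return canonical(n * m)

# Several different ways of writing the same two cosets, 1/2+Z and 1/3+Z.
reps_n = [Fraction(1, 2) + k for k in range(-3, 4)]
reps_m = [Fraction(1, 3) + k for k in range(-3, 4)]

# Collect every answer we get across all choices of representatives.
sums = {add_cosets(n, m) for n, m in product(reps_n, reps_m)}
products = {multiply_cosets(n, m) for n, m in product(reps_n, reps_m)}

print(sums)               # {Fraction(5, 6)}
print(len(products) > 1)  # True
```

Addition gives a single answer no matter how the cosets were written, which is what "well-defined" asks for. The contrasting multiplication rule gives different answers for different representatives (for instance 1/2 · 1/3 = 1/6 but 3/2 · 1/3 = 1/2), so that rule would not be well-defined.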

**Question**: Does anyone have a good example, off-hand, of these kinds of things in real life? The only non-contrived example I have is an ontological 'proof' of God.

"A model is an interpretation of the sentences generated by a language. A model is a structure which assigns a truth value to each sentence generated by some language under some logic."

I think this phrasing will be very misleading to anyone who tries to learn model theory from these posts. This is one thing a model DOES, but it isn't what a model IS. As far as I can tell, you nowhere say what a model is, even approximately. Writing out precisely what a model is takes a lot of space (like in the book you're reading!) so let me give an example.

Our alphabet will be the symbols of first order logic, plus as many variable names as we need, and the symbols +, =, 0.

Our axioms are

∀ x : x+0=0+x=x

∀ x,y: x+y=y+x

∀ x,y,z: (x+y)+z=x+(y+z)

∀ x ∃ y : x+y=y+x=0

Our THEORY is the set of all logical consequences of these statements, where "logical consequence" means "obtainable by the formal rules of first order logic". A MODEL of our theory is a specific set G, a specific element of G called 0 and a specific operation + taking two elements of G and returning a third element of G, such that all of these statements are true about G. In other words, a model of this theory is an abelian group.

One thing an abelian group can do is give us a way to assign a true or false value to any statement in our language. For example, consider the statement ∀ x ∃ y : y+y+y=x. This statement is true in the group of rational numbers, but false in the group of integers. If we choose a particular abelian group, that will force a specific choice as to whether this statement is true or false.
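A rough sketch of this idea in Python, using small finite groups as stand-ins, since the rationals and integers cannot be checked exhaustively. The choice of the cyclic groups Z_4 and Z_3 and all function names are assumptions made for illustration. The code checks the four axioms by brute force, then evaluates the statement ∀ x ∃ y : y+y+y=x in each group.

```python
from itertools import product

def is_abelian_group(elems, add, zero):
    """Brute-force check of the four axioms on a finite structure."""
    identity = all(add(x, zero) == x == add(zero, x) for x in elems)
    commutative = all(add(x, y) == add(y, x)
                      for x, y in product(elems, repeat=2))
    associative = all(add(add(x, y), z) == add(x, add(y, z))
                      for x, y, z in product(elems, repeat=3))
    inverses = all(any(add(x, y) == zero == add(y, x) for y in elems)
                   for x in elems)
    return identity and commutative and associative and inverses

def every_element_is_a_triple(elems, add):
    """Truth value of: for all x there exists y with y+y+y = x."""
    return all(any(add(add(y, y), y) == x for y in elems) for x in elems)

def cyclic(n):
    """The integers mod n under addition (a hypothetical stand-in model)."""
    return list(range(n)), (lambda x, y: (x + y) % n), 0

elems4, add4, zero4 = cyclic(4)
elems3, add3, zero3 = cyclic(3)

print(is_abelian_group(elems4, add4, zero4))   # True
print(is_abelian_group(elems3, add3, zero3))   # True
print(every_element_is_a_triple(elems4, add4))  # True
print(every_element_is_a_triple(elems3, add3))  # False
```

Both structures satisfy every axiom, so both are models of the theory, yet they disagree about the extra statement. The structure (a set with an operation and a chosen element) is the model; the truth values are something it produces.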

However, you shouldn't identify an abelian group with a way of assigning truth values to statements about abelian groups. For example, the rational numbers and the real numbers are both abelian groups and, as it turns out, there is no statement using only +, 0, = and logical connectives whose truth value is different in these two groups. Nonetheless, they are different models.

Thanks! Good point, that distinction is useful. I've updated the post to make this more clear (under the "models" header).

Personally, I tend to view things the other way around. As far as I'm concerned, a model of abelian group theory is *anything* that interprets sentences appropriately (while obeying the rules of the logic), for some value of "interprets".

It so happens that any model of group theory is isomorphic to some pointed set with an associative operator for which the point is an identity, but the model doesn't have to *be* a pointed set with an associative operator for which the point is an identity. It could also be operations on a Rubik's Cube. From my point of view, you've got 'IS' and 'DOES' backwards :-)

It's just perspective, I suppose. I don't particularly view set theory as foundational; I view it as one formalization that happens to have high enough fidelity to represent the behavior of any given model.

Still, your view is definitely the more standard one.

As far as I can tell, you nowhere say what a model is, even approximately

Well, this post is "Context for Model Theory": I didn't intend to introduce models themselves here. Though your concerns probably apply to the follow-up post as well.

The operations on a Rubik's Cube aren't abelian. Is that just a typo on your part, or am I missing some subtle point you are making?

I'm not sure what you are getting at when you say you don't want to found math on sets. I definitely intended to use the word "set" in a naive sense, so that it is perfectly fine for sets to contain numbers, or rotations of a Rubik's Cube, or for that matter rocks and flowers. I wasn't trying to imply that the elements of a model had to be recursively constructed from the nullset by the axioms of ZFC. If you prefer "collection of things", I'd be glad to go with that. But I (and more to the point, model theorists) do want to think of a model as a bunch of objects with functions that take as inputs these objects and make other objects, and relations which do and do not hold between various pairs of the objects.

I'm retracting a bunch of the other things I wrote after this because, on reflection, the later material was replying to a misreading of what you wrote in your following post. I still think your de-emphasis on the fact that the model is the universe is very confusing, especially when you then talk about the cardinality of models. (What is the cardinality of a rule for assigning truth values?) But on careful reading, you weren't actually saying something wrong.

Oops, typo. (The typo was that I said "commutative" when dereferencing "group"; notice that I said "any model of group theory" and not "any model of abelian group theory".) Thanks for the tip.

I'm not sure what you are getting at when you say you don't want to found math on sets ... I wasn't trying to imply that the elements of a model had to be recursively constructed from the nullset by the axioms of ZFC.

Ok, cool. I guess my point is that set theory is a formal *representation* of real things, but it is not the things themselves. The "model" is the real thing, which happens to be *representable* as a set. I tried to make this wording clear (especially in the next post), but I don't think I succeeded.

But I (and more to the point, model theorists) do want to think of a model as a bunch of objects with functions that take as inputs these objects and make other objects, and relations which do and do not hold between various pairs of the objects.

Me too! But mostly because my "implicit" formal system is set theory. If we were working with different foundations (let's say type theory, because that's the only other potentially-foundational system I know) then I would want to think of a model as elements of a type, and function symbols would need to be typed, and so on.

This is why I defined the model as an interpretation which follows certain rules, rather than as a set+function specifically: In my head, the concept of a model is separate from the system I use to represent them.

At this point, it's a matter of perspective, and I acknowledge that my viewpoint is non-standard. You're definitely correct that I should have used more concrete examples ("these axioms are group theory; actual groups are models" etc.) from the get-go.

I still think your de-emphasis on the fact that the model is the universe is very confusing, especially when you then talk about the cardinality of models.

Thanks, I've edited the post to make this a bit more clear.

But on careful reading, you weren't actually saying something wrong.

I very much appreciate the critiques. I admit that the next post is pretty sloppy; it was somewhat rushed and I couldn't go into the depth I wanted. I far underestimated how much must be taught before you can express even the easy parts of model theory. I skimped on formally defining quite a few things, power-of-a-model among them.

However, you shouldn't identify an abelian group with a way of assigning truth values to statements about abelian groups. For example, the rational numbers and the real numbers are both abelian groups and, as it turns out, there is no statement using only +, 0, = and logical connectives whose truth value is different in these two groups. Nonetheless, they are different models.

Hmm, but the axiom sets are different for rationals and reals, since the latter require Dedekind-completeness, which selects a different theory from the language+logic (in So8res's terms). Why would one try to compare/distinguish models in different theories based on a subset of the logic and a subset of axioms?

The reals can be studied as models of many theories. They (with the operation +, relation = and element 0) are a model of the axioms of an abelian group. They are also a model of the axioms of a group. The reals with (+, *, 0, 1, =) are a model of the axioms of a field. The reals with (+, *, 0, 1, =, <) are a model of the axioms of an ordered field. Etcetera...

Models are things. Theories are collections of statements about things. A model can satisfy many theories; a theory can have many models. I agree completely with So8res's statement that it is important to keep the two straight.

In addition, your example of Dedekind completeness is an awkward one because the Dedekind completeness axiom is a good example of the kind of thing you can't say in first order logic. (There are partial ways around this, but I'm trying to keep my replies on the introductory level of this post.) But I can just imagine that you had distinguished the reals and the rationals by saying that, in R, ∃ x : x^2=1+1 is true and in Q it is false, so I don't need to focus on that.

Not quite. It's a map of the *sentences generated by a language* to Z_2. When we talk about "a model of a theory", we're discussing those maps for which all sentences in the theory under consideration are mapped to 1.

To put it another way, a model cannot map just a theory on to Z_2; every model must map every sentence to Z_2 one way or the other. Some theories are such that their models must make arbitrary choices for sentences "outside the theory", this is precisely "incompleteness".

But yeah, there's use in mapping sentences to something larger than Z_2. Down this road lies multi-valued logics. And if you want to map sentences onto the interval [0, 1] you're getting pretty close to probabilistic logic.

Note that in the referenced paper, the probability function is constructed from a measure of how many available models map a sentence to 1 rather than 0, which is an interesting way to map sentences onto a continuous interval.

Not quite. It's a map of the sentences generated by a language to Z_2. When we talk about "a model of a theory", we're discussing those maps for which all sentences in the theory under consideration are mapped to 1.

So, any sentence that does not map to 1 is not in the model? I thought a model includes both true and false sentences. I guess I should instead think of a model as a preimage of a single element (conventionally called "truth") in some surjective theory->image map. Or is it language->image map? Is model a subset of some theory or just of a language?

To put it another way, a model cannot map just a theory on to Z_2; every model must map every sentence to Z_2 one way or the other.

Every sentence of the language? And the theory is the preimage of 1? I'm confused. This can't be right, since there can be multiple models for the same theory. Or do all models for a given theory have the same preimage of 1 but differ in other ways?

Some theories are such that their models must make arbitrary choices for sentences "outside the theory", this is precisely "incompleteness".

Does this mean "in an undecidable (unsatisfiable? incomplete?) language there is no map language->Z_2"?

Hmm, I think I've misunderstood you somewhere. By Z_2 you mean "integers modulo 2" (the set {0, 1}), correct? If you want to think of a model as a function from sentences onto {0, 1}, then sentences which map to "0" in some model are indeed "not in the model". (This phrasing is uncommon; sentences are not usually referred to as "in" a model. Rather, the model either *holds / models / models as true* or *rejects / does not model / models as false* each sentence.)

I thought a model includes both true and false sentences

A sentence is neither true nor false except under consideration of a model. (Sentences true in every model are called "valid", sentences false in every model are called "refutable", but even then "truth" is a property assigned to sentences by models.) The sentences that a model models are true *according to that model*, so it's tautologically true that a model does not hold any sentences that are "false in" (rejected by) that model.

If, by "true" and "false" you meant "valid" and "refutable", then note that there are no models which model refutable sentences.

A sentence is neither true nor false except under consideration of a model.

OK, so, if I understand you correctly, the following sequence is sensible, though not standard and probably backward:

1. We construct a surjective map from a given language+logic to some set.

2. We designate one element of the image as "1" or "true".

3. We construct the preimage of 1, called "provably true sentences".

4. We select a subset of the language+logic called a "theory", based on some external semantic considerations.

5. The intersection of the theory and the set of provably true sentences is called the model of the theory in the language+logic with the map as given in step 1. ("The" model because to get a different model we have to change the map in step 1.)

This way if a theory is not a subset of the preimage of "1" (i.e. a set of provably true sentences), it is called incomplete, even though the model singled out in the theory by the map in step 1 is complete.

Am I off-base completely?

Still a little off base.

In this analogy, the map is playing the role of "model" (as it assigns sentences to the analog of truth values).

Note that in order for the map to be a model, the map must have certain behavior (whenever both φ and ψ map to 1, φ∧ψ also maps to 1, etc.). In model theory we restrict consideration to maps obeying these laws; if your map strays outside these boundaries then model theory has nothing to say about it.

In model theory, we say that a model is "of" a theory when the model holds true *all* sentences in the given theory. Thus, in your construction, the preimage of 1 would be the theory of the map (the largest theory that the map is a model of).

If you construct a different theory T that contains sentences not in the preimage of 1, then we would say the map is *not* a model of T (because there are sentences of T which are not "true" under the map).

The object you're discussing in pt.5 (the intersection between the theory of a model and some other theory) does not have an obvious analog in model theory.

Still a little off base.

Thank you for your patience!

In this analogy, the map is playing the role of "model" (as it assigns sentences to the analog of truth values).

Hmm, I thought only the preimage of 1 is the model.

in your construction, the preimage of 1 would be the theory of the map (the largest theory that the map is a model of).

But... a theory can include sentences not in the preimage of 1 (undecidable?)... I am confused.

I would instead say that the preimage of 1 is the largest model of the map, and is the model of any "large enough" theory.

Note that in order for the map to be a model, the map must have certain behavior (whenever both φ and ψ map to 1, φ∧ψ also maps to 1, etc.). In model theory we restrict consideration to maps obeying these laws; if your map strays outside these boundaries then model theory has nothing to say about it.

Where are these laws defined? In the logic? In the logic+language? Then the only "valid" maps are those which are homomorphisms (in some sense) from logic(?) to the Z_2 subset (true/false) of the codomain.

Thanks again!

I've missed something in my explanation of models. Allow me to define them more precisely.

Intuitively, we want an "interpretation" of the sentences generated by a logic+language which assigns each to a truth value. Model theory formalizes this as an object+relation, but we can also look at it as a map from sentences onto Z_2.

Any such means of assigning a sentence to {true, false} (or equivalent) is an "interpretation" of sorts, but *not necessarily* a model. We reserve the term "model" specifically for not-stupid interpretations (ones where the interpretation maps "x" and "not x" to different values, etc.)

In your construction, when you consider a surjection from sentences to some set and pick one element of the range to be "truth", you've essentially defined an interpretation in a roundabout way. (Every sentence mapped to 1 is true in that interpretation, every sentence mapped elsewhere is false in that interpretation.)

If your interpretation obeys the rules of logic ("x∧y" maps to 1 whenever "x" maps to 1 and "y" maps to 1) then it's a model. Otherwise, model theory doesn't have much to say about it.

I'm not sure I understand the construct you're describing: does the above help at all? I'm not sure if I'm answering the right questions.

Don't worry about decidability in this context, I think it might be confusing things somewhat. The point I was making earlier about completeness is this:

If you consider the set of all sentences as the domain of your map, then a "theory" T is just a subset of the domain of your map. If there are multiple models (functions which obey the rules of the logic) from the set of all sentences onto Z_2 which map the subset of all sentences (theory) T to 1, then T is "incomplete".
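A small propositional analogue of this, sketched in Python. This is a simplification and an assumption on my part: the thread is about first-order logic, whereas here the "sentences" are built from two atoms, p and q, and a "model" is a truth assignment to the atoms extended to all sentences by the usual rules for the connectives. The names and the tuple encoding of sentences are hypothetical.

```python
from itertools import product

ATOMS = ("p", "q")

def evaluate(sentence, valuation):
    """Extend a truth assignment on atoms to compound sentences.

    Sentences are atom names or tuples: ("not", s), ("and", s, t), ("or", s, t).
    Following these rules is what makes the map obey the logic.
    """
    if isinstance(sentence, str):
        return valuation[sentence]
    op, *args = sentence
    if op == "not":
        return not evaluate(args[0], valuation)
    if op == "and":
        return evaluate(args[0], valuation) and evaluate(args[1], valuation)
    if op == "or":
        return evaluate(args[0], valuation) or evaluate(args[1], valuation)
    raise ValueError(f"unknown connective: {op}")

def models_of(theory):
    """All rule-obeying maps that send every sentence of the theory to True."""
    valuations = (dict(zip(ATOMS, bits))
                  for bits in product([False, True], repeat=len(ATOMS)))
    return [v for v in valuations if all(evaluate(s, v) for s in theory)]

incomplete_theory = ["p"]                  # says nothing about q
complete_theory = ["p", ("not", "q")]      # pins down both atoms
inconsistent_theory = ["p", ("not", "p")]  # no rule-obeying map satisfies it

print(len(models_of(incomplete_theory)))    # 2
print(len(models_of(complete_theory)))      # 1
print(len(models_of(inconsistent_theory)))  # 0
```

The first theory has two models that disagree about q, so it is incomplete in the sense described above. The third has none: a map that sends both "p" and "not p" to true is still an assignment of truth values, but it breaks the rule for "not", so it does not count as a model.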

Thanks for this clarification. When I first read the sentence shminux quoted, I imagined that any assignment from sentences to truth values constituted a model, but then realized I was confused when I got to the sentence in the next paragraph saying that, "because there is no model of the theory of order which is also a model of [(∃x)¬(x≤x)]."

I thought, "What about the 'model' where you directly assign (∃x)¬(x≤x) to true as well as the sentences in the theory?" But if I understand correctly now, what I just described does not constitute a model. All models are maps from all sentences in a language to a truth value, but not all such maps are models.

Is there a straightforward way to describe which maps are allowed as models (of a given theory)? Is it something like -- a model is any truth assignment that does not mark as false any sentences in the theory or any sentences that can be derived from the theory by constructing new sentences according to the rules of the logic?

Yes! I cover this in the first part of Very Basic Model Theory, the next post. "Model" is the name for the subset of mappings from sentences to truth values that interpret the logical symbols in a very specific way. See the linked post for details.

Mostly just that he avoided using it as long as possible. At least one other person feels that could be indicative of reluctance to include the axiom. I agree that this is a bit weak, so beyond that I'll hide behind artistic license.

Really, it's more honest to say "We have a binary relation R, satisfying ..., which justifies our use of the ≤ symbol for R."

I'm not sure how I feel about this. In the abstract, I agree, but I've found this mildly frustrating when I've actually seen it. The most recent example: in a group theory class, the professor introduced "special groups" (with the explicit caveat that the term was temporary) which satisfied some property A, and eventually worked up to the proof that groups that satisfy property A also satisfy property B, and groups which satisfy property B are defined to be "normal," and so by "special groups" we mean "normal groups." As I had read ahead, I found myself wondering "*are* these normal groups? I'm pretty sure they are, but then why not call them normal groups?"

Thanks for the input. I agree that your teacher introduced that concept poorly, from the description you've given. My advice applies only when the name being introduced is strongly tied to specific behavior.

It sounds like your trouble was that the name "normal" did not come attached to specific behavior, so the appeal to intuition at the end failed. I imagine that if the teacher introduced "cyclic" groups using this pattern, that may have been less opaque.

Even then, there's a difference between "thingies of this type are the same as thingies of that type, so we can call them by the same name" and "once we've shown this property we're *allowed* to use this name, which pumps your intuition".

In fact, this suggests a fun way to introduce cyclic groups -- mention that you want to define a "cyclic" group, talk about the properties you're going to need to justify the loaded name, guide discussion towards generator objects, formalize the idea, and *then* show how generator objects work with infinite groups. Might be a little less unnerving than learning that Z is cyclic right off the bat.

Even with these caveats, I readily acknowledge that this method of teaching won't work for everyone.

'At its core, model theory is the study of what you *said*, as opposed to what you *meant*.'

One way to improve the clarity of this gloss, and make it more ecumenical (to be frank, I imagine as it stands, many philosophers would balk and sort of go 'WTF?' and treat this as a weird, confused thing to say), might be as follows: distinguishing the meaning of an expression in some language from the speaker's intended meaning in producing that expression. These can of course diverge, but both are semantic notions. (Your use of the two different terms above may obscure this, by suggesting that you can't use 'mean' and its cognates for speaker-meaning.)

Kripke has a paper called 'Speaker's Reference and Semantic Reference' which may be helpful. (There's an online copy here at present, but in any case it's not hard to find.) What it seems you want to do, insofar as you think there's more to meaning than reference, is something like: generalize Kripke's basic idea here to meaning in general (and so factoring in internal components of meaning such as 'role in the system', as well as just reference) and then use that distinction to say that model theory is about linguistic as opposed to speaker-meaning.

But I'm still not sure why you'd want to say (or emphasize) that. My reaction is: yeah, applications of model theory are often geared that way, but why couldn't you also give model-theoretic accounts of speaker-meaning? But perhaps I've misunderstood.

distinguishing the meaning of an expression in some language from the speaker's intended meaning in producing that expression

I'm not sure that's actually what he meant to say (oh the irony!). The "what you *said*" in the original I don't think corresponds to "the meaning of an expression in some language", but rather to something more like, "the expression itself" or "the set of all possible meanings of an expression."

The fifth axiom is the only one which requires some effort to understand. Intuitively, it states that parallel lines do not intersect.

No. This is bad and you should feel bad. Parallel lines do not intersect, and the fifth postulate has nothing to do with it. What do you imagine the definition of "parallel lines" is?

Parallel lines do not intersect by definition, in any geometry, Euclidean or non-Euclidean. The parallels postulate talks about something completely different.

I'm not sure why this comment is being downvoted - perhaps because of the tone - but the content in it is true.

The definition of parallel lines is essentially "lines that don't intersect". We therefore do not need an axiom to show that if two lines are parallel, they do not intersect - this just follows from the definition.

The fifth postulate says that for every line L and point P outside of L, the parallel line L' through P is unique. (Existence of L' follows from axioms 1 through 4.)

I agree that this was not written in the respectful tone that I would like to see at Less Wrong. I wish Anatoly had phrased that differently.

I am however concerned that when true statements are downvoted, there is a real risk that readers misunderstand why the comment was downvoted and assume that the contents are untrue. For the benefit of those readers, I simply wanted to state for the record that the contents of his message were indeed true.

This is not just a matter of misstating the axiom. The original post reads:

One of these things is not like the other. The fifth axiom is the only one which requires some effort to understand. Intuitively, it states that parallel lines do not intersect. This statement irked Euclid for reasons apart from the ugliness of the axiom.

The fact that parallel lines do not intersect seems like it should follow from the definition of lines and angles. It doesn't seem like something we should have to specify in addition. That we must assume parallel lines do not intersect (rather than proving it) was long seen as a wart on geometry.

In reality, the fact that parallel lines do not intersect *does* follow from the definition of the word "parallel". Therefore, the error results in several of the paragraphs in the original post being meaningless or untrue.

I am however concerned that when true statements are downvoted, there is a real risk that readers misunderstand why the comment was downvoted and assume that the contents are untrue.

A "true statement" wasn't downvoted. A comment containing one true statement and one attack that is not a true statement (made as a separate statement, conveniently) was downvoted.

In reality, the fact that parallel lines do not intersect does follow from the definition of the word "parallel". Therefore, the error results in several of the paragraphs in the original post being meaningless or untrue.

The trouble is that in 2-D Euclidean space, there are many equivalent definitions of "parallel". It just so happens that straight lines that don't intersect also have the same slope, will intersect any transverse line at congruent angles, and are always the same distance apart (and vice versa). However, these properties need not be equivalent in non-Euclidean geometry.

The OP's issue seems to be that defining parallel lines as those which do not intersect is *artificial*. It's a workaround Euclid developed to smooth over his presentation. He could not use local properties of lines and angles to prove parallel lines didn't intersect. So, he defined them as lines that don't intersect, introduced the parallel postulate, and then used those to prove the other properties of parallel lines. Later mathematicians found this to be rather inelegant and tried to prove parallel lines didn't intersect using only properties of lines and angles.

Sure, it's an error if you use Euclid's definition of parallel, but I wouldn't call the discussion meaningless. It touches on a very important issue of how to define things and what properties we want to retain when we generalize a notion.

The discussion would be helped if people consulted what Euclid wrote.

The trouble is that in 2-D Euclidean space, there are many equivalent definitions of "parallel"

It would be better to put that as there being many concepts which, in the presence of the 5 postulates, are all equivalent to the one that Euclid calls "parallel". When one is not considering the foundations of geometry, it does not matter which of these properties one calls "parallel", as one understands that when any of these properties is satisfied, all are. When one is considering the foundations, it does matter, and only confusion can result from using any but Euclid's.

But the issue of the 5th postulate is not about definitions. Euclid's 5th postulate does not mention parallelism at all. There are many other 5th postulates one can substitute for Euclid's and get the same geometry, but the problem (so I gather from a few minutes wiki-ing) was that all of them seemed rather more complicated than the other four, leading many mathematicians to search for a proof that would render them all unnecessary.

Bolyai and Lobachevsky (and Gauss before them, but unpublished) settled the matter by working out what looked like a consistent theory of hyperbolic geometry. I say "looked like", because mathematical logic was yet to be invented, and even Hilbert's axioms were still in the future. Models of hyperbolic geometry within Euclidean geometry were found, still in the 19th century, definitively settling the matter.

When it comes to neutral geometry, nobody's ever defined "parallel lines" in any way *other* than "lines that don't intersect". You can talk about slopes in the context of the Cartesian model, but the assumptions you're making to get there are far too strong.

As a consequence, no mathematicians ever tried to "prove that parallel lines don't intersect". Instead, mathematicians tried to prove the parallel postulate in one of its equivalent forms, of which some of the more compelling or simple are:

The sum of the angles in a triangle is 180 degrees. (Defined to equal two right angles.)

There exists a quadrilateral with four right angles.

If two lines are parallel to the same line, they are parallel to each other.

It's also somewhat misleading to say that mathematicians were mainly motivated by the *inelegance* of the parallel postulate. Though this was true for some mathematicians, it's hard to say that the third form of the parallel postulate which I gave is any less elegant, as an axiom, than "If two line segments are congruent to the same line segment, then they are congruent to each other". Some form of the latter was assumed both by Euclid (his first Common Notion) and by all of his successors.

A stronger motivation for avoiding the parallel postulate is that so much can be done without it that one begins to suspect it might be unnecessary.

When it comes to neutral geometry, nobody's ever defined "parallel lines" in any way other than "lines that don't intersect". You can talk about slopes in the context of the Cartesian model, but the assumptions you're making to get there are far too strong.

Well, Euclid was **the** standard textbook in geometry for a long time. There was a movement in the 1800s to replace the *Elements* with a more modern textbook and a number of authors used different definitions, which just ended up requiring them to introduce other axioms to get the result. Lewis Carroll ended up satirizing the affair.

It's also somewhat misleading to say that mathematicians were mainly motivated by the inelegance of the parallel postulate.

If it were elegant, mathematicians wouldn't have spent 2,000 years trying to prove it from the other four postulates. I very much doubt Euclid himself liked it. Intuition suggests that the result should follow from more elementary notions.

It was a workaround to let Euclid get on with his book and later mathematicians looked for a more elegant formulation.

Though this was true for some mathematicians, it's hard to say that the third form of the parallel postulate which I gave is any less elegant, as an axiom, than "If two line segments are congruent to the same line segment, then they are congruent to each other".

Is it obvious from the definition of parallel lines that this ought to be true? That equality should be transitive seems like so obvious an idea that it's barely worth writing down.

EDIT: It's worth noting that classical mathematicians had very different ideas about what axioms should be. To them, axioms should be self-evident. Modern mathematics has no such requirements for its axioms. These are two very different attitudes about what axioms ought to be.

There was a movement in the 1800s to replace the Elements with a more modern textbook and a number of authors used different definitions.

What other definitions of "parallel line" do you have in mind?

Is it obvious from the definition of parallel lines that this ought to be true? That equality should be transitive seems like so obvious an idea that it's barely worth writing down.

Congruence and equality are not the same thing. One of these axioms says that being parallel is transitive; the other says that being congruent is transitive. I agree that both notions become much less useful if transitivity does not hold, but a non-transitive congruence relation is not nonsensical.

[This comment is no longer endorsed by its author]

I'm reviewing the books on the MIRI course list. After my first four book reviews I took a week off, followed up on some dangling questions, and kept up other side projects. Then I dove into *Model Theory*, by Chang and Keisler.

It has been three weeks. I have gained a decent foundation in model theory (by my own assessment), but I have not come close to completing the textbook. There are a number of other topics I want to touch upon before December, so I'm putting *Model Theory* aside for now. I'll be revisiting it in either January or March to finish the job.

In the meantime, I do not have a complete book review for you. Instead, this is the first of three posts on my experience with model theory thus far.

This post will give you some framing and context for model theory. I had to hop a number of conceptual hurdles before model theory started making sense — this post will contain some pointers that I wish I'd had three weeks ago. These tips and realizations are somewhat general to learning any logic or math; hopefully some of you will find them useful.

Shortly, I'll post a summary of what I've learned so far. For the casual reader, this may help demystify some heavily advanced parts of the Heavily Advanced Epistemology sequence (if you find it mysterious), and it may shed some light on some of the recent MIRI papers. On a personal note, there's a lot I want to write down & solidify before moving on.

In a follow-up post, I'll discuss my experience struggling to learn something difficult on my own. Model theory has required significantly more cognitive effort than the previous textbooks did.

## Between what was meant and what was said

Model theory is an abstract branch of mathematical logic, which itself is already too abstract for most. So allow me to motivate model theory a bit.

At its core, model theory is the study of what you *said*, as opposed to what you *meant*. To give some intuition for this, I'll re-tell an overtold story about an ancient branch of math.

In olden times, Euclid built Geometry upon five axioms:

One of these things is not like the other. The fifth axiom is the only one which requires some effort to understand. Intuitively, it states that parallel lines do not intersect. This statement irked Euclid for reasons apart from the ugliness of the axiom.

The fact that parallel lines do not intersect seems like it should follow from the definition of lines and angles. It doesn't seem like something we should have to specify *in addition*. That we must *assume* parallel lines do not intersect (rather than *proving* it) was long seen as a wart on geometry.

This wart irked mathematicians for millennia, until finally it was discovered that the fifth axiom is independent of the other four. You can build consistent systems where parallel lines intersect. You can build consistent systems where they diverge.

This seemed crazy, at the time: parallel straight lines cannot diverge! Surely, a geometry in which they do is absurd!

The problem is that mathematicians were imagining "straight lines" in their head that did not match the mathematical objects specified by the first four axioms of Euclid.

This mistake was invited by names which Euclid chose. "Straight lines" invoke a mental image that is more specific than that which the axioms describe. If you detach the provocative words from the axioms, and speak only of drawing a `LUME` between any two `PTARS`, extending a `LUME` into a `SLUME`, and so on, then it's much easier to understand that the `LUME`s which Euclid's axioms describe may not match up with the image of a "straight line" in your head. It is much easier to understand that there may be interpretations of `LUME` which do not obey the fifth postulate.

In fact, if you take Euclid's first four postulates, there are many possible interpretations in which "straight line" takes on a multitude of meanings. This ability to disconnect the *intended* interpretation from the *available* interpretations is the bedrock of model theory. Model theory is the study of *all* interpretations of a theory, not just the ones that the original author intended.

Of course, model theory isn't really about finding surprising new interpretations; it's much more general than that. It's about exploring the breadth of interpretations that a given theory makes available. It's about discerning properties that hold in all possible interpretations of a theory. It's about discovering how well (or poorly) a given theory constrains its interpretations. It's a toolset used to discuss interpretations in general.

At its core, model theory is the study of what a mathematical theory actually says, when you strip the intent from the symbols.

## Iron walls

Before you can do model theory, you have to erect iron walls between four different concepts.

## Logics

A logic is a formal system for building and manipulating sentences. Traditionally, this logic defines a number of symbols (`( ) ∧ ¬ ∀ ∃ ≡ ν '`, for example) and rules for building sentences from those symbols.

Note that *you cannot generate sentences from a logic alone*. Rather, you *use* a logic to generate sentences *from* a language.

Also, remember that the rules of a logic are *syntactic*, such as "if `φ` is a sentence then `(¬φ)` is a sentence".

Finally, remember that logics are just rules for generating sentences. A logic is perfectly happy to generate sentences shaped like `x∧(¬x)`, in spite of all your protests about contradictions.

## Languages

A language is a collection of symbols. *From* those symbols, *using* a logic, you can start generating sentences.

For example, in the propositional logic, using the language `{x, y}`, the string `hello` is surely not a sentence (for it fails to use the appropriate symbols). Nor is the string `¬xy` a sentence: it fails to follow the rules of the logic. `((¬x)∧y)` is a sentence, for it uses the appropriate symbols and follows the given rules.

Many results in model theory are achieved by holding the logic fixed and varying the language, so it's essential that these concepts be distinct in your mind.
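The split between logic and language can be sketched as a small recognizer. This is only an illustration under assumptions: the function name `is_sentence` is hypothetical, and the formation rules are a guess at a fully parenthesised propositional logic (the text above only states the rule for `¬`). The logic is the fixed code; the language is a parameter you can vary.

```python
def is_sentence(s, language):
    """Return True if s is a sentence over `language` under the assumed rules."""
    # Rule 1 (assumed): a single symbol of the language is a sentence.
    if s in language:
        return True
    # Every other sentence must be wrapped in one pair of parentheses.
    if len(s) < 4 or s[0] != "(" or s[-1] != ")":
        return False
    body = s[1:-1]
    # Rule 2 (from the text): if phi is a sentence then (¬phi) is a sentence.
    if body[0] == "¬":
        return is_sentence(body[1:], language)
    # Rule 3 (assumed): if phi and psi are sentences then (phi∧psi) is one.
    # Find the first "∧" that is not nested inside inner parentheses.
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "∧" and depth == 0:
            left, right = body[:i], body[i + 1:]
            return is_sentence(left, language) and is_sentence(right, language)
    return False


for candidate in ["hello", "¬xy", "((¬x)∧y)"]:
    print(candidate, is_sentence(candidate, {"x", "y"}))
```

Calling `is_sentence("((¬x)∧y)", {"p", "q"})` with a different language rejects the same string without touching the rules, which is the sense in which the logic is held fixed while the language varies.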

## Theories

A theory is a collection of sentences written in one language. For example, in the language `{≤}` under first-order logic, we can discuss the theory

1. `(∀x)(x≤x)`
2. `(∀xy)(x≤y)∧(y≤x)→(y≡x)`
3. `(∀xyz)(x≤y)∧(y≤z)→(x≤z)`

which is the theory of *order*. (The axioms above are reflexivity, antisymmetry, and transitivity.)

Remember that a theory is just a set of sentences drawn from all available sentences. These sentences aren't particularly special unless you make them special. Sentences like `(∃x)¬(x≤x)` are fine sentences built from the language `{≤}`, even though they directly contradict the theory. Theories don't affect the sentences of a language: they're just a grab-bag of some sentences that seemed interesting to someone.

## Models

A model is an *interpretation* of the sentences generated by a language. A model is a structure which assigns a truth value to each sentence generated by some language under some logic.

(More specifically, it's a structure that assigns binary values to sentences in such a way that we're justified in the name "truth value": for example, we require that a model says φ is true if and only if it says that ¬φ is false, and so on.)

Only once we start interpreting sentences is it meaningful to talk about valid or refutable sentences. Once you have a model of `{≤}` that happens to say that axioms 1, 2, and 3 above are true, *then* you can start talking about how the theory of order rules out the sentence `(∃x)¬(x≤x)`, because there is no model of the theory of order which is also a model of this sentence.

(You can actually talk about how `(∃x)¬(x≤x)` is inconsistent with the theory of order without appealing to model theory, but I find it helpful to treat everything as raw symbols until interpreted by a model.)

To give a concrete example, in *first-order logic*, using the *language* `{S, +, *, 0}`, the *theory* of arithmetic is the theory laid out by the [Peano axioms](http://en.wikipedia.org/wiki/Peano_axioms#First-order_theory_of_arithmetic). The actual natural numbers zero, one, two, ... are a *model* of this theory (where zero is the interpretation of `0`, one is the interpretation of `S0`, etc.).

Also, it's worth noting that *any* object that interprets sentences and follows the rules of the logic qualifies as a model. There are often many non-isomorphic objects that interpret the same sentences in the same way. For example, the rational numbers and the real numbers (each under addition) are models of group theory that agree on every sentence in the language of groups, despite being different models.

The distinction between these four concepts seems obvious to me in hindsight, but I explicitly remember expending cognitive effort to separate them mentally, so there you go. Make sure these distinctions are wrought in iron before attempting model theory.
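To make the theory/model split concrete, here is a rough sketch that checks the three order axioms against small finite structures by brute force. The function name `models_order_theory` and the example structures are hypothetical choices for illustration, and `≡` is read as Python equality. Real model theory is not restricted to finite universes; this only shows that the same sentences can be true of quite different objects.

```python
from itertools import product


def models_order_theory(universe, leq):
    """Check reflexivity, antisymmetry and transitivity in (universe, leq)."""
    # Axiom 1: every element is related to itself.
    reflexive = all(leq(x, x) for x in universe)
    # Axiom 2: x<=y and y<=x together force x == y.
    antisymmetric = all(
        x == y or not (leq(x, y) and leq(y, x))
        for x, y in product(universe, repeat=2)
    )
    # Axiom 3: x<=y and y<=z together force x<=z.
    transitive = all(
        leq(x, z) or not (leq(x, y) and leq(y, z))
        for x, y, z in product(universe, repeat=3)
    )
    return reflexive and antisymmetric and transitive


universe = range(1, 7)
usual = lambda x, y: x <= y          # the intended interpretation
divides = lambda x, y: y % x == 0    # an unintended one that still satisfies the axioms
strict = lambda x, y: x < y          # fails axiom 1, so not a model

print(models_order_theory(universe, usual))
print(models_order_theory(universe, divides))
print(models_order_theory(universe, strict))
```

Divisibility on 1..6 is a model of the same three sentences as the usual ordering, even though 2 and 3 are incomparable in it. The theory does not say which of these structures was meant.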

## The Right to use a name

There's something about math education in general that has troubled me for quite some time, and which I'm finally able to articulate. It's quite possible that this is a personal nit, since nobody else seems to care — but I'll share it anyway.

Many math textbooks treat properties that *justify a name* of a thing as statements about the thing *after* naming it.

This is a little abstract, so I'll make a silly example. Imagine someone is trying to show that, in category theory, composition of arrows is associative. They shouldn't appeal to visual intuition or any diagrams of arrows.

The concept that following arrows is an associative operation is so ingrained in the concept of "arrow" that it's difficult to describe the property in English without sounding dumb.

This property of arrows is so stupidly obvious that the statement is frustrating. Further, it hides the following fact:

Associative composition between thingies is *something we must have before we're justified in calling the thingies "Arrows".*

Associative composition is what *allows* you to use the name "arrow" and draw visual diagrams. You can't appeal to my intuition about "arrows" to show that composition is associative. It's the other way around! Only *after* you show that your thingies have associative composition are you allowed to label them as "arrows".

As another example, the axioms of order (above) are what *allow* us to use the `≤` symbol, which appeals to our intuitive idea of order. Really, it's more honest to say "We have a binary relation `R`, satisfying

1. `(∀x)R(xx)`
2. `(∀xy)R(xy)∧R(yx)→(y≡x)`
3. `(∀xyz)R(xy)∧R(yz)→R(xz)`

which *justifies* our use of the `≤` symbol for `R`."

I imagine this is not a problem for experienced mathematicians, for whom it goes without saying that you must formally specify (or disregard) all intuitive baggage that comes attached to the names. However, I remember distinctly a number of times when I gnashed my teeth with boredom as teachers made obvious statements (*of course `≤` is reflexive, why do we even need to say this?*), simply because I didn't understand this idea.

I mention this because the first few sections of the *Model Theory* textbook make statements that seem quite obvious. It's easy to grind your teeth and say "duh, hurry up". It's a little harder to understand exactly why such things must be said. In that light, I think this is a good piece of advice for learning mathematics in general:

If you find yourself wondering why a statement must be said, check whether the statement is justifying any names.

## Binding meaning

The early parts of *Model Theory* will go down much easier if you realize that they're binding logical symbols to the appropriate meaning (and thus justifying the name "model").

For example, when we state "M models `φ∧ψ` if and only if it models `φ` and it models `ψ`", it's easy to say "well duh". It's a little harder to understand that *this is the mechanism by which the symbol `∧` is bound to the interpretation "and"*.

Also, note that the ability to distinguish "the symbol `+` in the language L" from "the addition function as interpreted by the model M" is absolutely crucial.

## Totality

Something that kept on biting me was this:

Models of first-order logic are "total". They have something to say about *every* sentence in a language. Even where a *theory* is incomplete, any individual *model* is "complete". A model of first-order logic interprets function symbols by total functions and relations by set-theoretic relations. The relation `⊧` is total: for every sentence, either `M⊧φ` or `M⊧¬φ`.

This is a point where my intuitive notion of "models as interpretations" departed from the actual mathematical objects under consideration: functions are firmly partial-by-default in my mind's eye.
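One way to see where totality comes from is to write the satisfaction relation down as a recursive function. The sketch below is an assumption-laden toy, not the textbook's definition: formulas are nested tuples with hypothetical tags (`"leq"`, `"not"`, `"and"`, `"exists"`, `"forall"`), the language is just `{≤}`, and the universe must be finite so the quantifiers can be checked by enumeration. Each clause binds a symbol to its meaning, and because the function always returns a boolean, every sentence gets an answer.

```python
def satisfies(universe, leq, formula, env=None):
    """Decide whether the structure (universe, leq) satisfies `formula`."""
    env = env or {}
    op = formula[0]
    if op == "leq":
        # The symbol ≤ is interpreted by the model's own relation.
        return leq(env[formula[1]], env[formula[2]])
    if op == "not":
        # M models ¬φ iff M does not model φ.
        return not satisfies(universe, leq, formula[1], env)
    if op == "and":
        # M models φ∧ψ iff M models φ and M models ψ.
        return (satisfies(universe, leq, formula[1], env)
                and satisfies(universe, leq, formula[2], env))
    if op == "exists":
        _, var, body = formula
        return any(satisfies(universe, leq, body, {**env, var: a})
                   for a in universe)
    if op == "forall":
        _, var, body = formula
        return all(satisfies(universe, leq, body, {**env, var: a})
                   for a in universe)
    raise ValueError(f"unknown connective: {op!r}")


# "There is a least element": the theory of order neither proves nor refutes it.
least = ("exists", "x", ("forall", "y", ("leq", "x", "y")))

chain = ([0, 1, 2], lambda a, b: a <= b)       # a model of order with a least element
antichain = ([2, 3], lambda a, b: b % a == 0)  # divisibility on {2, 3}: no least element

for universe, leq in (chain, antichain):
    print(satisfies(universe, leq, least),
          satisfies(universe, leq, ("not", least)))
```

The theory of order leaves the sentence `least` open, but each individual structure settles it one way or the other, and never both.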

It's important to hold firm the distinction between "model" and "theory" here. Remember that number *theory* is incomplete, while the standard *model* of number theory is the one that picks "true" for all Gödel sentences, has no infinite numbers, etc. (The difficulty in pinpointing such a model is exactly what the incompleteness theorem is all about.)

Be aware that the mathematical definition of a model may not match your intuitive idea of "a structure which interprets a theory", *especially* if you're coming from computer science (or other constructive fields).

None of this is particularly novel. Rather, this is a collection of distinctions and clarifications that would have made my life a bit easier when beginning the textbook.

In my case, I didn't have any of these concepts wrong, per se — rather, I had them fuzzy. The above distinctions were not yet fleshed out in my mind. This post provides a context for model theory; a taste of the type of thinking you must be ready to think.

I was originally going to use this as context for what I've learned in model theory so far, but this post took longer than expected. I'll follow up tomorrow.