Epistemic status: Vaguely confused and probably lacking a sufficient technical background to get all the terms right. Is very cool though, so I figured I'd write this.

And what are these Fluxions? The Velocities of evanescent Increments? And what are these same evanescent Increments? They are neither finite Quantities nor Quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?

George Berkeley, The Analyst

When calculus was invented, it didn't make sense. Newton and Leibniz played fast and loose with mathematical rigor to develop methods that arrived at the correct answers, but no one knew why. It took another century and a half for Cauchy and Weierstrass to develop analysis, and in the meantime people like Berkeley refused to accept the methods utilizing these "ghosts of departed quantities."

Cauchy's and Weierstrass's solution to the crisis of calculus was to define infinitesimals in terms of limits. In other words, not to describe the behavior of functions directly acting on infinitesimals, but rather to frame the entire endeavour as studying the behaviors of certain operations in the limit, in that weird superposition of being arbitrarily close to something yet not it.

(And here I realize that math is better shown, not told)

The limit of a function \(f\) at \(x_0\) is \(L\) if for any \(\varepsilon > 0\) there exists some \(\delta > 0\) such that if

\(0 < |x - x_0| < \delta\)

then \(|f(x) - L| < \varepsilon\).

Essentially, the limit exists if there's some value \(\delta\) that forces \(f(x)\) to be within \(\varepsilon\) of \(L\) if \(x\) is within \(\delta\) of \(x_0\). Note that this has to hold true for all \(\varepsilon > 0\), and you choose \(\varepsilon\) first!
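To make the quantifier order concrete, here is a small numerical sketch (my own illustration, with a made-up helper name, find_delta): it hunts for a \(\delta\) that works for a given \(\varepsilon\) by checking a finite grid of points near \(x_0\). It is a sanity check, not a proof, since a real argument has to cover every \(x\) with \(0 < |x - x_0| < \delta\).

```python
# Illustrative only: numerically search for a delta that works for a given
# epsilon, by checking |f(x) - L| < eps on a finite grid of x near x0.
# A real proof must handle *all* x with 0 < |x - x0| < delta, not a sample.

def find_delta(f, x0, L, eps, candidates=(1.0, 0.1, 0.01, 0.001), samples=10_000):
    for delta in candidates:                      # try successively smaller deltas
        xs = [x0 + delta * (k / samples - 0.5) * 2 for k in range(samples + 1)]
        ok = all(abs(f(x) - L) < eps for x in xs if 0 < abs(x - x0) < delta)
        if ok:
            return delta                          # this delta passed the finite check
    return None                                   # no candidate worked

# Example: f(x) = x^2 has limit 4 at x0 = 2.
print(find_delta(lambda x: x * x, x0=2.0, L=4.0, eps=0.01))  # prints 0.001
```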

From this we get the well-known definition of the derivative:

\(f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}\)

and you can define the integral similarly.

The limit solved calculus's rigor problem. From the limit the entire field of analysis was invented and placed on solid ground, and this foundation has stood to this day.

Yet, it seems like we lose something important when we replace the idea of the "infinitesimally small" with the "arbitrarily close to." Could we actually make numbers that were infinitely small?

The Sequence Construction

Imagine some mathematical object that had all the relevant properties of the real numbers (addition and multiplication are associative and commutative, the set is closed under both, etc.) but had infinitely small and infinitely large numbers. What does this object look like?

We can take the set of all infinite sequences of real numbers \(\mathbb{R}^{\mathbb{N}}\) as a starting point. A typical element \(a \in \mathbb{R}^{\mathbb{N}}\) would be

\(a = (a_1, a_2, a_3, \ldots)\)

where \((a_n)_{n \in \mathbb{N}}\) is some infinite sequence of real numbers.

We can define addition and multiplication element-wise as:

\(a + b = (a_1 + b_1, a_2 + b_2, a_3 + b_3, \ldots)\)

\(a \cdot b = (a_1 b_1, a_2 b_2, a_3 b_3, \ldots)\)

You can verify that this is a commutative ring, which means that these operations behave nicely. Yet, being a commutative ring is not the same thing as being an ordered field, which is what we eventually want if our desired object is to have the same properties as the reals.
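To get a feel for this ring, one rough way (my sketch, not from the post; the helper names are invented) is to model a sequence as a Python function from indices to reals and define the operations pointwise, printing only a finite prefix.

```python
# Model a sequence as a function from index (0, 1, 2, ...) to a real number,
# and define the ring operations element-wise, as in the post.

def add(a, b):
    return lambda n: a(n) + b(n)

def mul(a, b):
    return lambda n: a(n) * b(n)

def prefix(a, k=6):
    """The first k terms, just for display."""
    return [a(n) for n in range(k)]

ones = lambda n: 1.0                  # the constant sequence (1, 1, 1, ...)
harmonic = lambda n: 1.0 / (n + 1)    # the sequence (1, 1/2, 1/3, ...)

print(prefix(add(ones, harmonic)))    # [2.0, 1.5, 1.333..., ...]
# Commutativity holds index-by-index because it holds in the reals:
assert prefix(mul(ones, harmonic)) == prefix(mul(harmonic, ones))
```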

To get from \(\mathbb{R}^{\mathbb{N}}\) to a field structure, we have to modify it to accommodate well-defined division. The typical way of doing this is looking at how to introduce the zero product property: i.e. ensuring that if \(a \cdot b = 0\), then either \(a\) or \(b\) is \(0\).

If we let \(0 = (0, 0, 0, \ldots)\) be the sequence of all zeros in \(\mathbb{R}^{\mathbb{N}}\), then it is clear that we can have two non-zero elements multiply to get zero. If we have

\(a = (1, 0, 1, 0, \ldots)\)

and

\(b = (0, 1, 0, 1, \ldots)\)

then neither of these is the zero element, yet their product is zero.
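Here is the same pair of alternating sequences in a short, self-contained snippet (my illustration): neither factor is the zero sequence, yet their element-wise product vanishes at every index, so \(\mathbb{R}^{\mathbb{N}}\) has zero divisors.

```python
# Two nonzero sequences whose element-wise product is the zero sequence.
a = lambda n: 1.0 if n % 2 == 0 else 0.0   # (1, 0, 1, 0, ...)
b = lambda n: 0.0 if n % 2 == 0 else 1.0   # (0, 1, 0, 1, ...)

product = [a(n) * b(n) for n in range(10)]
print(product)                              # all zeros: a zero-divisor pair
assert all(term == 0.0 for term in product)
```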

How do we fix this? Equivalence classes!

Our problem is that there are too many distinct "zero-like" things in the ring of real-valued sequences. Intuitively, we should expect a sequence like \((1, 0, 0, 0, \ldots)\), which is zero at all but finitely many positions, to be basically zero, and we want to find a good condensation of \(\mathbb{R}^{\mathbb{N}}\) that allows for this.

In other words, how do we make all the sequences with "almost all" of their elements equal to zero themselves equal to zero?

Almost All Agreement ft. Ultrafilters

Taken from "five ways to say "Almost Always" and actually mean it":

A filter \(U\) on an arbitrary set \(I\) is a collection of subsets of \(I\) that is closed under set intersections and supersets. (Note that this means that the smallest filter on \(I\) is \(\{I\}\) itself.)

An ultrafilter is a filter which, for every \(A \subseteq I\), contains either \(A\) or its complement \(I \setminus A\). A principal ultrafilter contains a finite set.

A nonprincipal ultrafilter does not.

This turns out to be an incredibly powerful mathematical tool, and can be used to generalize the concept of "almost all" to esoteric mathematical objects that might not have well-defined or intuitive properties.

Let's say we define some nonprincipal ultrafilter \(U\) on the natural numbers. This will contain all cofinite sets, and will exclude all finite sets. Now, let's take two sequences \(a, b \in \mathbb{R}^{\mathbb{N}}\) and define their agreement set \(I\) to be the set of indices on which \(a\) and \(b\) are identical (have the same real number in the same position).

Observe that \(I\) is a set of natural numbers. If \(I \in U\) then \(I\) cannot be finite, and it seems pretty obvious that almost all the elements in \(a, b\) are the same (they only disagree at a finite number of places after all). Conversely, if \(I \notin U\), this implies that \(\mathbb{N} \setminus I \in U\), which means that \(a, b\) disagree at almost all positions, so they probably shouldn't be equal.

Voila! We have a suitable definition of "almost all agreement": if the agreement set \(I\) is contained in some arbitrary nonprincipal ultrafilter \(U\).
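To make this concrete, here is a rough sketch (my own, with invented helper names). Ultrafilter membership cannot be computed in general, since nonprincipal ultrafilters only exist via the axiom of choice, but membership is forced for finite sets (never in \(U\)) and cofinite sets (always in \(U\)), and that is enough to settle the cases below.

```python
# Agreement sets, plus the only computable piece of ultrafilter membership:
# every nonprincipal ultrafilter on N contains all cofinite sets and no finite
# sets.  For sets that are neither (e.g. the even numbers), membership depends
# on which ultrafilter was chosen, and no algorithm can decide it for you.

def disagreements(a, b, window=1000):
    """Indices below `window` where the two sequences differ (a finite check)."""
    return [n for n in range(window) if a(n) != b(n)]

x = lambda n: 1.0 / (n + 1)
y = lambda n: 1.0 / (n + 1) if n >= 5 else 42.0   # tampered at 5 positions

# x and y disagree only at indices 0..4, so their agreement set is cofinite
# and lies in every nonprincipal ultrafilter: [x] = [y] in the quotient.
print(disagreements(x, y))   # [0, 1, 2, 3, 4]

evens = lambda n: 1.0 if n % 2 == 0 else 0.0
odds  = lambda n: 0.0 if n % 2 == 0 else 1.0
# evens and odds disagree everywhere, so they are not identified; but whether
# [evens] equals 1 or 0 depends on whether the chosen ultrafilter contains the
# even indices or the odd indices, and that cannot be checked by computation.
```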

Let \({}^*\mathbb{R}\) be the quotient set of \(\mathbb{R}^{\mathbb{N}}\) under this equivalence relation (essentially, the set of all distinct equivalence classes of \(\mathbb{R}^{\mathbb{N}}\)). Does this satisfy the zero product property?

(Notation note: we will let \((r, r, r, \ldots)\) denote the infinite sequence of the real number \(r\), and \([a]\) the equivalence class of the sequence \(a\) in \({}^*\mathbb{R}\).)

Yes, This Behaves Like The Real Numbers

Let \(a, b \in \mathbb{R}^{\mathbb{N}}\) be such that \(a \cdot b = (0, 0, 0, \ldots)\). Let's break this down element-wise: either \(a_n\) or \(b_n\) must be zero for each \(n \in \mathbb{N}\). As one of the ultrafilter axioms is that it must contain a set or its complement, either the index set of the zero elements in \(a\) or the index set of the zero elements in \(b\) will be in any nonprincipal ultrafilter on \(\mathbb{N}\) (if the zero set of \(a\) is not in \(U\), then its complement is; every index in that complement has \(a_n \neq 0\) and hence \(b_n = 0\), so the complement is contained in the zero set of \(b\), which is therefore in \(U\) by closure under supersets). Therefore, either \(a\) or \(b\) is equivalent to \((0, 0, 0, \ldots)\) in \({}^*\mathbb{R}\), so \({}^*\mathbb{R}\) satisfies the zero product property.

Therefore, division is well defined on \({}^*\mathbb{R}\)! (Given a nonzero \([a]\), invert \(a\) componentwise wherever \(a_n \neq 0\) and put, say, \(1\) elsewhere; since the indices where \(a_n \neq 0\) form a set in \(U\), the result represents a multiplicative inverse of \([a]\).) Now all we need is an ordering, and luckily almost all agreement saves the day again. We can say for \([a], [b] \in {}^*\mathbb{R}\) that \([a] > [b]\) if almost all elements in \(a\) are greater than the elements in \(b\) at the same positions (using the same ultrafilter equivalence).

So, \({}^*\mathbb{R}\) is an ordered field!

Infinitesimals and Infinitely Large Numbers

We have the following hyperreal:

\(\varepsilon = [(1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \ldots)]\)

Recall that we embed the real numbers into the hyperreals by assigning every real number \(r\) to the equivalence class \([(r, r, r, \ldots)]\). Now observe that \(\varepsilon\) is smaller than every positive real number embedded into the hyperreals this way.

Pick some arbitrary positive real number \(r\). There exists \(m \in \mathbb{N}\) such that \(\frac{1}{m} < r\). Every fraction of the form \(\frac{1}{n}\) with \(n\) a natural number greater than \(m\) is also smaller than \(r\), and there are only finitely many positions before those, so \(\varepsilon\) is smaller than \((r, r, r, \ldots)\) at almost all positions, so it is smaller than \(r\).

This is an infinitesimal! This is a rigorously defined, coherently constructed infinitesimal number smaller than all positive real numbers! In a number system which shares all of the important properties of the real numbers! (except the Archimedean one, as we will shortly see, but that doesn't really matter).
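As a quick finite check of that argument (my illustration, not part of the post): for each positive real \(r\), the positions where \(\frac{1}{n}\) fails to be below \(r\) form a finite set, which is exactly the cofiniteness the "almost all positions" claim needs.

```python
# For any positive real r, the positions where 1/(n+1) >= r form a finite set
# (roughly the first 1/r indices), so the sequence (1, 1/2, 1/3, ...) drops
# below the constant sequence (r, r, r, ...) at a cofinite set of positions.

def bad_positions(r, window=10_000):
    """Positions within a finite window where 1/(n+1) is NOT below r."""
    return [n for n in range(window) if 1.0 / (n + 1) >= r]

for r in (0.5, 0.01, 0.003):
    print(r, len(bad_positions(r)))   # 0.5 -> 2, 0.01 -> 100, 0.003 -> 333
```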

Consider the following:

\(\Omega = [(1, 2, 3, \ldots)]\)

By a similar argument this is larger than all possible real numbers. I encourage you to try to prove this for yourself!

(The Archimedean principle is the one that guarantees that if you have any two positive real numbers, you can multiply the smaller by some natural number to make it greater than the other. This is not true in the hyperreals. Why? (Hint: \(\varepsilon\) breaks this if you pair it with any positive real number.))

How does this tie into calculus, exactly?

Well, we have a coherent way of defining infinitesimals!

The short answer is that we can define the standard part operator \(\operatorname{st}\) as that which maps any finite hyperreal to its closest real counterpart (the unique real number it is infinitely close to). Then, the definition of a derivative becomes

\(f'(x) = \operatorname{st}\!\left(\frac{{}^*f(x + \varepsilon) - {}^*f(x)}{\varepsilon}\right)\)

where \(\varepsilon\) is any nonzero infinitesimal (the result must be the same for every choice), and \({}^*f\) is the natural extension of \(f\) to the hyperreals. More on this in a future blog post!
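As a toy illustration (mine, not the post's): if we realize \(\varepsilon\) as the class of the sequence \((1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots)\), the hyperreal difference quotient of \(f(x) = x^2\) is the class of the sequence \((2x + \tfrac{1}{n})_n\), and its standard part is the familiar \(2x\). The snippet below just evaluates that sequence term by term; for a convergent sequence like this one, the standard part of its class agrees with its ordinary limit.

```python
# Represent the infinitesimal eps by the sequence (1, 1/2, 1/3, ...) and build
# the hyperreal difference quotient of f(x) = x^2 componentwise:
#   (f(x + 1/n) - f(x)) / (1/n)  =  2x + 1/n   for n = 1, 2, 3, ...
# Its standard part (here, the limit of the convergent sequence) is 2x.

def f(x):
    return x * x

def quotient_term(x, n):
    h = 1.0 / n                      # the n-th term of the infinitesimal
    return (f(x + h) - f(x)) / h     # the n-th term of the difference quotient

x = 3.0
print([quotient_term(x, n) for n in (1, 10, 100, 10_000)])
# roughly [7.0, 6.1, 6.01, 6.0001], settling on 6.0, the derivative at x = 3
```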

It also turns out the hyperreals have a bunch of really cool applications in fields far removed from analysis. Check out my expository paper on the intersection of nonstandard analysis and Ramsey theory for an example!

Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.

28 comments

Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.

Different people will have different intuitions. I've always found the epsilon-delta method clear and simple, and infinitesimals made of shadows and fog when used as a basis for calculus. Every infinitesimals-first approach I have seen involves unexplained magic or papered-over cracks at some point, unexplained and papered-over because at the stage of first learning calculus the student usually doesn't know any formal logic. There's a reason that infinitesimals were only put on a sound footing a century after epsilon-delta. Mathematical logic had to be invented first.

Here the magic lies in depending on the axiom of choice to get a non-principal ultrafilter. And I believe I see a crack in the above definition of the derivative. \(f\) is a function on the non-standard reals, but its derivative is defined to only take standard values, so it will be constant in the infinitesimal range around any standard real. If \(f(x) = x^2\), then its derivative should surely be \(2x\) everywhere. The above definition only gives you that for standard values of \(x\).

I also think that making it more intuitive is missing the point of learning—really learning—mathematics. The idea of the slope of a curve is already intuitive. What is needed is to show the student a way of thinking about these things that does not depend on the breath of intuition to keep it aloft.

Here the magic lies in depending on the axiom of choice to get a non-principal ultrafilter. And I believe I see a crack in the above definition of the derivative. f is a function on the non-standard reals, but its derivative is defined to only take standard values, so it will be constant in the infinitesimal range around any standard real. If \(f(x) = x^2\), then its derivative should surely be \(2x\) everywhere. The above definition only gives you that for standard values of \(x\).

Yep, the definition is wrong. If \(f : \mathbb{R} \rightarrow \mathbb{R}\), then let \({}^*f : {}^*\mathbb{R} \rightarrow {}^*\mathbb{R}\) denote the natural extension of this function to the hyperreals (considering \({}^*\mathbb{R}\) behaves like \(\mathbb{R}\), this should work in most cases). Then, I think the derivative should be

\(f'(x) = \operatorname{st}\!\left(\frac{{}^*f(x + \varepsilon) - {}^*f(x)}{\varepsilon}\right).\)

W.r.t. what the derivative of \({}^*f\) should be, I imagine you can describe it similarly in terms of \({}^*(f')\), which by the transfer principle should exist (which applies because of Łoś's theorem, which I don't claim to fully understand).

For \({}^*f\), the derivative then is \({}^*(f')\).

Just in case anyone was wondering why we can't have any finite sets in the ultrafilter:

If some finite set {n1, n2, ..., n_k} is in an ultrafilter U, then either {n1, n2, ..., n_(k-1)} is in U or I \ {n1, n2, ..., n_(k-1)} is in U. In the latter case, the intersection with the original set is {n_k}, which must be in U. In the former case, you can keep repeating this until you are left with some other one-element set.

If any one-element set {n} is in U, then membership in U is just decided by whether a set contains n or not.

When you go through the equivalence construction, this means that two sequences are equivalent if and only if they agree at the n'th position, which means that all the operations are just the same as arithmetic on that position with the rest not mattering at all. So to get anything different, U really does have to be a non-principal ultrafilter.

Observe that \(I\) is a set of natural numbers. If \(I \in U\) then \(I\) cannot be finite, and it seems pretty obvious that almost all the elements in \(a, b\) are the same (they only disagree at a finite number of places after all).

The bracketed remark doesn't appear to be true. Why can we not have \(I\) be the set of even naturals or the set of odd naturals? Indeed, by the definition of an ultrafilter, we must have one of them in \(U\). Also, in the post, you use \(I\) for two different purposes, which makes the post slightly less clear.

Some random thoughts. 

First, it would be nice if one could go from rationals to hyperreals directly without having to define the reals in between (especially for people with limit allergies, as the reals are sometimes defined as limits of Cauchy sequences). I don't see a straightforward way to do so though; you can hardly allow people to encode their reals as sequences of rationals, otherwise a sequence like \((1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots)\) would have to be equivalent to zero instead of an infinitesimal.

Also, one could split the hyperreals into equivalence classes within which the Archimedean property holds. Using the big-O adjacent notation, the reals would be \(O(1)\), and the hyperreal called \(\varepsilon\) above would be \(O(1/n)\). Stretching the big-O notation, one could call the equivalence class of \(\Omega\) something like \(O(n)\). So one has a rather large zoo of these equivalence classes. This would imply that there is no Archimedean equivalence class for the smallest infinite hyperreal. If a hyperreal \(x\) is infinite (that is, \((x_n)\) diverges), then \(\sqrt{x}\) is a smaller infinite hyperreal.

I am well used to there being no biggest infinity, but there being no smallest infinity would indicate that these things are neither equivalent to cardinals nor ordinals. 

I found Terry Tao's writing on the topic to be helpful for understanding, especially the connection between nonprincipal ultrafilters and Arrow's Impossibility Theorem.

Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.

I think hyperreals are too complicated for calculus 1 and you should just talk about a non-rigorous "infinitesimal" like Newton and Leibniz did.

I agree. This is what I was going for in that paragraph. If you define derivatives & integrals with infinitesimals, then you can actually do things like treating dy/dx as a fraction without partaking in the half-in half-out dance that calc 1 teachers currently have to do.

I don't think the pedagogical benefit of nonstandard analysis is to replace Analysis I courses, but rather to give a rigorous backing to doing algebra with infinitesimals ("an infinitely small thing plus a real number is the same real number, an infinitely small thing times a real number is zero"). *Improper integrals would make a lot more sense this way, IMO.

Thank you, that makes sense!

Indefinite integrals would make a lot more sense this way, IMO

Why so? I thought they already made sense, they're "antiderivatives", so a function such that taking its derivative gives you the original function. Do you need anything further to define them?

(I know about the Riemann and Lebesgue definitions of the definite integral, but I thought indefinite integrals were much easier in comparison.)

Language mix-up. Meant improper integrals.

Now that I'm thinking about it, my memory's fuzzy on how you'd actually calculate them rigorously w/infinitesimals. Will get back to you with an example.

[comment deleted]

Voila! We have a suitable definition of "almost all agreement": if the agreement set \(I\) is contained in some arbitrary nonprincipal ultrafilter \(U\).

Isn't it easier to just say "If the agreement set has a nonfinite number of elements"? Why the extra complexity?

must contain a set or its complement

Oh I see, so defining it with ultrafilters rules out situations like \(a = (1, 0, 1, 0, \ldots)\) and \(b = (0, 1, 0, 1, \ldots)\) above, where both have infinitely many zeros and yet their product is zero.

The post is wrong in saying that U contains only cofinite sets. It obviously must contain plenty of sets that are neither finite nor cofinite, because the complements of those sets are also neither finite nor cofinite. Possibly the author intended to type "contains all cofinite sets" instead.

In particular, exactly one of a or b is equivalent to zero in *R.

Which one is equivalent to zero depends upon exactly which non-principal ultrafilter you choose, as there are infinitely many non-principal ultrafilters. Unfortunately (as with many other applications of the Axiom of Choice) there is no finite way to specify which ultrafilter you mean.

The post is wrong in saying that U contains only cofinite sets. It obviously must contain plenty of sets that are neither finite nor cofinite, because the complements of those sets are also neither finite nor cofinite. Possibly the author intended to type "contains all cofinite sets" instead.

Yep, this is correct! I've updated the post to reflect this.

E.g. if an ultrafilter contains the set of all even naturals, it won't contain the set of all odd naturals, neither of which are finite or cofinite. 

Thanks, this is helpful to point out.

Of course, this makes all of this rather abstract. It looks to me like for almost any two hyperreals (e.g. a, b as above), the answer to "which of them is larger?" is "It depends on the ultrafilter. Also, I can not tell you if a set is part of any specific ultrafilter. But fear not, for any given ultrafilter, the hyperreals are totally ordered."

Basically for any usable theorem, one would have to prove that the result is independent of the actual ultrafilter used, which means that numbers such as a and b will probably not feature in them a lot. 

I can not fault my analysis 1 professor for opting to stick to the reals (abstract as they already are) instead.

I don't understand some of the words you used, so please correct me if I am wrong. What are the equivalents of the original natural numbers here? Is it like 2 = { (2, 2, 2...), and all sequences that contain an infinite number of 2's and a finite number of anything else } ?

Then we would have a partially ordered set, because 2 is neither greater than nor smaller than { (1, 3, 1, 3, 1, 3...), and its equivalents }. Is that okay?

Yes. We have \(2 = [(2, 2, 2, \ldots)]\). But we can compare \(2\) with \((1, 3, 1, 3, 1, 3, \ldots)\), since \([(1, 3, 1, 3, 1, 3, \ldots)] = 1\) (this happens when the set of all even natural numbers is in your ultrafilter) or \([(1, 3, 1, 3, 1, 3, \ldots)] = 3\) (this happens when the set of all odd natural numbers is in your ultrafilter). Your partially ordered set is actually a linear ordering because whenever we have two sequences \((a_n)_n, (b_n)_n\), one of the sets

\(\{n : a_n < b_n\}, \quad \{n : a_n = b_n\}, \quad \{n : a_n > b_n\}\)

is in your ultrafilter (you can think of an ultrafilter as a thing that selects one block out of every partition of the natural numbers into finitely many pieces), and if your ultrafilter contains

\(\{n : a_n < b_n\}\), then \([(a_n)_n] < [(b_n)_n]\).

Thank you for this. It looks like a good first contact with hyperreals.

Two nitpicks:

  • Ω=(1,2,3,ldots). --> I think you forgot a "\" here and it is messing your formatting up.
  • It is not clear in the post why we use a hyperfilter, rather than just the set of all infinite sets.

Furthermore after

Conversely, if I∉U, this implies that the complement of I

the slash is used for the setminus operation. I think using \setminus there (which generates a backslash) would be a more standard notation less likely to be mistaken for quotient structures. 

I'm familiar with \setminus being used to denote set complements, so \not\in seemed more appropriate to me (\(I\) is not an element of \(U\)). I interpret \(U \setminus I\) as "the elements of \(U\) not in \(I\)," which is the empty set in this case? (also the elements of \(U\) are sets of naturals while the elements of \(I\) are naturals, so it's unclear to me how much this makes sense)

Sorry, I was only quoting parts of the sentence.

What I meant was that I would change

Conversely, if I∉U, this implies that N/I∈U, which means that a, b disagree at almost all positions, so they probably shouldn't be equal.

to

Conversely, if I∉U, this implies that N\I∈U, which means that a, b disagree at almost all positions, so they probably shouldn't be equal.

I have heard of filters and ultrafilters, but I have never heard of anyone calling any sort of filter a hyperfilter. Perhaps it is because the ultrafilters are used to make fields of hyperreal numbers, so we can blame this on the terminology. Similarly, the uniform spaces where the hyperspace is complete are called supercomplete instead of hypercomplete.

But the reason why we need to use a filter instead of a collection of sets is that we need to obtain an equivalence relation.

Suppose that \(I\) is an index set and \(X_i\) is a set with \(|X_i| \geq 2\) for each \(i \in I\). Then let \(\mathcal{F}\) be a collection of subsets of \(I\). Define a relation \(\simeq\) on \(\prod_{i \in I} X_i\) by setting \((x_i)_{i \in I} \simeq (y_i)_{i \in I}\) if and only if \(\{i \in I : x_i = y_i\} \in \mathcal{F}\). Then in order for \(\simeq\) to be an equivalence relation, \(\simeq\) must be reflexive, symmetric, and transitive. Observe that \(\simeq\) is always symmetric, and \(\simeq\) is reflexive precisely when \(I \in \mathcal{F}\).

Proposition: The relation \(\simeq\) is transitive if and only if \(\mathcal{F}\) is a filter.

Proof:

\(\rightarrow\). Suppose that \(\mathcal{F}\) is a filter. Then whenever \(x \simeq y\) and \(y \simeq z\), we have

\(\{i \in I : x_i = y_i\} \cap \{i \in I : y_i = z_i\} \subseteq \{i \in I : x_i = z_i\}\),

so since

\(\{i \in I : x_i = y_i\} \cap \{i \in I : y_i = z_i\} \in \mathcal{F}\),

we conclude that \(\{i \in I : x_i = z_i\} \in \mathcal{F}\) as well. Therefore, \(x \simeq z\).

\(\leftarrow\). Suppose now that \(\simeq\) is transitive, and let \(A, B \in \mathcal{F}\). Then let \(x\), \(y\), \(z\) be sequences defined in terms of \(\chi_A\) and \(\chi_B\), where \(\chi\) denotes the characteristic function, so that \(\{i \in I : x_i = y_i\} = A\), \(\{i \in I : y_i = z_i\} = B\), and \(\{i \in I : x_i = z_i\} = A \cap B\). Then \(x \simeq y\) and \(y \simeq z\). Therefore, by transitivity, \(x \simeq z\) as well, hence \(A \cap B \in \mathcal{F}\).

Suppose now that \(A \in \mathcal{F}\) and \(A \subseteq B \subseteq I\). Let \(x\) and \(y\) be as above and set \(z\) so that \(\{i \in I : x_i = z_i\} = B\).

Observe that \(\{i \in I : x_i = y_i\} = A \in \mathcal{F}\) and \(\{i \in I : y_i = z_i\} = A \in \mathcal{F}\). Therefore, \(x \simeq y\) and \(y \simeq z\). Thus, by transitivity, we know that \(x \simeq z\). Therefore, \(B \in \mathcal{F}\). We conclude that \(\mathcal{F}\) is closed under taking supersets. Therefore, \(\mathcal{F}\) is a filter.

Q.E.D.

I have heard of filters and ultrafilters, but I have never heard of anyone calling any sort of filter a hyperfilter.

Oops, my bad. I re-read the post as I was typing to make sure I hadn't missed any explanation. That can sometimes cause me to type what I read instead of what I intended. I probably mixed up the prefixes because they feel similar.

Thank you for the math. I am not sure everything is right with your notations in the second half, it seems to me there must be a typo either for the intersection case or the superset one. But the ideas are clear enough to let me complete the proof.

The definition of a derivative seems wrong. For example, suppose that \(f(x) = 0\) for rational \(x\) but \(f(x) = 1\) for irrational \(x\). Then \(f\) is not differentiable anywhere, but according to your definition it would have a derivative of 0 everywhere (since \(\varepsilon\) could be an infinitesimal consisting of a sequence of only rational numbers).

Have updated the definition of the derivative to specify the differences between \({}^*f\) over the hyperreals and \(f\) over the reals.

I think the natural way to extend your \(f\) to the hyperreals is for it to take values in an infinitesimal neighborhood surrounding rationals to 0 and all other values to 1. Using this, the derivative is in fact undefined, as

First, I don't think it's a good idea to have to rely on the axiom of choice in order to be able to define continuity.

Now, from my point of view, saying that continuity is defined in terms of limits is the wrong way to look at it. Continuity is a property relative to the topology of your space. If you define continuity in terms of open sets, I find that not only does the definition make sense, but it also extends in general to any topological space. But I kind of understand that not everyone will find this intuitive.

Also, I believe that your definitions that replace the limits in terms of hyperreals have to take into account all possible infinitesimals, and thus I don't understand how it's really any different from the sequential characterization of limits. But maybe I'm missing something.

Let \(X,Y\) be topological spaces. Then a function \(f:X\rightarrow Y\) is continuous if and only if whenever \((x_d)_{d\in D}\) is a net that converges to the point \(x\), the net \((f(x_d))_{d\in D}\) also converges to the point \(f(x)\). This is not very hard to prove. This means that we do not have to discuss whether continuity should be defined in terms of open sets instead of limits, because both notions apply to all topological spaces. If anything, one should define continuity in terms of closed sets instead of open sets since closed sets generalize slightly better to objects known as closure systems (which are like topological spaces, but we do not require the union of two closed sets to be closed). For example, the collection of all subgroups of a group is a closure system, but the complements of the subgroups of a group have little importance, so if we want the definition that makes sense in the most general context, closed sets behave better than open sets. And as a bonus, the definition of continuity works well when we are taking the inverse image of closed sets and when we are taking the closure of the image of a set.

With that being said, the good thing about continuity is that it has enough characterizations so that at least one of these characterizations is satisfying (and general topology texts should give all of these characterizations even in the context of closure systems so that the reader can obtain such satisfaction with the characterization of his or her choosing).

Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add post hoc coherence to calculus after it had been invented and used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.

Uhm, hyperreals really look like packaged limits, I don't expect understanding them is easier than understanding limits.