A Kernel of Truth: Insights from 'A Friendly Approach to Functional Analysis'

TurnTrout

Foreword

What is functional analysis? A satisfactory answer requires going back to where it all started.

"All are present; the meeting convenes," intoned Fredholm. Intent were the gathered faces, their thoughts fixed on their students. "What do we know of their weaknesses?"

Hilbert leaned back, torch's light flickering across his features. "Lots of dimensions, especially when they need to find the Hessian. What if… what if we made them deal with infinitely many dimensions?"...

It was Banach who finally spoke. "David, they already know about the vector space for the polynomials."

Hilbert smirked. "Who said anything about countably infinite?" More silence, then glances, then grins.

It was Riesz's voice which next broke the silence. "And we can make them do analysis in that space. And linear algebra, but not the easy parts. Of course, they'll need to also deal with complex numbers. Sprinkle a little topology and abstract algebra on top, because... they deserve –"

"Frigyes, some of them might actually be able to do that. We need more." After a pause, Fredholm continued: "We'll tell them that they only need to know basic calculus."

A Friendly Approach to Functional Analysis

I didn't actually find the book overly hard (it took me seven days to complete, which is how long it took for my first book, Naïve Set Theory), although there were some parts I skipped due to unclear exposition. It's actually one of my favorite books I've read in a while – it's for sure my favorite since the last one. That said, I'm very glad I didn't attempt this early in my book-reading journey.

My brain won't stop lying to me

Some part of me insisted that the left-shift mapping

$(x_{1}, x_{2}, \dots) \mapsto (x_{2}, x_{3}, \dots) : ℓ^{\infty} \to ℓ^{\infty}$

is "non-linear" because it incinerates $x_{1}$ ! But wait, brain, this totally is linear, and it's also continuous with respect to the ambient supremum norm!

Formally, a map $T$ is linear when $T (α x + β y) = α T (x) + β T (y)$ .

Informally, linearity is about being able to split a problem into small parts which can be solved individually. It doesn't have to "look like a line", or something. In fact, lines^[1] $y = m x$ are linear because putting in $Δ x$ more $x$ gets you $m \cdot Δ x$ more $y$ !
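To make the left-shift claim concrete, here is a minimal sketch that checks the linearity identity and the bound $\| T x \|_{\infty} \leq \| x \|_{\infty}$ on finite lists standing in for the first 50 terms of bounded sequences. The truncation, the helper names (`left_shift`, `sup_norm`, `combine`), and the particular scalars are illustrative assumptions, and a finite check is of course not a proof.

```python
import random

def left_shift(x):
    """Drop the first entry: (x1, x2, x3, ...) -> (x2, x3, ...)."""
    return x[1:]

def sup_norm(x):
    """Supremum norm of a finite list of reals."""
    return max(abs(t) for t in x)

def combine(alpha, x, beta, y):
    """Entrywise alpha*x + beta*y."""
    return [alpha * s + beta * t for s, t in zip(x, y)]

random.seed(0)
# Finite lists standing in for the first 50 terms of two bounded sequences.
x = [random.uniform(-1, 1) for _ in range(50)]
y = [random.uniform(-1, 1) for _ in range(50)]
alpha, beta = 2.5, -1.0

# Linearity: shifting a combination equals combining the shifts.
lhs = left_shift(combine(alpha, x, beta, y))
rhs = combine(alpha, left_shift(x), beta, left_shift(y))
is_linear = all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))

# Boundedness with M = 1: dropping an entry cannot raise the supremum.
is_bounded = sup_norm(left_shift(x)) <= sup_norm(x)

print(is_linear, is_bounded)
```

Throwing away $x_{1}$ loses information, but it does so in a way that still respects sums and scalar multiples, which is all linearity asks for.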

Linearity and continuity

Two things surprised me.

First, a(n infinite-dimensional) linear function can be discontinuous. (?!)

Second, a linear function $T$ is continuous if and only if it is bounded; that is, there is an $M > 0$ such that $\forall x, x_{0} : \| T (x - x_{0}) \| \leq M \| x - x_{0} \|$ .

The "if" is easy: this is just Lipschitz continuity, which obviously implies normal continuity.
The other direction follows because continuity at $0$ gives, for $ϵ := 1$ , a $δ > 0$ such that $\| x \| \leq δ$ implies $\| T x \| \leq 1$ ; rescaling an arbitrary non-zero $x$ onto that $δ$ -ball and applying linearity then yields $\| T x \| \leq \frac{1}{δ} \| x \|$ .

What the hell are functional derivatives?

Derivatives tell you how quickly a function is changing in each input dimension. In single-variable calculus, the derivative of a function $f : R \to R$ is a function $f^{'} : R \to R$ .

In multi-variable calculus, the derivative of a function $g : R^{n} \to R$ is a function $g^{'} : R^{n} \to R^{n}$ – for a given $n$ -dimensional input vector, the real-valued output of $g$ can change differently depending on in which input dimension change occurs.

You can go even further and consider the derivative of $h : R^{n} \to R^{m}$ , which is the function $h^{'} : R^{n} \to R^{n \times m}$ – for a given $n$ -dimensional input vector, $h$ again can change its vector-valued output differently depending on in which input dimension change occurs.

But what if we want to differentiate the following function, with domain $C [0, 1]$ and range $R$ :

$L (f) := \int_{0}^{1} (f (t))^{2} d t .$

How do you differentiate with respect to a function? I'm going to claim that

$L_{f}^{'} (g) = \int_{0}^{1} 2 f (t) g (t) d t .$

It's not clear why this is true, or what it even means. Here's an intuition: at any given point, there are uncountably many partial derivatives in the function space $C [0, 1]$ – there are many, many "directions" in which we could "push" a function $f$ around. $L_{f}^{'} (g)$ gives us the partial derivative at $f$ with respect to $g$ .
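One way to see where the formula comes from, reading "the derivative at $f$ with respect to $g$ " as the rate of change of $L$ along $f + λ g$ at $λ = 0$ (the definition worked out in the exchange below) and assuming real-valued functions, is to expand the square:

$L (f + λ g) = \int_{0}^{1} (f (t))^{2} d t + 2 λ \int_{0}^{1} f (t) g (t) d t + λ^{2} \int_{0}^{1} (g (t))^{2} d t .$

Subtracting $L (f)$ and dividing by $λ$ leaves

$\frac{L (f + λ g) - L (f)}{λ} = \int_{0}^{1} 2 f (t) g (t) d t + λ \int_{0}^{1} (g (t))^{2} d t ,$

and the last term vanishes as $λ \to 0$ , which gives the claimed expression.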

This concept is important because it's what you use to prove e.g. that a line is the shortest continuous path between two points.

Below is an exchange between me (in plain text) and TheMajor (quoted text), reproduced and slightly edited with permission.

I'm having trouble understanding functional derivatives. I'm used to thinking about derivatives as with respect to time, or with respect to variations along the input dimensions. But when I think about a derivative on function space, I'm not sure what the "time" is, even though I can think about the topology and the neighborhoods around a given function.

And I know the answer is that there isn't "time", but I'm not sure what there is.

An interesting concept that comes to mind is thinking about a functional derivative with respect to e.g. a straight-line homotopy, where you really could say how a function is changing at every point with respect to time. But I don't think that's the same concept.

The concept is as follows:

Let's say we have some (a priori non-linear) map $L$ , which takes a function as an input and gives a number as an output. I.e. it maps from a vector space $X$ of functions to the complex numbers $C$ . Now fix a function $f \in X$ , and a second function $g \in X$ . We can then consider the 1-dimensional linear subspace $f + C g := \{f + λ g : λ \in C\}$ . The map $L$ on this subspace is just a normal map, and if it is differentiable at the point $f$ in this subspace then its derivative is called the functional derivative of $L$ at $f$ with respect to $g$ .

By normal map, is that something like a normal operator?

sorry, I didn't mean normal in a technical context. Since the subspace I introduced is one-dimensional (as a complex vector space), and it maps to the complex numbers as well, we have good old introduction to complex analysis derivatives here. If you like you can work with reals instead of complex variables too, in which case it would be the familiar real derivative.

Wouldn't it still output a function, $g^{'}$ maybe? wait. Would the derivative wrt $λ$ just be $g$ ?

there is no derivative with respect to $λ$ .

ah ya. duh (ETA: my brain was still acting as if differentiation had to be from the real numbers to the real numbers, so it searched for a real/complex number in the problem formalization and found $λ$ .)

let me know if this part is clear, because unfortunately it's the next few steps where it gets really confusing.

Unfortunately, I don't think it's clear yet. So I see how this is a one-dimensional subspace,^[2] because it's generated by one basis function ( $g$ ).

But I don't see how this translates to a normal complex derivative, in particular, I don't quite understand what the range of this function is.

No problem, and it's very good that you share that it's unclear. The range of $L$ is the complex numbers, $L$ maps from $X$ (our vector space of functions) to $C$ (the complex numbers).

I guess I'm confused why we're using that type signature if we're taking a derivative on the whole function – but maybe that'll be clear after I get the rest.

that is exactly the heart of the confusion surrounding functional derivatives, and we'll have to get there in a few steps. we'll start with defining functional derivatives for easy maps, i.e. the ones that take on complex values, and then work towards more complicated settings.

so back to the example above; we have a vector space $X$ (our 'function space'), we have a (possibly non-linear) map $L : X \to C$ . we will now introduce the derivative of $L$ at $f$ with respect to $g$ , with $f, g \in X$ . This derivative is just a complex number.

To find this we consider the 1-dimensional subspace $f + C g$ that I introduced above, and we note that the map from $C$ to this subspace, given by $λ \mapsto f + λ g$ , is a bijection that goes through $f$ at 0. this gives us a map from $C$ to $C$ , by sending $λ$ to $L (f + λ g)$ . We take the derivative of that at $λ = 0$ , and that is the derivative of $L$ at $f$ with respect to $g$ .
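This recipe can be tried numerically. Below is a minimal sketch, assuming the functional $L (f) = \int_{0}^{1} (f (t))^{2} d t$ from earlier in the post, real rather than complex $λ$ , and a midpoint rule for the integral. The helper names (`integrate`, `directional_derivative`) and the particular `f` and `g` are illustrative choices, not anything from the conversation.

```python
import math

def integrate(h, n=2000):
    """Midpoint-rule approximation of the integral of h over [0, 1]."""
    dt = 1.0 / n
    return sum(h((k + 0.5) * dt) for k in range(n)) * dt

def L(f):
    """The functional L(f) = integral of f(t)^2 over [0, 1]."""
    return integrate(lambda t: f(t) ** 2)

def directional_derivative(L, f, g, lam=1e-6):
    """Central difference quotient of lam -> L(f + lam*g) at lam = 0."""
    plus = L(lambda t: f(t) + lam * g(t))
    minus = L(lambda t: f(t) - lam * g(t))
    return (plus - minus) / (2 * lam)

# Hypothetical choices of the point f and the direction g.
f = math.sin
g = lambda t: t ** 2

numeric = directional_derivative(L, f, g)
claimed = integrate(lambda t: 2 * f(t) * g(t))  # the formula claimed in the post
print(numeric, claimed)
```

The two printed numbers should agree to many digits. Swapping in a different `g` changes both values together, which is the sense in which the derivative at $f$ depends on the direction $g$ .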

Okay, that makes sense so far.

Nice 😃 this map has a few properties that I just want to remark and then ignore. For example it need not be linear in $f$ (which makes sense, since $f$ is only the point we're evaluating at). And by doing some work with chain rules it does have some linear properties in $g$ .

now there are two ways in which we can make this story complicated again, and most authors do both simultaneously.

Firstly we can try to extend the "derivative of $L$ at $f$ wrt $g$ " to something like "derivative of $L$ at $f$ ". We'll do this first. Secondly we can try to take a different map, say $M$ , which maps from $X$ into another vector space $Y$ (instead of the complex numbers). We can then try and define a derivative of $M$ at $f$ wrt $g$ .

The first step is conceptually simple, but formally and computationally very difficult. Given a point $f \in X$ and our map $L$ from before, we can simply say that "the derivative of $L$ at $f$ " is the map that sends $g \in X$ to "the derivative of $L$ at $f$ with respect to $g$ ". So "the derivative of $L$ at $f$ " is a map from $X$ to $C$ .

this is formally difficult because usually you want this derivative to have some nice properties, but because it was defined pointwise it's very difficult to establish this! Frequently these derivatives are not continuous, and mathematicians resort to horrible tricks (like throwing out a bunch of points of the domain $X$ on which our derivative is annoying) to recover some structure here.

So, given some arbitrary function $L : X \to C$ which is "differentiable" at $f$ , we define a function $L_{f}^{'} : g \mapsto$ (derivative of $L$ at $f$ with respect to $g$ )?

yes, exactly.

You could even maybe think of each input $g$ as projecting the derivative of $L$ at $f$ ? Or specifying one of many possible directions.

Yes, this is 100% correct. This is related to the "nice linear properties in $g$ " that I mentioned above

I also stated that this is computationally difficult. This is actually quite funny - the best way to find "The derivative of $L$ at $f$ " is to take a 'test function' $g \in X$ (arbitrarily), compute (the derivative of $L$ at $f$ with respect to $g$ ), and then tahdah, you have now found the map that sends $g$ to (the derivative of $L$ at $f$ wrt $g$ ), i.e. exactly what you were looking for.

this sounds pretty computationally easy? Or are you calculating $L^{'}$ for a general test function $g$ , in which case, how do you get any nontrivial information out of that?

Yes, you need to calculate it for a general test function.

also something that may help with gaining insight: in multivariable calculus (let's say 2 dimensions, that's already plenty difficult) there is a clear divide between the [existence of a partial derivative of a function at a point] and [the function being differentiable at that point].

ETA: Back in my Topology review, I discussed a similar phenomenon: continuity in multiple input dimensions requires not just continuity in each input variable, but in all sequences converging to the point in question:

"Continuity in the variables says that paths along the axes converge in the right way. But for continuity overall, we need all paths to converge in the right way. Directional continuity when the domain is $R$ is a special case of this: continuity from below and from above if and only if continuity for all sequences converging topologically to $x$ ."

Similarly, for a function to be differentiable, the existence of all of its partial derivatives isn't enough – you need derivatives for every possible approach to the point in question. Here, the existence of all of the partials automatically guarantees the derivatives for every possible approach, because there's a partial for every function.

here we have the same, except we have (in an infinite-dimensional function space $X$ ) infinitely many 'partial derivatives'. so from that point of view it's not that surprising that a function "having a derivative at $f$ " is actually quite rare/complicated.

yeah, because $L^{'}$ has to exist for… all $g$ ? That seems a little tough.

It exists for all $g$ , and then $L_{f}^{'}$ exists as a formal map. But usually you want something stronger, for example that $L_{f}^{'} : X \to C$ is continuous.

as an important but relatively trivial aside: if $L$ is a linear map, then $L_{f}^{'}$ does not actually depend on $f$ . So usually it is just called "the derivative of $L$ " instead of "the derivative of $L$ at $f$ ". This is confusing, because for non-linear $L$ there is also something called "the derivative of $L$ ", namely "the map that sends $f$ to [the derivative of $L$ at $f$ ]".

hm. That's because of the definition of linearity, right? it's a homomorphism for both the operations of addition and scalar multiplication... Wait, I intuitively understand why linearity means it's the same everywhere, but I'm having trouble coming up with the formal justification…

Yes, the point is that when we look at the definition of "derivative of $L$ at $f$ wrt $g$ " that is given by $\lim_{λ \to 0} \frac{L (f + λ g) - L (f)}{λ}$ ...

ah, got it!
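For the record, the step being gestured at can be written out. If $L$ is linear, then for every non-zero $λ$

$\frac{L (f + λ g) - L (f)}{λ} = \frac{L (f) + λ L (g) - L (f)}{λ} = L (g) ,$

so the limit exists, equals $L (g)$ , and no $f$ appears in it: $L_{f}^{'} = L$ at every point $f$ .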

ok, so this was all the first way to make it confusing again. Ready for the second?

I'm ready to be reconfused.

Ok, so now let's pick a range not inside the complex numbers $C$ , but inside a second normed vector space $Y$ . So we have a map $M : X \to Y$ , not necessarily linear. Again fix points $f, g \in X$ . We are going to define the derivative of $M$ at $f$ wrt $g$ .

so we repeat our trick from before, consider the map from $C$ via $X$ to $Y$ given by $λ \mapsto M (f + λ g)$ . We wish to differentiate it at $λ = 0$ .

unfortunately, its image is now in $Y$ , not in $C$ , so we don't really know what the derivative means. But because $Y$ is a normed vector space, the expression $\frac{M (f + λ g) - M (f)}{λ}$ makes sense for all non-zero $λ$ .

if this function can be continuously extended to $λ = 0$ then we define its image at 0 as the derivative of $M$ at $f$ wrt $g$ . Note that this notion of continuity has to do with the norm of $Y$ .

this is now a vector in $Y$ , so if this works we have: [the derivative of $M$ at $f$ wrt $g$ ] which is an element of $Y$ , [the derivative of $M$ at $f$ ] which is a (linear! usually horrible and not continuous!) map from $X$ to $Y$ .

btw if the "continuously extending" part is new, you can also just think of it as the limit of that fraction as $λ$ approaches 0. The only point is that (as long as we're working with complex vector spaces) there are a lot of different ways for $λ$ to approach 0, and it has to work for all of them.

if we're working over the reals it's simply the notion of "right limit" and "left limit" (the only two ways to approach 0 in $R$ ) that you may have seen before, except that the convergence is now happening in $Y$ .
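Here is a small numerical sketch of this vector-valued case, under assumptions that are not in the conversation: $X = Y = C [0, 1]$ with the supremum norm (approximated by sampling on a grid), real $λ$ , and the hypothetical map $M (f) = f^{2}$ taken pointwise. The candidate derivative $2 f g$ is a guess to be tested, and the names `M`, `quotient`, and `candidate` are illustrative.

```python
import math

GRID = [k / 200 for k in range(201)]  # sample points in [0, 1]

def sup_norm(h):
    """Approximate supremum norm on [0, 1] by sampling on GRID."""
    return max(abs(h(t)) for t in GRID)

def M(f):
    """Hypothetical non-linear map: send f to the function t -> f(t)^2."""
    return lambda t: f(t) ** 2

def quotient(f, g, lam):
    """The difference quotient (M(f + lam*g) - M(f)) / lam, itself a function."""
    shifted = M(lambda t: f(t) + lam * g(t))
    base = M(f)
    return lambda t: (shifted(t) - base(t)) / lam

f = math.cos
g = lambda t: 1.0 + t
candidate = lambda t: 2 * f(t) * g(t)  # conjectured derivative of M at f wrt g

# Distance in Y between the quotient and the candidate, approaching 0
# from the right and from the left.
errors = {}
for lam in (0.1, -0.1, 0.01, -0.01, 0.001, -0.001):
    q = quotient(f, g, lam)
    errors[lam] = sup_norm(lambda t: q(t) - candidate(t))
    print(lam, errors[lam])
```

The printed distances shrink in proportion to $| λ |$ from both sides, which is what "the quotient converges in the norm of $Y$ " looks like in practice. The limit is a function, not a number.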

Other notes

The operator norm is really cool.
Linear combinations always involve finitely many terms, but using the orthonormal basis of an infinite-dimensional space, you can take the limit as $n \to \infty$ .
I was really happy to see watered-down versions of symmetry/conservation law correspondences (aka Noether's theorem). Can't wait to learn the real version.

Final thoughts

The book is pretty nice overall, with some glaring road bumps – apparently, the Euler-Lagrange equation is one of the most important equations of all time, and Sasane barely spends any effort explaining it to the reader!

And if I didn't have the help of TheMajor, I wouldn't have understood the functional derivative, which, in my opinion, was the profoundly important insight I got from this book. My models of function space structure feel qualitatively improved. I can look at a Fourier transform and see what it's doing – I can feel it, to an extent. Without a doubt, that single insight makes it all worth it.

Forward

I'm probably going to finish up an epidemiology textbook, before moving on to complex analysis, microeconomics, or... something else – who knows! If you're interested in taking advantage of quarantine to do some reading, feel free to reach out and maybe we can work through something together. 🙂

[1] Lines $y = m x + b$ ( $b \neq 0$ ) aren't actually linear functions, because they don't go through the origin. Instead, they're affine.
[2] To be more specific, $f + C g := \{f + λ g : λ \in C\}$ is often an affine subspace, because the zero function is not necessarily a member.

TheMajor

Very nice! Two mistakes though:

Technically the introductory part on derivatives on $R^{n}$ is incorrect, in two different ways.

Firstly the derivative of a map $f : R^{n} \to R$ is a map $f^{'} : R^{n} \to L (R^{n}, R)$ , that assigns to every point x a linear map sending direction y to a real value (namely the partial derivative of f at x in direction y). Thankfully the space of linear maps from $R^{n}$ to $R$ is isometrically isomorphic to $R^{n}$ through the inner product, recovering the expression you gave. Similarly the derivative of a map $f : R^{n} \to R^{m}$ is a map $f^{'} : R^{n} \to L (R^{n}, R^{m})$ .
Secondly technically the domain of any derivative like the one above is not the vector space we are working with, but the set of directions at point x. This notion is formalised in Manifold theory and called the tangent space. Thankfully for any finite-dimensional vector space the tangent space at any point is canonically isomorphic to the vector space itself (any vector is a direction, that's what they were invented for). In infinite dimensions this still holds just fine except for the small detail that the notions of manifold and tangent space don't exist there. The same distinction is necessary in the range. So truly, formally, the derivative of a map $f : R^{n} \to R$ is a map $f^{'} : T R^{n} \to T R$ , with $T R^{n} = \bigcup_{x \in R^{n}} T_{x} R^{n} \equiv R^{n} \times R^{n}$ and similarly $T R = \bigcup_{y \in R} T_{y} R \equiv R \times R$ , with the condition that $f^{'}$ is simply $f$ on the first coordinate. This coincides with the map above: for every $x \in R^{n}$ we get a linear map $f^{'} (x) : T_{x} R^{n} \to T_{f (x)} R .$
The above may seem very confusing for $n = 1$ , since I claim that the derivative in that case is a map $f^{'} : R \to L (R, R)$ instead of simply a real-valued function. This is resolved by noting that each linear map from $R$ to $R$ can be represented with a number, similar to the top bullet point above (the inner product on $R$ is just multiplication). I think lecturers are quite justified in not exploring the details of this when first introducing derivatives or partial derivatives, but unfortunately in possibly infinite-dimensional abstract vector spaces the distinctions are necessary, if only to avoid type errors.

In the definition of the partial derivative of M at f with respect to g (so with a range inside a vector space Y) we do not take the norm or absolute value of that expression, it should be the straight up limit $\lim_{λ \to 0} \frac{M (f + λ g) - M (f)}{λ}$ . The claim that the limit exists does depend on the topology of Y and therefore on the norm, though.

Also there are a lot of discontinuous linear maps out there. A textbook example is considering the vector space $P [0, 1]$ of polynomials interpreted as functions on the closed interval $[0, 1]$ , equipped with supremum norm. The derivative map $\frac{d}{d x} : P [0, 1] \to P [0, 1]$ is not continuous, and you can verify this directly by searching for a sequence of functions that converges to 0 whose image does not converge to 0.
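One sequence that does the job is $p_{n} (x) = x^{n} / n$ : its supremum norm on $[0, 1]$ is $1 / n$ , while its derivative $x^{n - 1}$ has supremum norm $1$ for every $n$ . A minimal sketch that checks this numerically, with polynomials stored as coefficient lists and the supremum approximated by sampling (both representation choices are assumptions of this sketch):

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with coefficients [c0, c1, c2, ...] at x (Horner)."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def poly_derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def sup_norm(coeffs, samples=1000):
    """Approximate supremum norm on [0, 1] by sampling, endpoints included."""
    return max(abs(poly_eval(coeffs, k / samples)) for k in range(samples + 1))

norms = {}
for n in (1, 10, 100):
    p = [0.0] * n + [1.0 / n]  # p_n(x) = x^n / n
    norms[n] = (sup_norm(p), sup_norm(poly_derivative(p)))
    print(n, norms[n])
```

The first entry of each pair shrinks like $1 / n$ while the second stays at $1$ , so the inputs converge to 0 and their images under $\frac{d}{d x}$ do not.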

johnswentworth

Probably too late at this point for you, but in case other people come along... I'd recommend learning functional analysis first in the context of a theoretical mechanics course/textbook, rather than a math course/textbook. The physicists tend to do a better job explaining the intuitions (and give far more exposure to applications), which I find is the most important thing for a first exposure. Full rigorous detail is something you can pick up later, if and when you need it.

TheMajor

Personally I did the exact opposite, and found that very refreshing. Whenever I ran into a snippet of applied functional analysis without knowing the formal background it just confused me.
