A Kernel of Truth: Insights from 'A Friendly Approach to Functional Analysis'

4th Apr 2020

Comments (from TheMajor and johnswentworth):

**TheMajor:** Very nice! Two mistakes though:

- Technically the introductory part on derivatives on R^n is incorrect, in two different ways.
  - Firstly, the derivative of a map f: R^n → R is a map f′: R^n → L(R^n, R), that assigns to every point *x* a linear map sending direction *y* to a real value (namely the partial derivative of f at x in direction y). Thankfully the space of linear maps from R^n to R is isometrically isomorphic to R^n through the inner product, recovering the expression you gave. Similarly, the derivative of a map f: R^n → R^m is a map f′: R^n → L(R^n, R^m).
  - Secondly, *technically* the domain of any derivative like the one above is not the vector space we are working with, but the *set of directions at point x*. This notion is formalised in manifold theory and called the *tangent space*. Thankfully, for any finite-dimensional vector space the tangent space at any point is canonically isomorphic to the vector space itself (any vector *is* a direction, that's what they were invented for). In infinite dimensions this still holds just fine, except for the small detail that the notions of manifold and tangent space don't exist there. The same distinction is necessary in the range. So truly, formally, the derivative of a map f: X → Y is a map f′: TX → TY, with TX = X × X and similarly TY = Y × Y, with the condition that f′ is simply f on the first coordinate. This coincides with the map above: for every x ∈ X we get a linear map f′(x): X → Y.
  - The above may seem very confusing for f: R → R, since I claim that the derivative in that case is a map f′: R → L(R, R) instead of simply a real-valued function. This is resolved by noting that each linear map from R to R can be represented with a number, similar to the top bullet point above (the inner product on R is just multiplication). I think lecturers are quite justified in not exploring the details of this when first introducing derivatives or partial derivatives, but unfortunately in possibly infinite-dimensional abstract vector spaces the distinctions are necessary, if only to avoid type errors.
- In the definition of *the partial derivative of M at f with respect to g* (so with a range inside a vector space *Y*) we do not take the norm or absolute value of that expression; it should be the straight-up limit lim_{λ→0} (M(f + λg) − M(f))/λ. The claim that the limit exists does depend on the topology of *Y* and therefore on the norm, though.

Also, there are a lot of discontinuous linear maps out there. A textbook example: consider the vector space of polynomials, interpreted as functions on the closed interval [0, 1] and equipped with the supremum norm. The derivative map p ↦ p′ is not continuous, and you can verify this directly by searching for a sequence of functions that converges to 0 whose image does not converge to 0.
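A quick numerical illustration of such a sequence (the sequence choice and helper names here are illustrative, assuming the interval [0, 1]): f_n(t) = t^n/n has sup norm 1/n → 0, while its derivative t^(n−1) has sup norm 1 for every n.

```python
import numpy as np

# Sketch: on polynomials over [0, 1] with the supremum norm, the sequence
# f_n(t) = t^n / n converges to 0, but its image under d/dt does not.

def sup_norm(coeffs, ts=np.linspace(0.0, 1.0, 10_001)):
    """Approximate the supremum norm of a polynomial on [0, 1].

    coeffs are in increasing order of degree: p(t) = sum(c_k * t**k).
    """
    values = np.polynomial.polynomial.polyval(ts, coeffs)
    return np.max(np.abs(values))

def derivative(coeffs):
    """Coefficients of p'(t), again in increasing order of degree."""
    return np.array([k * c for k, c in enumerate(coeffs)][1:])

for n in [5, 50, 500]:
    f_n = np.zeros(n + 1)
    f_n[n] = 1.0 / n   # f_n(t) = t^n / n
    print(n, sup_norm(f_n), sup_norm(derivative(f_n)))
    # ||f_n|| = 1/n -> 0, but ||f_n'|| = ||t^(n-1)|| = 1 for every n
```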

**johnswentworth:** Probably too late at this point for you, but in case other people come along... I'd recommend learning functional analysis first in the context of a theoretical mechanics course/textbook, rather than a math course/textbook. The physicists tend to do a better job explaining the intuitions (and give *far* more exposure to applications), which I find is the most important thing for a first exposure. Full rigorous detail is something you can pick up later, if and when you need it.

**TheMajor:** Personally I did the exact opposite, and found that very refreshing. Whenever I ran into a snippet of applied functional analysis without knowing the formal background, it just confused me.

## Foreword

What is functional analysis? A satisfactory answer requires going back to where it all started.

## A Friendly Approach to Functional Analysis

I didn't actually find the book overly hard (it took me seven days to complete, which is how long my first book, Naïve Set Theory, took), although there were some parts I skipped due to unclear exposition. It's actually one of my favorite books I've read in a while – it's for sure my favorite since the last one. That said, I'm very glad I didn't attempt this early in my book-reading journey.

## My brain won't stop lying to me

Some part of me insisted that the left-shift mapping

(x1, x2, …) ↦ (x2, x3, …): ℓ∞ → ℓ∞

is "non-linear" because it incinerates x1! But wait, brain, this totally *is* linear, and it's also continuous with respect to the ambient supremum norm!

Formally, a map T is linear when T(αx + βy) = αT(x) + βT(y).

Informally, linearity is about being able to split a problem into small parts which can be solved individually. It doesn't have to "look like a line", or something. In fact, lines^{[1]} y = mx are linear *because* putting in Δx more x gets you m⋅Δx more y!

## Linearity and continuity

Two things surprised me.

First, a(n infinite-dimensional) linear function can be discontinuous. (?!)

Second, a linear function T is continuous if and only if it is bounded; that is, there is an M>0 such that ∀x,x0:||T(x−x0)||≤M||x−x0||.
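The left-shift map from earlier is a concrete case of both properties. Here is a numerical sketch on finite truncations of ℓ∞ sequences (the helper names and test vectors are my own choices): the shift is linear, and it is bounded with M = 1 in the supremum norm, hence continuous.

```python
import numpy as np

def left_shift(x):
    """Left shift on a finite truncation: (x1, x2, ...) -> (x2, x3, ...)."""
    return x[1:]

def sup_norm(x):
    return np.max(np.abs(x))

rng = np.random.default_rng(0)
x, y = rng.normal(size=20), rng.normal(size=20)
a, b = 2.5, -1.3

# Linearity: T(ax + by) == a T(x) + b T(y)
assert np.allclose(left_shift(a * x + b * y),
                   a * left_shift(x) + b * left_shift(y))

# Boundedness with M = 1: dropping a coordinate can only shrink the sup norm
assert sup_norm(left_shift(x)) <= sup_norm(x)
print("left shift is linear and bounded on these samples")
```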

The "if" direction is easy: boundedness is just Lipschitz continuity, which obviously implies normal continuity.

## What the hell are functional derivatives?

Derivatives tell you how quickly a function is changing in each input dimension. In single-variable calculus, the derivative of a function f:R→R is a function f′:R→R.

In multi-variable calculus, the derivative of a function g:Rn→R is a function g′:Rn→Rn – for a given n-dimensional input vector, the real-valued output of g can change differently depending on in which input dimension change occurs.

You can go even further and consider the derivative of h:Rn→Rm, which is the function h′:Rn→Rm×n (an m×n Jacobian matrix at each point) – for a given n-dimensional input vector, h again can change its vector-valued output differently depending on in which input dimension change occurs.
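A minimal numerical sketch of the Rn→R case (the function choice and helper names are mine): for g(x) = x·x, the n partial derivatives assemble into the gradient 2x, giving a map g′: Rn→Rn as described above.

```python
import numpy as np

def g(x):
    # g(x) = x_1^2 + ... + x_n^2
    return float(np.dot(x, x))

def numeric_gradient(func, x, h=1e-6):
    """Central-difference partial derivative in each input dimension."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

x0 = np.array([1.0, -2.0, 0.5])
print(numeric_gradient(g, x0))   # close to 2 * x0 = [2, -4, 1]
```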

But what if we want to differentiate the following function, with domain C[0,1] and range R:

L(f) := ∫_0^1 (f(t))^2 dt.

How do you differentiate with respect to a function? I'm going to claim that

L′_f(g) = ∫_0^1 2 f(t) g(t) dt.

It's not clear why this is true, or what it even means. Here's an intuition: at any given point, there are uncountably many partial derivatives in the function space C[a,b] – there are many, many "directions" in which we could "push" a function f around. L′f(g) gives us the partial derivative at f with respect to g.
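The claim can be sanity-checked numerically (the grid, test functions, and helper names below are my own choices): the difference quotient (L(f + λg) − L(f))/λ should approach ∫_0^1 2 f(t) g(t) dt as λ → 0.

```python
import numpy as np

ts = np.linspace(0.0, 1.0, 10_001)
dt = ts[1] - ts[0]

def integrate(vals):
    # trapezoid rule on the uniform grid ts
    return dt * (np.sum(vals) - 0.5 * (vals[0] + vals[-1]))

def L(f_vals):
    # L(f) = integral over [0, 1] of f(t)^2
    return integrate(f_vals**2)

f = np.sin(3 * ts)   # some f in C[0, 1]
g = np.exp(ts)       # a "direction" g to push f in

exact = integrate(2 * f * g)
for lam in [1e-1, 1e-3, 1e-5]:
    quotient = (L(f + lam * g) - L(f)) / lam
    print(lam, abs(quotient - exact))   # error shrinks like lam * ∫ g^2
```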

This concept is important because it's what you use to prove e.g. that a line is the shortest continuous path between two points.

Below is an exchange between me (in plain text) and TheMajor (quoted text), reproduced and slightly edited with permission.

I'm having trouble understanding functional derivatives. I'm used to thinking about derivatives as with respect to time, or with respect to variations along the input dimensions. But when I think about a derivative on function space, I'm not sure what the "time" is, even though I can think about the topology and the neighborhoods around a given function.

And I know the answer is that there isn't "time", but I'm not sure what there *is*.

An interesting concept that comes to mind is thinking about a functional derivative with respect to e.g. a straight-line homotopy, where you really *could* say how a function is changing at every point with respect to time. But I don't think that's the same concept.

By normal map, is that something like a normal operator?

Wouldn't it still output a function, g′ maybe? wait. Would the derivative wrt λ just be g?

ah ya. duh (ETA: my brain was still acting as if differentiation had to be from the real numbers to the real numbers, so it searched for a real/complex number in the problem formalization and found λ.)

Unfortunately, I don't think it's clear yet. So I see how this is a one-dimensional subspace,^{[2]} because it's generated by one basis function (g). But I don't see how this translates to a normal complex derivative; in particular, I don't quite understand what the range of this function is.

I guess I'm confused why we're using that type signature if we're taking a derivative on the whole function – but maybe that'll be clear after I get the rest.

Okay, that makes sense so far.

So, given some arbitrary function L:X→C which is "differentiable" at f, we define a function L′f:g↦ (derivative of L at f with respect to g)?

You could even maybe think of each input g as projecting the derivative of L at f? Or specifying one of many possible directions.

this sounds pretty computationally easy? Or are you calculating L′ for a general test function g, in which case, how do you get any nontrivial information out of that?

ETA: Back in my *Topology* review, I discussed a similar phenomenon: continuity in multiple input dimensions requires not just continuity in each input variable, but in *all* sequences converging to the point in question:

"Continuity in the variables says that paths along the axes converge in the right way. But for continuity overall, we need all paths to converge in the right way. Directional continuity when the domain is R is a special case of this: continuity from below and from above if and only if continuity for all sequences converging topologically to x."

Similarly, for a function to be differentiable, the existence of all of its partial derivatives isn't enough – you need derivatives for every possible approach to the point in question. Here, the existence of all of the partials automatically guarantees the derivatives for every possible approach, because there's a partial for every function.
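For contrast, the finite-dimensional failure mode can be checked numerically. A standard textbook counterexample (not from the exchange above) is f(x, y) = xy/(x² + y²) with f(0, 0) = 0: both partials at the origin exist, yet the function isn't even continuous there, let alone differentiable.

```python
def f(x, y):
    # f(x, y) = xy / (x^2 + y^2), extended by f(0, 0) = 0
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x**2 + y**2)

h = 1e-8
# Both partials at the origin exist and equal 0 (f vanishes on the axes):
print((f(h, 0) - f(0, 0)) / h)
print((f(0, h) - f(0, 0)) / h)

# But approaching along the diagonal gives 1/2, not 0:
print(f(h, h))
```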

yeah, because L′ has to exist for… all g? That seems a little tough.

hm. That's because of the definition of linearity, right? it's a homomorphism for both the operations of addition and scalar multiplication... Wait, I intuitively understand why linearity means it's the same everywhere, but I'm having trouble coming up with the formal justification…

ah, got it!

I'm ready to be reconfused.

## Other notes

## Final thoughts

The book is pretty nice overall, with some glaring road bumps – apparently, the Euler-Lagrange equation is one of the most important equations of all time, and Sasane barely spends any effort explaining it to the reader!

And if I didn't have the help of TheMajor, I wouldn't have understood the functional derivative, which, in my opinion, was *the* profoundly important insight I got from this book. My models of function space structure feel qualitatively improved. I can look at a Fourier transform and see what it's doing – I can *feel* it, to an extent. Without a doubt, that single insight makes it all worth it.

## Forward

I'm probably going to finish up an epidemiology textbook, before moving on to complex analysis, microeconomics, or... something else – who knows! If you're interested in taking advantage of quarantine to do some reading, feel free to reach out and maybe we can work through something together. 🙂

Lines y=mx+b (b≠0) aren't actually linear functions, because they don't go through the origin. Instead, they're affine. ↩︎

To be more specific, f+Cg:={f+λg:λ∈C} is often an affine subspace, because the zero function is not necessarily a member. ↩︎