Linear Algebra Done Right, Axler


> Then the eigenvectors of f consist precisely of the entries on the diagonal of that upper-triangular matrix

I think this is a typo and should be "eigenvalues" instead of "eigenvectors"?

> The determinant is negative when the operator flips all the vectors it works on.

This could be misleading. E.g. the operator f(v) := -v that literally just flips all vectors has determinant (-1)^n, where n is the dimension of the space it's working on. The sign of the determinant tells you whether an operator flips the orientation of *volumes*; it can't tell you anything about what it does to individual vectors.

(Regarding "orientation of volumes": in the 2D case, think of R^2 as a sheet of paper, then f(v) := -v is just a 180 degree rotation, so the same side stays up, and the determinant is positive. In contrast, flipping along an axis requires turning over the paper, so negative determinant. Unfortunately this can't really be visualized the same way in 3D, so then you have to think about ordered bases.)

> Let S now specifically be a one-dimensional subspace of V such that, for all →v∈V,

I think such an S cannot exist in most cases, and it should instead read '... for some ...'

The expression for S is describing the span of the vector →v, so certainly, if V is more than one-dimensional and some subspace S has this property for *all* →v, then it has this property for *linearly independent* vectors in V, which is a contradiction.

The definition of matrix ("the basis maps to:") ought to come after the "uniquely determines the linear map" that justifies it.

For interpreting v as a slim matrix, I would use bra-ket notation: |v> for the function of type V <- R, <v| for the function whose type is the dual R <- V. Then <v|v> has type R <- R (and corresponds to multiplication by a scalar) and |v><v| has type V <- V.

An inner product just maps |v> to <v|. (Though I don't quite see what the symmetry is for.)

Mapping a point cloud through a linear map thins it by a factor of the determinant; this generalizes to smooth maps, since they are locally linear.

Epistemic status: A brisk walkthrough of (what I take to be) the highlights of this book's contents. The big one for mathematically understanding ML!

The idea responsible for getting me excited about linear algebra is:

> Linear algebra is about the tripartite relationship between (1) homomorphisms^{[1]} between vector spaces, (2) sets of equations, and (3) grids of numbers.

However, grids of numbers ('matrices'), the usual star of the show in a presentation of linear algebra, aren't foregrounded in this book. Instead, this is a book chiefly treating the homomorphisms ('linear maps') themselves, directly.

## Contents and Notes

## 1. Vector Spaces

Vector spaces are fairly substantial mathematical structures, if you're pivoting out of thinking about set theory! Intuitively, a vector space is a space like Rn for which (1) ray addition and (2) scaling rays (emanating from the origin out to points)^{[2]} are both nicely defined.

Precisely, a **vector space** is a set V defined over a field F^{[3]} in which

- V is **closed under vector addition**, and vector addition is **commutative** and **associative**, there is an **additive identity** →0, and there is an **additive inverse** for every vector →v∈V;
- V is **closed under scalar multiplication**, scalar multiplication is **associative**, and there is a **multiplicative identity** 1;
- and vector addition and scalar multiplication are **connected by distribution** such that, for all a,b∈F and →v,→x∈V,^{[4]}

a(→v+→x) = a→v + a→x
(a+b)→v = a→v + b→v

A **subspace** S of a vector space V is any subset S⊂V that is still itself a vector space, under the same two operations of V.

Vector spaces can be decomposed into their subspaces, where you think of adding vectors drawn from the different subspaces via their common addition operation.

## 2. Finite-Dimensional Vector Spaces

You live at the origin of R3, and your tools are the vectors that emanate out from your home. Because we have both vector addition and scalar multiplication, we have two ways of extending (or shortening) any single vector out from the origin arbitrarily far. If we're interested in reaching points in R3, one immediate way to get to points we didn't have a vector directly to... is by extending a too-short vector pointed in the right direction! Furthermore, because we can always multiply a vector by −1 to reverse its direction, both the exactly *right* and exactly *wrong* directions will suffice to reach out and touch a point in R3.

We can also use vector addition to add two vectors pointing off in differing directions (directions which aren't exact opposites). If we have vectors →v=[0.5,0,0]T, →x=[0,45,0]T, and →q=[0,0,0.11]T,^{[5]} we have all the tools we need to produce any vector in R3! The awkward lengths of all the vectors are irrelevant, because we can scale all of them arbitrarily. We use some amount of vertical, horizontal, and z-dimensional^{[6]} displacement to get to anywhere via addition and multiplication! More formally, we say that the set {→v,→x,→q} **spans** R3.

Intuitively, a minimal spanning set is called a **basis** for a vector space. {→v,→x,→q} is a basis for the vector space R3, because none of the vectors is "redundant": you could not produce every vector in R3 without all three elements of {→v,→x,→q}. If you added any further vector to that spanning set, though, the set would now have a redundant vector, as R3 is *already* spanned. The set would no longer be a minimal spanning set in this sense, and so would cease to be a basis for R3.

Every finite-dimensional, nonzero^{[7]} vector space containing infinitely many vectors has infinitely many bases (pp. 29-32). Each basis for an n-dimensional vector space is a set containing n vectors, where each vector is an ordered set containing n numbers drawn from F (p. 32).

## 3. Linear Maps

Intuitively, a linear map is a function that translates addition and multiplication between two vector spaces.

Formally, a **linear map** f:V→W is a function from a vector space V to a vector space W (taking vectors and returning vectors) such that

f(→v+→x) = f(→v)+f(→x)
f(a→v) = a(f(→v))

for all →v,→x∈V; all f(→v),f(→x)∈W; and all a∈F. Note that both are homomorphism properties: one for addition across vector spaces and one for multiplication across vector spaces! We'll call the former relationship **additivity**, the latter, **homogeneity**.

The symbol L(V,W) stands for the set of all the linear maps from V to W.

^{[8]} Some example linear maps (pp. 38-9) include:

f1(→v) = 0→v
f2(→v) = →v

When the vector spaces are specifically the set of all real-valued polynomials p(x),^{[9]} translating between →p and p(x):

f3(→p) = dp(x)/dx
f4(→p) = ∫p(x)dx
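To make the two axioms concrete, here is a small Python sketch (the matrix M and the sample vectors are my own arbitrary choices, not from the book) checking additivity and homogeneity for the map f(→v)=M→v:

```python
# Checking the two linear-map axioms for a sample f: R^2 -> R^2
# given by an (arbitrary, illustrative) 2x2 matrix M.
M = [[2.0, 1.0],
     [0.0, 3.0]]

def f(v):
    # f(v) = M v
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

v, x, a = [1.0, 2.0], [-3.0, 0.5], 4.0

# Additivity: f(v + x) == f(v) + f(x)
additive = f([v[0] + x[0], v[1] + x[1]])
summed = [f(v)[0] + f(x)[0], f(v)[1] + f(x)[1]]
print(additive == summed)  # True

# Homogeneity: f(a v) == a f(v)
homog = f([a * v[0], a * v[1]])
scaled = [a * f(v)[0], a * f(v)[1]]
print(homog == scaled)     # True
```

Any matrix map passes this check; by contrast, a translation like →v ↦ →v+[1,0]T fails additivity, which is exactly why translations aren't linear maps.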

As linear maps are functions, they can be composed when they have matching domains and co-domains, giving us our notion of **products** between linear maps.

The **kernel** of a linear map f∈L(V,W) is the subset ker(f)⊂V containing all and only the vectors →v∈V that f maps to →0∈W. Note that linear maps can only "get rid" of vectors by shrinking them down all the way, i.e., by sending them to →0. If a function between vector spaces simply sent everything to a nonzero vector, it would violate the linear map axioms! All kernels are subspaces of V (p. 42). A linear map is injective whenever ker(f)={→0} (p. 43).

The **image** im(f) of f is the subset of W covered by some f(→v). All images are subspaces of W (p. 44). A linear map is obviously surjective whenever im(f)=W.

## The Matrix of a Linear Map

A **matrix** M is an array of numbers, with m rows and n columns:

M = ⎡ a1,1 ⋯ a1,n ⎤
    ⎢  ⋮   ⋱   ⋮  ⎥
    ⎣ am,1 ⋯ am,n ⎦

(Matrices are a **generalization of vectors** into the horizontal dimension, and vectors can be thought of as skinny m-by-1 matrices.)

The vector f(→v) equals M(f)→v, with matrix multiplication on the right side of the equation (pp. 53-4).
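As a sketch of that equation, here is matrix-vector multiplication by the row-by-column rule (the particular 2×3 matrix is an arbitrary example of mine):

```python
# f(v) = M v: entry i of the result is the dot product of row i of M with v.
def matvec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

M = [[1, 2, 0],
     [0, 1, 3]]   # a 2x3 matrix: a linear map from R^3 to R^2
v = [1, 1, 1]
print(matvec(M, v))           # [3, 4]

# A vector lies in the kernel exactly when it is sent to the zero vector:
print(matvec(M, [6, -3, 1]))  # [0, 0]
```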

## 4. Polynomials

## 5. Eigenvalues and Eigenvectors

We now begin our study of operator theory!

**Operators** are linear maps from V to itself. Notationally, L(V):=L(V,V).

We call a subspace S⊂V **invariant** under f∈L(V) if, for all →s∈S,

f(→s)∈S

Let S now specifically be a one-dimensional subspace of V such that, fixing any nonzero →v∈V,

S={a→v:a∈F}

If S is invariant under f, then f(→v) must itself lie in S; that is, f(→v)=λ→v for some scalar λ. In the equation f(→v)=λ→v, the scalar λ is called an **eigenvalue** of f, and the corresponding vector →v is called an **eigenvector** of f.

## Polynomials Applied to Operators

An operator raised to a power m is just that operator composed with itself m times.
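A sketch of operator powers with matrices (the 2×2 matrix is an arbitrary choice of mine): squaring the matrix represents composing the operator with itself, and we can then evaluate a whole polynomial at the operator. Here p(x)=x2−5x+6=(x−2)(x−3) happens to annihilate this operator, since 2 and 3 are its eigenvalues.

```python
# Matrix multiplication: composing the operators the matrices represent.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

M = [[2, 1],
     [0, 3]]
I = [[1, 0], [0, 1]]

M2 = matmul(M, M)  # the operator composed with itself: f^2
print(M2)          # [[4, 5], [0, 9]]

# p(f) = f^2 - 5f + 6I, computed entrywise on the matrices.
p_of_M = [[M2[i][j] - 5 * M[i][j] + 6 * I[i][j] for j in range(2)]
          for i in range(2)]
print(p_of_M)      # [[0, 0], [0, 0]]
```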

Because we have a notion of functional products, functional sums, and now operators raised to powers, we can now construct arbitrary polynomials with **operators as the variables**!

## Upper-Triangular Matrices

A **square matrix** is an m×m matrix.

An **upper-triangular matrix** is a square matrix for which all entries under the principal diagonal equal 0.

## Diagonal Matrices

A **diagonal matrix** is a square matrix for which all entries off the principal diagonal equal 0.

## 6. Inner-Product Spaces

For →x,→y∈Rn, the dot product is

→x⋅→y = x1y1 + x2y2 + ⋯ + xnyn

where xn is the nth entry in →x, and similarly for yn and →y (p. 98; notation converted).

**Inner products** are just a generalization of dot products to arbitrary vector spaces V. (With some finagling, both dot products and inner products generally can be interpreted as linear maps.) An **inner-product space** is an ordered set containing a vector space V and an inner product on it.

Intuitively, the norm of a vector is the length of that vector, interpreted as a ray, from the origin to its tip. More formally, the **norm** of a vector →v in an inner-product space is defined to be *the square root of the inner product of that vector →v with itself*:

∥→v∥ := √(→v⋅→v)

Note that this looks *just like* c=√(a2+b2), the Pythagorean theorem for the sides a,b,c of a right triangle in Euclidean space. That's because other inner products on other vector spaces are *meant* to allow for a generalization of the Pythagorean theorem in those vector spaces!

Intuitively, two vectors are orthogonal when they're perpendicular. Formally, two vectors are called **orthogonal** when their inner product is 0. With the opposite and adjacent sides →a,→b of the right unit triangle in the vector space R2,

→a⋅→b = a1b1+a2b2 = (0)(1)+(1)(0) = 0

"It's all just right triangles, dude."
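The right-triangle picture can be checked numerically; a minimal sketch using the dot product on R2 as the inner product:

```python
import math

# The dot product on R^2, serving as our inner product.
def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

a, b = [0.0, 1.0], [1.0, 0.0]  # opposite and adjacent sides of the unit triangle
print(dot(a, b))               # 0.0 -- orthogonal
print(math.sqrt(dot(a, a)))    # 1.0 -- the norm of a

# The Pythagorean theorem, recovered: for orthogonal a and b,
# ||a + b||^2 = ||a||^2 + ||b||^2.
c = [a[0] + b[0], a[1] + b[1]]           # the hypotenuse vector
print(dot(c, c), dot(a, a) + dot(b, b))  # 2.0 2.0
```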

## 7. Operators on Inner-Product Spaces

The **singular values** of f are the eigenvalues of √(f∗f), where each eigenvalue λ is repeated dim ker(√(f∗f)−λI) times (p. 155).

## 8. Operators on Complex Vector Spaces

## 9. Operators on Real Vector Spaces

The Cayley-Hamilton theorem also holds on complex vector spaces generally (p. 173).

## 10. Trace and Determinant

Intuitively, the **determinant** of an operator f is the **change in volume** f effects. The determinant is negative when the operator ~~flips all the vectors~~ "inverts the volume" it works on.

^{^} Intuitively, a homomorphism is a function showing how **the operation of vector addition can be translated from one vector space into another** and back.

More precisely, a **homomorphism** is a function (here, from a vector space V to a vector space W) such that

f(→v+→x) = f(→v)+f(→x)

with →v,→x∈V and f(→v),f(→x)∈W. The vector addition symbol + on the *left* side of the equality, inside the function, is defined in V, and the addition symbol + on the *right* side of the equality, between the function values, is defined in W.

^{^} **Vectors** can be interpreted *geometrically* as rays from the origin out to points in a space. Vectors can also be understood *algebraically* as ordered sets of numbers (with each number representing a coordinate in the ray interpretation).

As far as notation goes, we'll use variables with arrows →v for vectors, lowercase variables x for numbers, and capital variables V for other larger mathematical structures, such as vector spaces.

^{^}In this book, that field F will be either the reals R or the complexes C.

^{^}Take note of how homomorphism-ish the below distributive relationships are!

^{^} Vectors are conventionally written vertically. But each vector, e.g.

→v = ⎡ 1 ⎤
     ⎣ 0 ⎦

has a **transpose** [1,0]T = →v, where the vector is written out horizontally instead. So we'll use vector transposes to stay in line with conventional notation while not writing out those giant vertical vectors everywhere.

^{^}One deep idea out of mathematics is that the dimensionality of a system is just the number of variables in that system that can vary independently of every other variable. You live in 3-dimensional space because you can vary your horizontal, vertical, and z-dimensional position without necessarily changing your position in the other two spatial dimensions by doing so.

^{^}Note that the set {→0}, where →0 is a vector containing only 0s, repeated any number n∈N of times, satisfies the vector space axioms!

→0+→0 = →0 = (→0+→0)+→0 = →0+(→0+→0)

establishes closure under addition, existence of an additive identity, existence of an additive inverse for all vectors, additive commutativity, and additive associativity. Letting the field be the reals with n,m∈R,

n→0 = →0 = m(n→0) = (mn)→0 = 1(→0)

establishes closure under multiplication, multiplicative associativity, and the existence of a multiplicative identity. Finally,

n(→0+→0) = n→0+n→0 = →0 = (n+m)→0 = n→0+m→0

establishes distributivity.

Any such vector space {→0} has just one basis, ∅. Intuitively, since you live at the origin, the origin is already spanned by no vectors at all -- i.e., the empty set of vectors. Any additional vector would be redundant, so no other sets constitute bases for {→0}.

^{^}In math, the bigger and/or fancier the symbol, the bigger the set or class that symbol usually stands for.

^{^} A vector →p can stand for a polynomial by containing all the coefficients in the polynomial, coefficients ordered by the degree of each coefficient's monomial.

^{^} This is addition *of functions*, (f+g)(x)=f(x)+g(x), on the left side of the equation. I is the identity function.

^{^} dim V is the **dimension** of V, formalized as *the number of vectors in any basis* of V.

^{^} Intuitively, orthonormal sets are nice sets of vectors like {[1,0,0]T,[0,1,0]T,[0,0,1]T}, where each vector has length one and is pointing out in a separate dimension.

More precisely, a set of vectors is called **orthonormal** when its elements are pairwise orthogonal and each vector has a norm of 1. We will especially care about orthonormal **bases**, like the set above with respect to R3.

^{^} The **adjoint** of a linear map f:V→W is a linear map f∗:W→V such that the inner product of f(→v) and →w equals the inner product of →v and f∗(→w) for all →v∈V and →w∈W.

Remember that inner products aren't generally commutative, so the order of arguments matters. Adjoints feel very anticommutative.
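In the real case with the standard dot product, the adjoint of →v↦M→v is given by the transpose matrix; here is a quick numerical check of the defining identity ⟨f(→v),→w⟩=⟨→v,f∗(→w)⟩ (the matrix and vectors are arbitrary choices of mine):

```python
# For a real matrix M and the dot product, the adjoint of v -> M v
# is given by the transpose of M.
M = [[1.0, 2.0],
     [3.0, 4.0]]

def dot(u, w):
    return sum(ui * wi for ui, wi in zip(u, w))

def apply(A, u):
    return [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]

MT = [[M[j][i] for j in range(2)] for i in range(2)]  # transpose of M

v, w = [1.0, -2.0], [0.5, 3.0]
print(dot(apply(M, v), w))   # <f(v), w>  -> -16.5
print(dot(v, apply(MT, w)))  # <v, f*(w)> -> -16.5, the same number
```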

An operator f∈L(V) on an inner-product space V is called **normal** when

ff∗ = f∗f

^{^} An operator f is **self-adjoint** when f=f∗.

^{^} Characteristic polynomials can also be defined for real vector spaces, though the reals are a little less well behaved as vector spaces than the complexes.