In machine learning, one often wants to uniformly approximate an arbitrary continuous function arbitrarily well using polynomials, neural networks, or something else. But in the case of complex-valued functions, this is more difficult. For example, a limit of holomorphic functions in the topology of uniform convergence on compact sets is always holomorphic, so holomorphic functions cannot directly approximate non-holomorphic functions. But in this post, I will show how one can approximate arbitrary continuous functions indirectly, starting from something like a holomorphic function.
I will state and prove the result in full generality, which means I will work with uniform algebras on compact Hausdorff spaces, but I will not need any deep theory of uniform algebras in order to state and prove the result. I will try to make everything here self-contained.
I came up with the statement and the proof of the result myself.
Motivation:
I am not too much of a fan of neural networks. Even though neural networks perform well in practice, they do not behave in a way that is very appealing to pure mathematicians. For example, if you feed purely imaginary inputs into a vanilla neural network with tanh activation and no bias, you will just obtain a random vector from some multivariate Cauchy distribution as output.
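To see concretely what I mean, here is a minimal sketch (the architecture here is my own assumption: a bias-free two-layer tanh network with real Gaussian weights). Since tanh(iy) = i·tan(y), purely imaginary inputs stay purely imaginary at every layer, and the tangent of a spread-out phase produces heavy, Cauchy-like tails in the output.

```python
import numpy as np

# A minimal sketch (assumed architecture: a bias-free two-layer tanh network with
# real Gaussian weights).  Since tanh(i*y) = i*tan(y), purely imaginary inputs stay
# purely imaginary, and the outputs pick up heavy, Cauchy-like tails from tan.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 8))
W2 = rng.standard_normal((4, 64))

x = 1j * rng.standard_normal((8, 10_000))        # purely imaginary inputs
y = np.tanh(W2 @ np.tanh(W1 @ x))                # no bias terms anywhere

print(np.allclose(y.real, 0))                    # True: the output is purely imaginary
print(np.quantile(np.abs(y.imag), [0.5, 0.99, 0.999]))  # heavy tails in |Im y|
```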
Polynomials, on the other hand, behave in a way that is appealing to mathematicians, so it would be nice if multivariate polynomials had a stronger presence in machine learning. I have personally trained polynomial machine learning models, and they seem to behave mathematically in the sense that if we train the model multiple times with different initializations, we tend to end up with the same trained model. The polynomial models that I have trained are somewhat sophisticated too, since they can have multiple layers, but I have not developed them as thoroughly as neural networks have been developed.
To demonstrate how polynomials may be useful in machine learning, one may want to resort to a uniform approximation theorem for polynomials, and the Stone-Weierstrass approximation theorem generalizes such uniform approximation from polynomials to fairly arbitrary rings of real-valued functions on a compact Hausdorff space. In this post, we shall give another generalized uniform approximation theorem, this time for uniform algebras, which are certain closed algebras of continuous complex-valued functions on compact Hausdorff spaces.
In this post, we shall use quantum states to overcome the limitations of some classes of complex-valued functions in approximating arbitrary continuous functions. This use of quantum states suggests a possible way in which one could use quantum states and partial traces to improve the performance of machine learning models.
Uniform algebras:
Suppose that X is a compact Hausdorff space. Let C(X) denote the collection of all continuous functions f:X→C. Give C(X) the norm ∥∗∥ defined by ∥f∥ = max{|f(x)| : x∈X}. Then C(X) is a Banach algebra. A closed subalgebra A of C(X) is said to be a uniform algebra if A contains all constant functions and whenever x,y∈X, x≠y, there is some f∈A with f(x)≠f(y).
Example 0: If X is a compact Hausdorff space, then C(X) is always a uniform algebra.
Example 1: If K is a compact subset of C^n, then let A be the set of all continuous functions f:K→C which are holomorphic on the interior of K. Then A is a uniform algebra.
Example 2: Suppose that B is a commutative Banach algebra with identity. Let X denote the set of all nonzero continuous algebra homomorphisms ϕ:B→C. Then X becomes a compact Hausdorff space in the weak*-topology. If a∈B, then define a continuous function ^a:X→C by setting ^a(ϕ)=ϕ(a). Then the closure of {^a:a∈B} in C(X) is a uniform algebra.
Suppose that X is a compact Hausdorff space, A⊆C(X) is a linear subspace, and V is a finite dimensional complex vector space. Then let A⊗V denote the set of all functions f:X→V such that L∘f∈A whenever L:V→C is linear.
Lemma: Suppose that X is a compact Hausdorff space and A⊆C(X) is a linear subspace that contains all constant functions and separates points, i.e., whenever x,y∈X and x≠y, there is some f∈A with f(x)≠f(y). Then whenever x0∈U⊆X and U is open, there is some finite dimensional complex inner product space V and some f∈A⊗V with f(x0)=0 and ∥f(y)∥>1 whenever y∈X∖U.
Proof: The proof is a standard compactness argument. For each y∈X∖U, let fy∈A be a function with fy(x0)=0 and |fy(y)|>1; such a function exists since we may pick g∈A with g(y)≠g(x0) and set fy=c·(g−g(x0)) for a large enough scalar c. Let Uy={x∈X:|fy(x)|>1} for each y∈X∖U. Each Uy is open and contains y, so by compactness there are y1,…,yn∈X∖U where X=U∪Uy1∪⋯∪Uyn. Define f:X→C^n by setting f(x)=(fy1(x),…,fyn(x)). Then f(x0)=0, and if y∈X∖U, there is some k with y∈Uyk; in this case |fyk(y)|>1, so ∥f(y)∥>1 as well. Q.E.D.
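As a toy numerical illustration of the Lemma (my own example, not part of the proof): take X to be the unit circle, A the polynomials, x0 = 1, and U a small arc around x0. The grid of points y below stands in for the finitely many yk produced by the compactness argument.

```python
import numpy as np

# Toy check of the Lemma on the unit circle (an illustration, not a proof):
# f_y(z) = 2(z - x0)/(y - x0) vanishes at x0 and has modulus 2 at y; stacking
# these for finitely many y outside U gives a vector-valued f with f(x0) = 0
# and ||f|| > 1 everywhere off U.
theta = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
X = np.exp(1j * theta)                               # the unit circle, sampled
x0 = 1.0
in_U = np.abs(np.angle(X)) < 0.1                     # U: a small arc around x0

ys = X[~in_U][::50]                                  # finitely many y_k outside U
F = np.array([2 * (X - x0) / (y - x0) for y in ys])  # row k: f_{y_k} evaluated on X
norms = np.linalg.norm(F, axis=0)                    # ||f(x)|| at each sample point

print(norms[np.isclose(X, x0)])                      # ≈ 0: f vanishes at x0
print(norms[~in_U].min())                            # > 1 (in fact ≥ 2) off U
```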
Recall that a density operator is a positive semidefinite trace 1 linear operator. Given a finite dimensional complex inner product space V, let D(V) denote the collection of all density operators A:V→V, and let L(V) denote the collection of all linear operators from V to V. Suppose that V,W are finite dimensional complex inner product spaces. Then the partial trace is the unique linear mapping TrW:L(V⊗W)→L(V) subject to the condition that TrW(R⊗S)=R⋅Tr(S) whenever R:V→V,S:W→W are linear.
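Here is a minimal numerical sketch of the partial trace, assuming operators on V⊗W are written as matrices in the product basis with the V index first; the last line checks the defining property TrW(R⊗S)=R⋅Tr(S).

```python
import numpy as np

# Partial trace Tr_W : L(V ⊗ W) -> L(V), for matrices in the product basis
# (V index first, W index second).
def partial_trace_W(M, dim_V, dim_W):
    T = M.reshape(dim_V, dim_W, dim_V, dim_W)    # T[i, a, j, b] = <i a| M |j b>
    return np.einsum('iaja->ij', T)              # sum over the W indices a = b

# Check the defining property Tr_W(R ⊗ S) = R * Tr(S) on random complex matrices.
rng = np.random.default_rng(0)
R = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
S = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
print(np.allclose(partial_trace_W(np.kron(R, S), 3, 2), R * np.trace(S)))  # True
```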
Theorem: Suppose that X is a compact Hausdorff space, A⊆C(X) is a uniform algebra, V is a finite dimensional complex inner product space, and f:X→D(V) is a continuous function. Then whenever ϵ>0 and ∥∗∥ is a matrix norm, there is some finite dimensional complex inner product space W and some g∈A⊗V⊗W where if x∈X, then

∥f(x) − TrW(g(x)⋅g(x)*)/∥g(x)∥₂²∥ < ϵ.
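Before the proof, here is a toy numerical illustration of the mechanism in the theorem (my own example, not the construction used in the proof): take X to be the closed unit disc, A the uniform algebra of Example 1, V = W = C², and g(z) = e1⊗w1 + z·e2⊗w2, whose coordinates are polynomials in z. The normalized partial trace TrW(g(z)g(z)*)/∥g(z)∥₂² works out to diag(1, |z|²)/(1+|z|²), a continuous but non-holomorphic D(V)-valued function obtained from purely holomorphic ingredients.

```python
import numpy as np

# A toy instance of the ratio Tr_W(g g*) / ||g||_2^2 (my own example, not the
# proof's construction): V = W = C^2 and g(z) = e1 ⊗ w1 + z * e2 ⊗ w2, so every
# coordinate of g is a polynomial in z and lies in the disc algebra of Example 1.
def partial_trace_W(M, dim_V, dim_W):
    return np.einsum('iaja->ij', M.reshape(dim_V, dim_W, dim_V, dim_W))

def g(z):
    e1, e2 = np.eye(2)            # basis of V
    w1, w2 = np.eye(2)            # basis of W
    return np.kron(e1, w1) + z * np.kron(e2, w2)

z = 0.3 + 0.4j
v = g(z)
rho = partial_trace_W(np.outer(v, v.conj()), 2, 2) / np.vdot(v, v).real
print(rho)                        # ≈ diag(1, |z|^2) / (1 + |z|^2): not holomorphic in z
```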
Proof: Suppose that U is an open cover of X. For each y∈X, choose some Vy∈U containing y. By the Lemma (after rescaling), there is a natural number d(y) and some fy∈A⊗C^d(y) with fy(y)=0 and ∥fy(z)∥∞>2 whenever z∈X∖Vy. Now let Uy={x∈X:∥fy(x)∥∞<1/2}. Each Uy is open and contains y, so by compactness there are y1,…,yn where X=Uy1∪⋯∪Uyn. Let Uj=Uyj and Vj=Vyj.
Define fk=fyk. Let e be a new unit vector orthogonal to each of the spaces C^d(yk); formally, we adjoin one extra orthonormal coordinate.
Let Zk=C^d(yk)⊕⟨e⟩ for each k, and let Z=Z1⊗⋯⊗Zn. Let W0 be a finite dimensional complex inner product space, and let t:{y1,…,yn}→V⊗W0 be a function such that TrW0(t(yk)t(yk)*)=f(yk) for all k; such W0 and t exist since every density operator admits a purification (for instance, one may take W0=V). Let N be a positive integer. Define a function h:X→Z by setting h(x)=(e+f1(x)^N)⊗⋯⊗(e+fn(x)^N) where the power fk(x)^N is taken coordinatewise.
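Here is a short numerical sketch of this purification step, assuming W0=V and the product-basis conventions from the partial-trace sketch above.

```python
import numpy as np

# Purification: given a density operator rho on V, produce a vector t in V ⊗ W0
# (here W0 = V) with Tr_W0(t t*) = rho, via the spectral decomposition of rho.
def purify(rho):
    w, U = np.linalg.eigh(rho)                   # rho = sum_i w[i] * u_i u_i^*
    w = np.clip(w, 0, None)                      # guard against tiny negative eigenvalues
    return sum(np.sqrt(wi) * np.kron(U[:, i], U[:, i].conj()) for i, wi in enumerate(w))

def partial_trace_W(M, dim_V, dim_W):
    return np.einsum('iaja->ij', M.reshape(dim_V, dim_W, dim_V, dim_W))

rho = np.array([[0.7, 0.2j], [-0.2j, 0.3]])      # a density operator on C^2
t = purify(rho)
print(np.allclose(partial_trace_W(np.outer(t, t.conj()), 2, 2), rho))  # True
```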
If S is a finite set of natural numbers, then let S[k] denote the k-th smallest element of S. Set [n]={1,…,n} for each natural number n. For each j∈[n] and S⊆[n], let vj,S be unit vectors that are pairwise orthogonal, so that if (j,S)≠(k,T), then ⟨vj,S,vk,T⟩=0.
We shall now construct a linear function I that maps Z to some other vector space.
Suppose that S⊆[n] with |S|=m, that xj∈C^d(yj) for j∈S, and that xj=e otherwise. Then set I(x1⊗⋯⊗xn)=∑j∈[n]∖S t(yj)⊗xS[1]⊗⋯⊗xS[m]⊗vj,S, and extend I linearly. Define g=I∘h.
Observe that g(x)=∑S⊆[n]∑j∈[n]∖S t(yj)⊗fS[1](x)^N⊗⋯⊗fS[|S|](x)^N⊗vj,S. We include the vectors vj,S to make sure that the summands are all orthogonal.
Let PTr denote the partial trace where we trace out all factors in the tensor product except for V. Then

PTr(g(x)g(x)*)=∑S⊆[n]∑j∈[n]∖S f(yj)⋅∥fS[1](x)^N∥₂²⋯∥fS[|S|](x)^N∥₂².
Now suppose that ϵ>0, and assume that the open cover U was chosen fine enough that whenever U∈U and x,y∈U, we have ∥f(x)−f(y)∥<ϵ; such a cover exists since f is continuous.
Fix x∈X and pick k with x∈Uk. If j is an index with ∥f(x)−f(yj)∥≥ϵ, then x∉Vj (since yj∈Vj and f varies by less than ϵ on Vj), so ∥fj(x)∥∞>2.
When N is large, the dominant terms in the sum for PTr(g(x)g(x)*) are indexed by sets S where k∉S (since ∥fk(x)∥∞<1/2, including k in S only shrinks the weight) but where S contains every index j with ∥f(x)−f(yj)∥≥ϵ (since for such j we have ∥fj(x)∥∞>2, so omitting j from S loses a factor of at least 4^N in the weight). For such S, every index j∈[n]∖S appearing in the sum satisfies ∥f(x)−f(yj)∥<ϵ, so PTr(g(x)g(x)*)/∥g(x)∥₂² is essentially a weighted average of values f(yj) that all lie within ϵ of f(x). Therefore PTr(g(x)g(x)*)/∥g(x)∥₂² uniformly approximates f(x) once N is large enough.
Q.E.D.
Conclusion
The proof of the above theorem consists of a standard compactness argument that any mathematician who specializes in analysis should be able to come up with. The result also applies over the field of real numbers, since the argument uses nothing special about the complex numbers; for the real numbers, however, the result is a consequence of the Stone-Weierstrass theorem. The Stone-Weierstrass theorem also applies to quaternion-valued functions, so it is unnecessary to state the result for the quaternions (and the proof would need to be reworked in that setting, since quaternionic tensor products do not behave well). One should therefore not be too surprised by the above theorem.
The above theorem may also be a corollary of a known result on Banach algebras (or something like that), but I was not able to find an appropriate reference.