In this post, we shall define my new dimensionality reduction for tensors in V⊗n where n≥3, and we shall make an empirical observation about the structure of the dimensionality reduction. There are various simple ways of adapting this dimensionality reduction algorithm to tensors in V1⊗⋯⊗Vn and even mixed quantum states (mixed states are just positive semidefinite operators on V1⊗⋯⊗Vn with trace 1), but that will be a topic for another post.
This dimensionality reduction shall represent tensors in V⊗n as tuples of matrices A1,…,Ar. Computer experiments indicate that, in many cases, we have Tr(Ai1⋯Aim) = 0 whenever m ≠ 0 mod n.
If X is a matrix, then the spectral radius of X is the value
ρ(X) = max{|λ| : λ is an eigenvalue of X} = lim_{n→∞} ∥X^n∥^(1/n).
If X is a matrix, then define the conjugate matrix X̄ = (X*)^T = (X^T)*; this is the matrix obtained from X by replacing each entry with its complex conjugate.
If (X1,…,Xr) is a tuple of real or complex matrices, then define the L2-spectral radius by setting
ρ2(X1,…,Xr) = ρ(X1⊗X̄1 + ⋯ + Xr⊗X̄r)^(1/2).
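Here is a minimal numpy sketch of these two quantities (the helper names spectral_radius and l2_spectral_radius are my own):

```python
import numpy as np

def spectral_radius(X):
    # rho(X): largest absolute value of an eigenvalue of X.
    return np.max(np.abs(np.linalg.eigvals(X)))

def l2_spectral_radius(mats):
    # rho_2(X1,...,Xr) = rho(X1 (x) conj(X1) + ... + Xr (x) conj(Xr))^(1/2).
    M = sum(np.kron(X, X.conj()) for X in mats)
    return spectral_radius(M) ** 0.5

# Example with three random complex 4x4 matrices.
rng = np.random.default_rng(0)
Xs = [rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)) for _ in range(3)]
print(spectral_radius(Xs[0]), l2_spectral_radius(Xs))
```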
Suppose that K is either the field of real numbers or the field of complex numbers. Suppose that p(x1,…,xr) is a homogeneous non-commutative polynomial of degree n with coefficients in K (it is easier to define the dimensionality reduction in terms of homogeneous non-commutative polynomials than in terms of tensors).
Then define a fitness function Mp : Md(K)^r → [0,∞) by setting
Mp(A1,…,Ar) = ρ(p(A1,…,Ar))^(1/n) / ρ2(A1,…,Ar).
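To make the fitness function concrete, here is a sketch that evaluates Mp for an example homogeneous cubic polynomial; the encoding of p as a dictionary from words to coefficients is just my own illustration and not a claim about how the experiments were implemented:

```python
import numpy as np
from itertools import product

def spectral_radius(X):
    return np.max(np.abs(np.linalg.eigvals(X)))

def eval_poly(coeffs, mats):
    # Evaluate a homogeneous non-commutative polynomial at the matrices mats.
    # coeffs maps words (tuples of variable indices) to scalar coefficients.
    d = mats[0].shape[0]
    total = np.zeros((d, d), dtype=complex)
    for word, c in coeffs.items():
        term = np.eye(d, dtype=complex)
        for i in word:
            term = term @ mats[i]
        total += c * term
    return total

def fitness(coeffs, mats, n):
    # M_p(A1,...,Ar) = rho(p(A1,...,Ar))^(1/n) / rho_2(A1,...,Ar).
    rho2 = spectral_radius(sum(np.kron(A, A.conj()) for A in mats)) ** 0.5
    return spectral_radius(eval_poly(coeffs, mats)) ** (1.0 / n) / rho2

# Example: a random homogeneous cubic polynomial in r = 2 variables (all 8 words of length 3),
# evaluated at random 4x4 complex matrices.
rng = np.random.default_rng(1)
r, n, d = 2, 3, 4
coeffs = {w: rng.normal() + 1j * rng.normal() for w in product(range(r), repeat=n)}
As = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(r)]
print(fitness(coeffs, As, n))
```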
This function Mp is bounded and attains a maximum value, but proving that the maximum is attained requires the machinery of quantum channels.
We shall call a tuple (A1,…,Ar) at which Mp(A1,…,Ar) is maximized an
L2,d-spectral radius dimensionality reduction (LSRDR) of the non-commutative polynomial p(x1,…,xr). The motivation behind the notion of an LSRDR is that it is easier to work with the variables x1,…,xr represented as the matrices A1,…,Ar than it is to work with the non-commutative polynomial p(x1,…,xr) itself. The d×d matrices A1,…,Ar have d²·r parameters, while the non-commutative polynomial could have up to r^n parameters where n is the degree of p(x1,…,xr).
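Here is a rough sketch of how one might search for an LSRDR numerically by gradient ascent on log Mp using PyTorch. This is only an illustration under my own parametrization (real and imaginary parts stored separately); it is not necessarily the training procedure used for the experiments in this post, and the spectral radius is only differentiable where the dominant eigenvalue is simple.

```python
import torch
from itertools import product

def rho(X):
    # Spectral radius; differentiable a.e. when the dominant eigenvalue is simple.
    return torch.abs(torch.linalg.eigvals(X)).max()

def eval_poly(coeffs, mats):
    d = mats[0].shape[0]
    total = torch.zeros((d, d), dtype=torch.cfloat)
    for word, c in coeffs.items():
        term = torch.eye(d, dtype=torch.cfloat)
        for i in word:
            term = term @ mats[i]
        total = total + c * term
    return total

def log_fitness(coeffs, mats, n):
    # log M_p = (1/n) log rho(p(A1,...,Ar)) - (1/2) log rho(sum_i A_i (x) conj(A_i)).
    M = sum(torch.kron(A, A.conj()) for A in mats)
    return torch.log(rho(eval_poly(coeffs, mats))) / n - torch.log(rho(M)) / 2

# Hypothetical example: random cubic polynomial in r = 2 variables, reduced to d = 3.
torch.manual_seed(0)
r, n, d = 2, 3, 3
coeffs = {w: complex(torch.randn(()).item(), torch.randn(()).item())
          for w in product(range(r), repeat=n)}
params = [torch.randn(2, d, d, requires_grad=True) for _ in range(r)]  # real and imaginary parts
opt = torch.optim.Adam(params, lr=1e-2)
for step in range(1000):
    mats = [torch.complex(P[0], P[1]) for P in params]
    loss = -log_fitness(coeffs, mats, n)  # gradient ascent on log M_p
    opt.zero_grad()
    loss.backward()
    opt.step()
print("approximate maximum of M_p:", float(torch.exp(-loss)))
```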
We observe that if p is a quadratic non-commutative homogeneous polynomial, then max Mp(A1,…,Ar) = ∥p∥2^(1/2), where ∥⋅∥2 refers to the Frobenius norm. In other words, we already have a well-developed theory of matrices, and LSRDRs do not improve the theory of matrices, but LSRDRs do help us analyze tensors of order at least 3 in several different ways.
Given square matrices A1,…,Ar ∈ Md(K), define a completely positive superoperator Φ(A1,…,Ar) : Md(K) → Md(K) by setting Φ(A1,…,Ar)(X) = A1XA1* + ⋯ + ArXAr*. The operator Φ(A1,…,Ar) is similar to the matrix A1⊗Ā1 + ⋯ + Ar⊗Ār.
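This correspondence is easy to check numerically: with numpy's row-major flattening of matrices into vectors, Φ(A1,…,Ar) is represented exactly by the matrix A1⊗Ā1 + ⋯ + Ar⊗Ār (and in particular has the same spectrum). Here is a quick check:

```python
import numpy as np

def phi(mats, X):
    # Phi(A1,...,Ar)(X) = A1 X A1* + ... + Ar X Ar*.
    return sum(A @ X @ A.conj().T for A in mats)

rng = np.random.default_rng(2)
d, r = 3, 2
As = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(r)]
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

# Matrix of Phi with respect to row-major vectorization.
M = sum(np.kron(A, A.conj()) for A in As)
print(np.allclose(M @ X.reshape(-1), phi(As, X).reshape(-1)))  # True
```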
Observation: Suppose that p is a non-commutative homogeneous polynomial of degree n with random complex coefficients. Let (A1,…,Ar) be an L2,d-spectral radius dimensionality reduction of p. Then we often have Tr(q(A1,…,Ar)) = 0 whenever q is a homogeneous non-commutative polynomial of degree m where m mod n ≠ 0. Furthermore, the set of eigenvalues of Φ(A1,…,Ar) is invariant under rotation by the angle 2π/n. Said differently, Tr(Φ(A1,…,Ar)^m) = 0 whenever m mod n ≠ 0.
I currently do not have an adequately developed explanation for why Tr(q(A1,…,Ar)) = 0 and Tr(Φ(A1,…,Ar)^m) = 0 so often (more experimentation is needed), but such an explanation is probably within reach. The vanishing traces do not occur 100 percent of the time; they only show up when the conditions are right.
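One way to test the observation numerically is to compute the largest |Tr(Ai1⋯Aim)| over all words of a given length m; the helper below is a hypothetical sketch, where mats would be the tuple of matrices obtained from a trained LSRDR (for instance from the gradient-ascent sketch above):

```python
import numpy as np
from itertools import product
from functools import reduce

def max_word_trace(mats, m):
    # Largest |Tr(A_{i1} ... A_{im})| over all words i1,...,im of length m.
    return max(abs(np.trace(reduce(np.matmul, [mats[i] for i in w])))
               for w in product(range(len(mats)), repeat=m))

# Hypothetical usage with matrices from an LSRDR of a degree-3 polynomial:
# for m in range(1, 7):
#     print(m, max_word_trace(mats, m))  # expected to be near zero unless m is a multiple of 3
```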
If A1,…,Ar∈Md(K), then
Tr(Φ(A1,…,Ar)) = |Tr(A1)|² + ⋯ + |Tr(Ar)|². Therefore, Tr(Φ(A1,…,Ar)) = 0 precisely when Tr(Aj) = 0 for 1 ≤ j ≤ r. Furthermore,
Tr(Φ(A1,…,Ar)^m) = Σ_{i1,…,im} |Tr(Ai1⋯Aim)|², so Tr(Φ(A1,…,Ar)^m) = 0 precisely when Tr(Ai1⋯Aim) = 0 whenever i1,…,im ∈ {1,…,r}.
More generally, Tr(Φ(A1,1,…,A1,r1)⋯Φ(As,1,…,As,rs)) = Σ_{i1∈{1,…,r1},…,is∈{1,…,rs}} |Tr(A1,i1⋯As,is)|². Therefore, Tr(Φ(A1,1,…,A1,r1)⋯Φ(As,1,…,As,rs)) = 0 precisely when Tr(A1,i1⋯As,is) = 0 whenever i1 ∈ {1,…,r1},…,is ∈ {1,…,rs}.
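These identities are easy to verify numerically; here is a sketch that checks the second one for random complex matrices:

```python
import numpy as np
from itertools import product
from functools import reduce

rng = np.random.default_rng(3)
d, r, m = 3, 2, 4
As = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(r)]

# Left-hand side: Tr(Phi^m), computed from the matrix representation of Phi.
M = sum(np.kron(A, A.conj()) for A in As)
lhs = np.trace(np.linalg.matrix_power(M, m))

# Right-hand side: sum over all words of length m of |Tr(A_{i1} ... A_{im})|^2.
rhs = sum(abs(np.trace(reduce(np.matmul, [As[i] for i in w]))) ** 2
          for w in product(range(r), repeat=m))
print(np.allclose(lhs, rhs))  # True
```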
Remark:
LSRDRs of tensors are well-behaved in other ways besides the trace-zero behavior described above. For example, if we train two LSRDRs (A1,…,Ar), (B1,…,Br) of the same tensor starting from different random initializations, then we typically have Mp(A1,…,Ar) = Mp(B1,…,Br) (but this does not happen 100 percent of the time either). After training, the resulting LSRDR therefore does not retain any random information left over from the initialization or the training, and any random information present in an LSRDR was originally in the tensor itself.
Remark:
We have some room to modify our fitness function while still retaining the properties of LSRDRs of tensors. For example, suppose that p is a homogeneous non-commutative polynomial of degree n, and define Mp,s : Md(K)^r → [0,∞) by setting
Mp,s(A1,…,Ar) = ρ(p(A1,…,Ar))^(1/n) / ∥A1A1* + ⋯ + ArAr*∥s^(1/2). Here 1 < s ≤ ∞ and ∥⋅∥s denotes the Schatten norm, ∥X∥s = (Tr((XX*)^(s/2)))^(1/s), which is the ℓs norm of the singular values of X. If p is a random homogeneous non-commutative complex polynomial and (A1,…,Ar) maximizes Mp,s(A1,…,Ar), then (if everything works out right) we would still have Tr(Ai1⋯Aim) = 0 whenever m ≠ 0 mod n.
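For completeness, here is a small sketch of this modified fitness function; schatten_norm and fitness_s are my own illustration names, and pA stands for the already-evaluated matrix p(A1,…,Ar) (for instance computed with eval_poly from the earlier sketch):

```python
import numpy as np

def schatten_norm(X, s):
    # Schatten s-norm: the l^s norm of the singular values of X (s = np.inf gives the operator norm).
    sv = np.linalg.svd(X, compute_uv=False)
    return np.max(sv) if np.isinf(s) else np.sum(sv ** s) ** (1.0 / s)

def fitness_s(pA, mats, n, s):
    # M_{p,s}(A1,...,Ar) = rho(p(A1,...,Ar))^(1/n) / ||A1 A1* + ... + Ar Ar*||_s^(1/2).
    rho = np.max(np.abs(np.linalg.eigvals(pA)))
    den = schatten_norm(sum(A @ A.conj().T for A in mats), s) ** 0.5
    return rho ** (1.0 / n) / den
```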
Conclusion:
Since LSRDRs of tensors do not leave behind any random information that is not already present in the tensors themselves, we should expect LSRDRs to be much more interpretable than machine learning systems like neural networks, which do retain a great deal of random information left over from the initialization. Since LSRDRs of tensors give us so many trace-zero operators, one should consider LSRDRs of tensors to be very well-behaved systems, and well-behaved systems should be much more interpretable than poorly behaved systems.
I look forward to using LSRDRs of tensors to interpret machine learning models and to produce new, highly interpretable machine learning models. I do not see LSRDRs of tensors replacing deep learning, but LSRDRs have properties that are hard to reproduce using deep learning, so I look forward to exploring the possibilities with LSRDRs of tensors. I will make more posts about LSRDRs of tensors and about other objects produced with similar objective functions.
Edits: (10/12/2023) I originally claimed that my dimensionality reduction does not work well for tensors in V1⊗⋯⊗Vn, but after re-running the experiments, I was able to reduce random tensors in V1⊗⋯⊗Vn to matrices, and this dimensionality reduction performed well.