In machine learning, it is desirable for a trained model to retain absolutely no random information from its initialization; in this short post, I will mathematically prove a simple but (to me) interesting consequence of this desirable behavior.
This post grew out of research I am doing on machine learning algorithms related to my investigation of cryptographic functions for the cryptocurrency that I launched (to discuss crypto, send me a personal message so we can take the conversation off this site).
This post is about linear machine learning models. Strictly speaking, we are using quantum operators, so these models are more sophisticated than a logistic regression model, but they are still linear, which means it is easy to train a neural network that solves more sophisticated problems than these linear models can. The kinds of results in this post, however, also extend to some non-linear models with multiple layers and stronger capabilities. It is simply easier to understand what is going on with linear models, and even in the linear case we obtain some interesting mathematics.
We say that a machine learning model trained by gradient ascent/descent is pseudodeterministically trained (or just pseudodeterministic for short) if the fitness/loss function has exactly one local optimum. As a result, the trained model retains absolutely no information from its initialization. As another consequence, the trained model attains the global optimum rather than a suboptimal local optimum. The results in this post actually hold whenever the global optimum is unique, but I bring up pseudodeterminism because it guarantees that we can actually find the unique global optimum instead of getting stuck at a suboptimal local optimum.
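A minimal sketch of pseudodeterministic training, under the simplest possible assumption: least-squares linear regression. Its loss is convex with a single optimum, so gradient descent started from two different random initializations converges to the same weights, leaving no trace of the initialization. The data and hyperparameters here are purely illustrative, not from the post.

```python
import numpy as np

# Synthetic data: a noiseless linear target, so the unique global
# optimum of the mean-squared-error loss is exactly true_w.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

def train(w):
    # Plain gradient descent on the mean squared error loss.
    for _ in range(2000):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - 0.1 * grad
    return w

# Two different random initializations...
w1 = train(rng.normal(size=3))
w2 = train(rng.normal(size=3))

# ...converge to the same unique optimum: the training is
# pseudodeterministic, and the initialization is forgotten.
print(np.allclose(w1, w2, atol=1e-6))  # True
```

With a non-convex loss that has several local optima, the analogous experiment would generally produce different trained weights for different initializations, which is exactly the behavior pseudodeterminism rules out.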
If a machine learning model globally optimizes an objective function, the machine learning model should be considered an inherently interpretable model rather than a