I recently stumbled across this remarkable interview with Vladimir Vapnik, a leading light in statistical learning theory, one of the creators of the Support Vector Machine algorithm, and generally a cool guy. The interviewer obviously knows his stuff and asks probing questions. Vapnik describes his current research and also makes some interesting philosophical comments:
V-V: I believe that something drastic has happened in computer science and machine learning. Until recently, philosophy was based on the very simple idea that the world is simple. In machine learning, for the first time, we have examples where the world is not simple. For example, when we solve the "forest" problem (which is a low-dimensional problem) and use data of size 15,000 we get 85%-87% accuracy. However, when we use 500,000 training examples we achieve 98% correct answers. This means that a good decision rule is not a simple one; it cannot be described by very few parameters. This is actually a crucial point in the approach to empirical inference.
This point was very well described by Einstein, who said "when the solution is simple, God is answering". That is, if a law is simple we can find it. He also said "when the number of factors coming into play is too large, scientific methods in most cases fail". In machine learning we are dealing with a large number of factors. So the question is: what is the real world? Is it simple or complex? Machine learning shows that there are examples of complex worlds. We should approach complex worlds from a completely different position than simple worlds. For example, in a complex world one should give up explainability (the main goal in classical science) to gain better predictability.
R-GB: Do you claim that the assumption of mathematics and other sciences that there are very few and simple rules that govern the world is wrong?
V-V: I believe that it is wrong. As I mentioned before, the (low-dimensional) problem "forest" has a perfect solution, but it is not simple and you cannot obtain this solution using 15,000 examples.
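An aside of my own, not from the interview: the effect Vapnik describes, where a low-dimensional problem keeps rewarding more training data, is easy to reproduce on a toy problem. The sketch below is not the "forest" data; it assumes a made-up two-dimensional checkerboard labelling and a plain 1-nearest-neighbour classifier, and every name in it (`label`, `sample`, `accuracy`, the `cells` parameter) is hypothetical. The true rule is exact and lives in only two dimensions, yet a small sample cannot pin it down.

```python
import random

def label(x, y, cells=4):
    """Checkerboard label on the unit square: low-dimensional, but not a simple rule."""
    return (int(x * cells) + int(y * cells)) % 2

def sample(n, rng):
    """Draw n uniform points from the unit square together with their true labels."""
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [(x, y, label(x, y)) for x, y in points]

def predict_1nn(train, x, y):
    """Return the label of the training point closest to (x, y)."""
    nearest = min(train, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
    return nearest[2]

def accuracy(n_train, n_test=300, seed=0):
    """Test accuracy of 1-NN trained on n_train points (the test set is fixed by the seed)."""
    train = sample(n_train, random.Random(seed))
    test = sample(n_test, random.Random(seed + 1))
    hits = sum(predict_1nn(train, x, y) == t for x, y, t in test)
    return hits / n_test

if __name__ == "__main__":
    # Accuracy as a function of training set size (an assumed toy setup)
    for n in (20, 200, 2000):
        print(n, accuracy(n))
```

With a handful of points the classifier does little better than guessing, and accuracy climbs as the sample grows, because the decision rule has to be assembled from many local pieces rather than described by a few parameters.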
R-GB: What do you think about the bounds on uniform convergence? Are they as good as we can expect them to be?
V-V: They are O.K. However the main problem is not the bound. There are conceptual questions and technical questions. From a conceptual point of view, you cannot avoid uniform convergence arguments; it is a necessity. One can try to improve the bounds, but it is a technical problem. My concern is that machine learning is not only about technical things, it is also about philosophy: What is the complex world science about? The improvement of the bound is an extremely interesting problem from a mathematical point of view. But even if you get a better bound, it will not help to attack the main problem: what to do in complex worlds?
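For readers who have not seen the bounds under discussion, here is one commonly quoted form of the VC uniform convergence bound for 0/1 loss. The interview does not write it out, so the notation is my assumption: R is the true risk, R_emp the empirical risk on n training examples, h the VC dimension of the function class, and η the allowed failure probability. With probability at least 1 − η, simultaneously for every function in the class:

```latex
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

The second term shrinks as n grows relative to h, which is the sense in which a rich (high-h) decision rule needs a large sample before the guarantee says anything useful.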