At some point in history a lot of thought was put into obtaining the equation:
R*T = P*V/n
The ideal gas equation we learn in kindergarten, which uses the magic number R to make predictions about how n moles of an “ideal gas” will change in pressure, volume or temperature, given that we can control two of those three factors.
This law approximates the behavior of many gases to within a small error, and it was certainly useful for many a medieval soap-volcano party trick and Victorian steam engine design.
But, as is often the case in science, a bloke called Van der Waals decided to ruin everyone’s fun in the late 19th century by focusing on a bunch of edge cases where the ideal gas equation was utter rubbish when applied to any real gas.
He then proceeded to collect a bunch of data points for the behavior of various gases and came up with two other magic numbers to add to the equation, a and b, determined individually for each gas, yielding the equation:
R*T = (P + a*n^2/V^2)*(V - n*b)/n
Once again the world could rest easy, having been given a new equation that approximates reality better.
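The difference between the two equations is easy to see numerically. Here is a minimal sketch for 1 mol of CO2, using commonly tabulated approximate constants (treat the exact values as illustrative, not authoritative):

```python
# Compare ideal-gas and Van der Waals pressure predictions for 1 mol of CO2.
# The a, b constants are commonly tabulated approximate values for CO2.
R = 0.08314           # L·bar/(mol·K)
a, b = 3.64, 0.04267  # L²·bar/mol², L/mol (approximate)

def p_ideal(V, T, n=1.0):
    return n * R * T / V

def p_vdw(V, T, n=1.0):
    return n * R * T / (V - n * b) - a * n**2 / V**2

# Near standard conditions the two agree to within roughly 1%...
print(p_ideal(22.4, 273.15), p_vdw(22.4, 273.15))
# ...but at high density (small V) they diverge badly:
print(p_ideal(0.2, 273.15), p_vdw(0.2, 273.15))
```

The edge cases Van der Waals focused on are exactly the high-density, low-temperature regimes where the second print shows a large gap.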
But obviously this equation is not enough; indeed it was not enough in the time of Van der Waals, when people working on “real” problems had their own more specific equations.
But the point stands: humans gather data and construct equations for the behavior of gases, and the less error-prone those equations have to be, the more niche and the harder to construct they become (read: they require more data and a more sophisticated mathematical apparatus).
But let’s assume that Van der Waals (or any other equation maker) does all the data collection which he hopes will have a good coverage of standard behavior and edge cases… and feeds it into a neural network (or any other entity which is a good universal function estimator), and gets a function W.
This function W is as good as, if not better than, the original human-brain-made equation at predicting the behavior of gases and fitting the collected data.
The way this function would work, assuming it was built using a neural network, would be something like:
W(V,T,n,gas) => P
Or maybe another function:
W(P,V,n,gas) => T
We got rid of R, a and b and replaced them with a simple categorical variable, gas (the name of the gas being modeled). The network, having each set of observations labeled with the gas it came from, learns how that gas behaves and builds its own equivalent of the magic numbers R, a and b into its weights and biases.
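To make this concrete, here is a toy sketch: synthetic data generated from the Van der Waals equation (with commonly tabulated approximate constants for two gases), fed into a generic function estimator with the gas as a one-hot input. A random-feature regressor stands in for a neural network here; it is a cheap universal approximator, not a claim about the best architecture:

```python
import numpy as np

# Toy sketch of W(V, T, n, gas) -> P, fit on synthetic Van der Waals data.
rng = np.random.default_rng(0)
R = 0.08314  # L·bar/(mol·K)
GASES = {"co2": (3.64, 0.04267), "n2": (1.37, 0.0387)}  # approximate (a, b)

def vdw_pressure(V, T, n, a, b):
    return n * R * T / (V - n * b) - a * n**2 / V**2

# Build a dataset: inputs are (V, T, n, one-hot gas id), target is P.
X, y = [], []
for gid, (a, b) in enumerate(GASES.values()):
    V = rng.uniform(5, 30, 500)
    T = rng.uniform(250, 400, 500)
    n = rng.uniform(0.5, 2, 500)
    onehot = np.tile(np.eye(len(GASES))[gid], (500, 1))
    X.append(np.column_stack([V, T, n, onehot]))
    y.append(vdw_pressure(V, T, n, a, b))
X, y = np.vstack(X), np.concatenate(y)

# Standardize, expand into random tanh features, solve by least squares.
Xs = (X - X.mean(0)) / X.std(0)
F = np.tanh(Xs @ rng.normal(0, 1, (Xs.shape[1], 300)))
A = np.column_stack([F, np.ones(len(F))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
rmse = np.sqrt(np.mean((A @ w - y) ** 2))
print(f"fit RMSE: {rmse:.3f} bar, on pressures spanning roughly 0.3-13 bar")
```

The per-gas constants never appear as explicit parameters of the learned model; they end up encoded in the fitted weights, which is the point of the W(V,T,n,gas) formulation.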
For one, this would wreak havoc on our education system: people could no longer be expected to know the Van der Waals equation and solve it on a blackboard for a given a, b and R. If we did this for all the “laws” of physics, people might even start questioning why they have to learn this silly “math” contraption to begin with, since all they are doing is feeding variables into a computer.
But on a more steel-manned note, one could argue that this approach would stifle the progress of science. After all, one can look at the Van der Waals equation and infer two things:
a) How you could come up with a better equation.
b) What data one would need to gather for that equation to be relevant, based on the places where the Van der Waals equation seems to produce results which “look odd” or conflict with experimental data.
Humans are great platforms for interacting with the physical reality around them; evolution has had billions of years to optimize us for things such as grabbing, pouring, slashing, smashing and moving in an incomprehensibly large range of environments. I would never dare argue with this point, and I think people building humanoid robots are up against an almost impossible task (though I am often amazed by their achievements).
What I would argue is that humans might not be that good at doing math. If there was any evolutionary pressure to do math, it came very recently in our development. People who can intuitively perform movements a bleeding-edge robot couldn’t dream of fail at basic mathematical tasks that the tiny computer inside my fridge would find laughably easy.
Yet for some reason people are adamant about constructing mathematical models to explain the data they observe in nature.
But at present there are almost no examples of a task that takes numbers as inputs and produces numbers as outputs where decades of human brain work can match machine learning models.
If we look at something like protein folding, the hot topic nowadays is not whether machine learning models can beat carefully crafted algorithms based on “domain specific” human logic; it is how little data they need and how fast they can run.
Based on my readings, there’s ample evidence that the answer to both of those is “very little” (e.g. https://www.nature.com/articles/d41586-019-01357-6).
The main issue in most fields, however, is that a head-to-head comparison of human-made models vs machine learning models is impossible, since the way science progresses is not by carefully collecting and storing data; it’s by collecting data, creating human-made models that explain the data, calling the almost-perfect models “fact”, and then forgetting about the actual data.
I can’t stress enough how important this point is. Think back to any course in physics, chemistry or biology you took and what was discussed. Were you ever shown the data from which we construct the picture of the underlying reality? Or were you mainly focusing on equations, assuming those equations are equivalent to looking at the raw data?
This leads me to points a) and b) above: namely, the claim that human-made models are better because other humans can look at them, figure out where they are wrong, and decide what data to collect in order to come up with better models.
I will concede that this is often true, but it’s true because we have a flawed epistemic perspective. For a machine learning model to be judged better than a string theorist, it would have to understand string theory and then make some “discovery” (read: a mathematical model) that agrees with experiments previously unexplained by string theory.
However, this mindset is itself lacking, because the human models are flawed and the data those models were built on is often not easily accessible, if at all.
Even when the underlying data is accessible, it has usually passed through several layers of flawed human abstraction.
The results of experiments are interpreted and stored using high-level, human-made abstractions.
For example, we could store the results of an experiment in two different ways: as something close to the raw instrument readings (call it variant 1), or as quantities already interpreted through our physical abstractions (call it variant 2).
Something like variant 2 is probably better for a physicist looking at the data, but a “lower level” representation such as variant 1 might be better suited to an automated data-interpretation algorithm.
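For illustration, the contrast might look something like this; every field name and value below is invented, not taken from any real experiment:

```python
# A hypothetical sketch of the same measurement stored at two
# abstraction levels. All field names and values are invented.
variant_1 = {  # low level: close to what the instrument actually recorded
    "detector_voltages_mV": [412.1, 409.8, 415.3],
    "timestamps_ns": [10, 20, 30],
    "instrument_config": {"gain": 2.0, "threshold_mV": 100},
}
variant_2 = {  # high level: pre-interpreted through human abstractions
    "particle": "electron",
    "energy_keV": 511.0,
    "detected": True,
}
```

Variant 2 throws away everything that didn’t fit the human interpretation; variant 1 keeps information a future algorithm might interpret differently.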
That’s not to say abstraction 2 is inherently worse than abstraction 1. Indeed, if you go down the deconstructionist path and break down the abstractions used to understand the world as far as you can, you end up nowhere: in a philosophical void, arguing with Kant and Hume about whether we can truly state or know anything.
I agree that any abstraction we design will be flawed in some sense, but the flaws we design into our abstractions are designed by humans, for humans. It’s very plausible that if we instead designed abstractions with automated algorithms in mind as their users (or let said algorithms design them), we might end up with some pretty different abstractions.
If all you have to go by are flawed models rather than raw data, then a human might well do better at creating slightly less flawed models, not because humans are better model creators, but because humans understand the “context” in which those models were created.
In the few cases where easy-to-work-with data is available to feed into a model, as with predicting folding from peptide structure, models often do better. But even in those cases, I’d argue that we disadvantage the models: they can only work with the data we provide, they can’t ask us for different data, and they can’t design their own abstractions and ask to receive data interpreted through those abstractions.
Our process of data gathering is itself flawed from an ML-driven perspective, because we don’t design our models in a way that lets them understand what type of data they would need to make better predictions.
I think this flaw stems from two directions.
One: the concept of ML models making inferences about the type of data they need is heavily underdeveloped. In the last year, part of my job has been tackling this very problem of models that analyze models (or themselves), and I will admit I was rather surprised by the lack of research on the topic even at a fundamental level (e.g. making neural networks give a certainty for a prediction, or determine the importance of a feature). The closest serious researchers come to tackling this problem is reinforcement learning, but it’s not clear whether those approaches lend themselves to the real world, where the cost of exploration is very high: it’s much cheaper for a model to play millions of DOTA2 games to get better at DOTA2 than to dictate millions of hadron collider experiments to get better at physics.
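As a sketch of what “a model asking for data” could look like, here is a toy 1-D active-learning setup in which an ensemble’s disagreement flags the region that needs measurements. The hidden “law”, the random-feature map (a cheap stand-in for small neural networks) and every constant are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x)        # the hidden "law" we pretend to measure
x_train = rng.uniform(0, 1, 20)    # existing data only covers [0, 1]
y_train = f(x_train)
x_grid = np.linspace(0, 2, 200)    # region we would like predictions over

def phi(x, W, b):
    # Random tanh feature map: a cheap stand-in for a small network.
    return np.tanh(np.outer(x, W) + b)

# Fit an ensemble of bootstrapped random-feature regressors.
preds = []
for seed in range(10):
    r = np.random.default_rng(seed)
    W, b = r.normal(0, 3, 50), r.normal(0, 3, 50)
    idx = r.integers(0, len(x_train), len(x_train))  # bootstrap resample
    coef, *_ = np.linalg.lstsq(phi(x_train[idx], W, b),
                               y_train[idx], rcond=None)
    preds.append(phi(x_grid, W, b) @ coef)

disagreement = np.std(preds, axis=0)
# Highest disagreement should fall outside [0, 1], which is exactly
# where the ensemble should "ask" for new measurements.
print("suggest measuring at x =", x_grid[np.argmax(disagreement)])
```

This is the cheap end of the spectrum; the hard, underdeveloped part the paragraph above describes is doing this when each “measurement” is a hadron collider run rather than a function call.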
Two: we are reluctant to change our approach to science to a more data-driven one. The idea of machine learning models dictating to scientists what kind of data to collect is far-fetched, but part of why it is far-fetched is that we intuitively know no scientist wants to be reduced to a simple data collector.
This is a chicken-and-egg argument: “no models that are good at figuring out what data they need get built, because no humans want to do the data gathering” and “no humans want to do model-driven data gathering, because no models exist that are good at figuring out what data they need.”
A more modest requirement would be to properly index and store all the data that leads to a human-made model, and all the data behind the models upon which that model is built, and so on recursively. But even this is too much to ask, because old data is often lost: it was never digitized, or it was compressed in a lossy way. The experimental results might be recorded somewhere, but the details of how the experiment was run are lost, or were never recorded at all, because they were too complex for a human-made model to take into account.
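As a sketch of what “properly indexed” could mean, here is a hypothetical minimal provenance record in which every derived result keeps a hash chain back to the raw data it came from. All field names are invented for illustration:

```python
from dataclasses import dataclass, field
import hashlib
import json

@dataclass
class ExperimentRecord:
    """Hypothetical minimal unit of indexed, provenance-tracked data."""
    raw_readings: list       # unprocessed instrument output
    run_details: dict        # how the experiment was actually run
    derived_from: list = field(default_factory=list)  # parent record hashes

    def content_hash(self) -> str:
        # Deterministic hash over content and provenance chain.
        payload = json.dumps(
            {"raw": self.raw_readings, "run": self.run_details,
             "parents": self.derived_from},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

# A raw measurement, and an analysis that records what it was derived from.
run = ExperimentRecord([1.02, 0.98, 1.01],
                       {"sensor": "pressure", "unit": "bar"})
analysis = ExperimentRecord([1.003], {"method": "mean"},
                            [run.content_hash()])
```

Nothing here is clever; the point is that the recursive “data behind the models behind the models” chain only exists if each layer records its parents at creation time.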
However, what would be easy is to play in good faith with machine learning models and do our best to at least store and properly index all the data from current experiments and observations.
But maybe I’m being unfair to physics here; after all, I have a poor understanding of modern physics. To my knowledge, experimental physicists actually do their best to store the data from their experiments, and storing all of it is genuinely hard: since its inception, CERN has produced literal exabytes of data.
Instead, let me harp on a field that’s much easier to criticize: medical research.
If you look into topics like cancer detection, I sometimes feel like what’s currently going on is basically a charade to justify continuing to use human-based assessments rather than algorithmic ones.
There’s an abject lack of trials comparing human and algorithm-based detection, and only a few reviews take a comparative look at the issue. I will use this one as a reference: https://www.sciencedirect.com/science/article/pii/S2589750019301232 . I encourage you to skim through it; I find it to be pretty good overall.
When averaging across studies, the pooled sensitivity was 88·6% (95% CI 85·7–90·9) for all deep learning algorithms and 79·4% (74·9–83·2) for all health-care professionals. The pooled specificity was 93·9% (92·2–95·3) for deep learning algorithms and 88·1% (82·8–91·9) for health-care professionals.
This is an aggregate result over many conditions being diagnosed, and as far as I understand, the physicians and the models were not even always tested on the same dataset.
Still, I think the results seem very promising: especially when you consider only the highest specificity and sensitivity ML algorithms; especially when you consider they would cost a few cents an hour to run, whereas doctors cost anywhere from $10 to $300 an hour; and especially when you consider that this could be deployed in areas that have no access to specialized doctors to begin with.
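To put those pooled numbers in concrete terms, here is a back-of-the-envelope calculation. The 5% prevalence is my own illustrative assumption, not a figure from the review:

```python
# What the pooled sensitivity/specificity imply per 1,000 screened patients,
# assuming an illustrative 5% disease prevalence (my assumption).
def outcomes(sensitivity, specificity, prevalence=0.05, n=1000):
    sick = n * prevalence
    healthy = n - sick
    return {
        "caught": round(sensitivity * sick, 1),         # true positives
        "missed": round((1 - sensitivity) * sick, 1),   # false negatives
        "false_alarms": round((1 - specificity) * healthy, 1),
    }

# Pooled figures from the review quoted above.
print("deep learning:", outcomes(0.886, 0.939))
print("clinicians:  ", outcomes(0.794, 0.881))
```

At any plausible prevalence, the gap in sensitivity translates into several additional cancers caught per thousand screenings, which is the practical stake behind the percentage points.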
Only four studies provided health-care professionals with additional clinical information, as they would have in clinical practice; one study also tested the scenario in which prior or historical imaging was provided to the algorithm and the health-care professional, and four studies also considered diagnostic performance in an algorithm-plus-clinician scenario. It is worth noting that no studies reported a formal sample size calculation to ensure that the study was sufficiently sized in a head-to-head comparison.
I wonder why so few studies provided a direct comparison with doctors. Considering that most researchers I know of are delighted to test their models against real-world benchmarks, I think it’s not unreasonable to speculate something along the lines of: “No group of oncologists/radiologists is heavily invested in running trials of themselves versus the best known machine learning models that would serve to replace their work.”
Even more so, looking at reviews of various individual models (e.g. https://www.sciencedirect.com/science/article/pii/S2001037014000464), you can easily see that the amount of data they use is laughable, often as low as a few dozen examples.
Consider that 37 million mammograms are performed each year in the US, and that it’s been at least 5 years since the idea that a CNN could detect cancer from medical images gained strong footing. Assuming as little as 20% of those mammograms had a 1-5 year follow-up, five years of screening yields a dataset of at least 37 million images paired with a breast-cancer outcome. If we expand this to all countries rich enough to produce sizable amounts of such data, the numbers should be in the hundreds of millions.
Yet the largest claimed dataset (from this paper: https://arxiv.org/pdf/1903.08297v1.pdf) contains only 1.2 million images, and it is proprietary. The single model tried on this dataset claims to perform about as well as a radiologist. But imagine if we had not 1.2 million images but over 100 million; imagine if we had not a single model but hundreds of them, not a single team but hundreds of teams working on this problem.
To me this seems to have so much potential that it’s appalling we aren’t doing it. After all, the end result would save billions of taxpayer dollars in salaries and follow-up procedures, prevent a sad end to countless lives (often of relatively young people, since breast cancer is frequent in younger demographics compared to other types of cancer) and help with early detection, possibly preventing mentally and physically scarring procedures such as mastectomies and long-term aggressive chemotherapy.
So, again, my question is: why isn’t all the data for tackling this issue publicly available?
I’m not claiming there’s some evil organization of scientists or doctors actively plotting against machine learning models that can replace them.
Most scientists and doctors are nice people who want the advancement of their fields to continue, even if that means they might have to work on less of the stuff they consider “fun”.
The problem is systemic: bad regulations keep the public from accessing useful data; bad allotment of funds and bad coordination mean we don’t get the centralized data stores we should have; bad practices mean scientists and doctors are actively discouraged from taking a data-driven approach, where their main role is accurately producing, labeling and storing data. When was the last Nobel prize in medicine awarded to someone who put together a large and easy-to-use database for diagnosing DVT risk?
But I do think there’s a very good reason to intuit that the task of building models of the world, especially mathematical models, might be better accomplished if we focused more on giving it to an ML model rather than a brain; that brilliant scientists dreaming up complex equations to explain experiments might often be a worse approach than ML models learning to fit well-collected data; that doctors looking for anomalies in images their brains weren’t designed to perceive might be worse off than a CNN classifier designed and trained to detect that specific anomaly.
I also think that our inherent bias for human-created rather than machine-created models dictates a lot of the data gathering we do. In turn, this bias dictates the kind of machine learning models we build, or rather makes us uninterested in building better-integrated models that concern themselves not only with making predictions from the data we’ve given them, but also with the kind of data we should be collecting to improve them.
The sooner we internalize the strong possibility that we already have tools much better than our brains for creating a mathematical model from data, the sooner we’ll start seeing the value of data over models, and the sooner we’ll start seeing the value of properly wording a problem as opposed to just building a model that solves an ill-constructed problem.
Finally, I don’t necessarily have a solution for this, nor am I claiming that what I’m saying in this article is “obviously true”. I’ve partially taken the perspective that machine learning is always better than humans at fitting equations to data as a given; granted, I did give two examples of this seemingly being the case, but I don’t expect to significantly shift your perspective towards my view here.
What I’ve hopefully done is give a bit more context to the idea that humans might be “too focused” on building abstractions and equations, as opposed to building models that will build those abstractions and equations for us. I should probably have added more examples and caveats, but I’m not hoping to dictate global scientific policy with this blog; I’m just hoping to offer interesting perspectives to ponder. I am not certain of this view; it’s one hypothesis playing around in my mind which I found interesting and did my best to share.
It sounds to me like you are arguing that humans can't improve machine learning models by studying them. That seems to be far from our reality. Most successful machine learning models we have today get tuned by humans for the task they are doing.
Why isn’t all the data for tackling this issue publicly available?
Because people value privacy and don't want their medical data being public.
The single model that was tried on this dataset claims to perform about as well as a radiologist. But imagine if we didn’t have only 1.2 million images, imagine if we had over 100 million, and imagine if we didn’t have a single model, imagine if we had hundreds of them,
If the training dataset happens to reflect how the average radiologist rates an image, you don't get better than a radiologist by looking at more images with the same labeling.
if we didn’t have a single team but rather hundreds of teams working on this problem.
If you look at Go as a case study, progress when hundreds of teams were working on the problem was relatively slow. Progress took off when the single well-funded team at DeepMind spent its resources on the problem.
A few well-funded teams will likely do a better job at analysing cancer images and producing a commercial product than academic teams.
Admin note: I fixed some things that looked like formatting errors (paragraph breaks after variable-names). Let me know if you intended the original version and I can revert it.
Seems alright to me, thanks for the help.