Great post. I’m a clinical-translational lymphoma researcher, and all of the issues you describe are critical to moving our field forward.
I share your optimism that ML will be able to help us find features of cancer that humans would never be able to discover due to the sheer amount of data. In the past few years we have developed ML methods to decipher different subcategories of diffuse large B cell lymphoma (DLBCL, the most common lymphoma) using genomic and multi-omic strategies. We now have several competing classification systems based on both the lymphomas themselves and the way they interact with the other cells that play a critical role in their survival and treatment outcomes (the microenvironment).
However, these are a) extremely expensive to do on any individual patient, b) time consuming, and c) not yet clinically actionable. Identifying smaller and smaller subgroups of lymphomas (we went from having basically “Hodgkin” and “non-Hodgkin lymphoma” only to having dozens of subclassifications of DLBCL in a span of only a few decades) is critical for prognostication, but we haven’t yet been able to actually say “drug X will work better than drug Y in Z subtype of DLBCL with microenvironment signature A”. Critically, the more you subclassify, the harder it is to actually get sufficient numbers for clinical trials.
I am hopeful that ML may help us find those clues we haven’t found yet, and find actionable solutions with smaller ns for trials. As you say, cancer has a surprising amount of detail, but we need help incorporating all of the exponentially increasing classifiers.
I agree that it would be nice to understand the mechanisms, but I actually think that is secondary if we have a tool that can help patients now; we can understand the mechanisms later. If I feed H&E slides into a black box AI agent and the output it spits out improves my patient’s survival, it helps them right now, today. Yes, I think understanding mechanisms underlying cancer biology is important (I have literally dedicated my life to it), but that can come after. A lot of cancer drug development has gone this way. One good example: thalidomide and its next generation drugs have been used effectively for years in lymphoma and myeloma but we only recently understood the mechanisms. Patients can derive benefit even when we don’t yet understand how a drug works.
I actually recently got into an argument with a cardiologist colleague after a talk where someone showed that an AI trained on millions of EKGs could predict who had atrial fibrillation (an arrhythmia), even if they weren’t having the arrhythmia at the time of the EKG. My colleague got frustrated because the criteria it found didn’t make “physiologic sense”: they couldn’t understand why those specific EKG findings meant afib. But to me, if it can help people now, we can try to understand the underlying mechanisms later.
My point is that it always feels good and right to understand why something works, and that should always be the goal. But I don’t think we should deny patients the chance at improved outcomes while we wait to understand why.