I have some philosophical objections to your approach. First, I'm not sure it's a good idea to focus exclusively on research questions that are explicitly aging-related, because you'll be limiting yourself to a subset of the promising ideas out there. Second, you probably shouldn't worry about pursuing a project in which your already-collected data is useless, especially if that data or similar is also available to most other researchers in your field (if not, it would be very useful for you to try to make that data available to others who could do so...

Also, I'm not sure if this is your intention, but it seems to me that the goal of spending 20 years to slow or prevent aging is a recipe for wasting time. It's such an ambitious goal, and so many people are already working on it, that any one researcher is unlikely to put a measurable dent in it.

In the last five years the NIH (National Institutes of Health) has never spent more than 2% of its budget on aging research. To a first approximation, the availability of grant support is proportional to the number of academic researchers, or at least to the amount of ...

bokov: This is 'new' data in the sense that it is only now becoming available for research purposes, and if I have my way, it is going to be in a very flexible and analysis-friendly format. It is the core mission of my team to make the data available to researchers (insofar as permitted by law, patients' right to privacy, and contractual obligations to the owners of the data).

If I ran "academia", tool and method development would take at least as much priority as traditional hypothesis-driven research. I think a major take-home message of LW is that hypotheses are a dime a dozen: what we need are practical ways to rank them and update their rankings on new data. A good tool that lets you crank through thousands of hypotheses is worth a lot more than any individual hypothesis. I have all kinds of fun ideas for tools.

But for the purposes of this post, I'm assuming that I'm stuck with the academia we have, I have access to a large anonymized clinical dataset, and I want to make the best possible use of it (I'll address your points about aging as a choice of topic in a separate reply). The academia we're stuck with (at least in the biomedical field) effectively requires faculty to have a research plan describable by "Determine whether FOO is true or false" rather than "Create a FOO that does BAR". So the no-brainer approach would be for me to take the tool I most want to develop, slap some age-related disease onto it as a motivating use-case, and make that my grant. But this optimizes for the wrong thing: I don't want to find excuses for engaging in fascinating intellectual exercises. I want to find the problems with the greatest potential to advance human longevity, and then bring my assets to bear on those problems even if the work turns out to be uglier and more tedious than my ideal informatics project.

The reason I'm asking for the LW community's perspective on what's on the critical path to human longevity is that I spent too much time around excuse-driven^H^H^H

Request for suggestions: ageing and data-mining

by bokov, 24th Nov 2014


Imagine you had the following at your disposal:

  • A Ph.D. in a biological science, with a fair amount of reading and wet-lab work under your belt on the topic of aging and longevity (but in hindsight, nothing that turned out to leverage any real mechanistic insights into aging).
  • An M.S. in statistics. Sadly, the non-Bayesian kind for the most part, but one that came with the meta-skills necessary to read and understand most quantitative papers with life-science applications.
  • Love of programming and data, the ability to learn most new computer languages in a couple of weeks, and at least 8 years spent hacking R code.
  • Research access to large amounts of anonymized patient data.
  • Optimistically, two decades remaining in which to make it all count.

Imagine that your goal were to slow or prevent biological aging...

  1. What would be the specific questions you would try to tackle first?
  2. What additional skills would you add to your toolkit?
  3. How would you allocate your limited time between the research questions in #1 and the acquisition of new skills in #2?

Thanks for your input.


I thank everyone for their input and apologize for how long it has taken me to post an update.

I met with Aubrey de Grey and he recommended using the anonymized patient data to look for novel uses for already-prescribed drugs. He also suggested I compare existing longitudinal studies (e.g. Framingham) with the equivalent data elements from our data warehouse. I asked him, if he runs into any researchers who have promising theories or methods but lack a massive human dataset to test them on, to send them my way.
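
To make the "novel uses for already-prescribed drugs" idea a little more concrete, here is a minimal sketch of what a first-pass screen might look like. It is an illustration only, not anything that has been run on our data: the drug names and counts are hypothetical, and `odds_ratio_ci` and `rank_candidates` are names invented for this example. For each drug it builds a 2x2 table (exposed or unexposed versus outcome or no outcome), computes an odds ratio with an approximate 95% confidence interval, and ranks drugs by the upper bound of that interval, so the most convincingly protective associations come first. A real analysis would also need to adjust for confounding (especially confounding by indication) and correct for testing many drugs at once; this sketch does neither.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases, z=1.96):
    """Return (odds_ratio, ci_low, ci_high) for one 2x2 table.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so that a
    zero cell does not cause a division by zero. The interval is the
    usual normal approximation on the log odds ratio.
    """
    a = exposed_cases + 0.5
    b = exposed_noncases + 0.5
    c = unexposed_cases + 0.5
    d = unexposed_noncases + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


def rank_candidates(tables):
    """Rank drugs by the upper CI bound of the odds ratio, ascending.

    `tables` maps a drug name to a 4-tuple of counts in the order
    (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases).
    """
    results = []
    for name, counts in tables.items():
        odds_ratio, ci_low, ci_high = odds_ratio_ci(*counts)
        results.append((name, odds_ratio, ci_low, ci_high))
    return sorted(results, key=lambda row: row[3])


# Hypothetical counts, made up purely for illustration.
example_tables = {
    "drug_A": (30, 970, 60, 940),
    "drug_B": (50, 950, 50, 950),
    "drug_C": (5, 95, 50, 950),
}

for name, odds_ratio, ci_low, ci_high in rank_candidates(example_tables):
    print(f"{name}: OR={odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Ranking on the upper confidence bound rather than the point estimate is a deliberate choice here: it penalizes drugs with few exposed patients, whose apparent effects are mostly noise.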

My original question was a bit too broad in retrospect: I should have focused more on how best to leverage the capabilities my project already has in place, rather than making a more general "what should I do with myself" kind of appeal. On the other hand, at the time I might have been less confident about the project's success than I am now. Though the conversation immediately went off into prospective experiments rather than analyzing existing data, there were some great ideas there that may yet become practical to implement.

At any rate, a lot of this has been overcome by events. In the last six months I realized that before we even get to the bifurcation point between longevity and other research areas, there are a crapload of technical, logistical, and organizational problems to solve. I no longer have any doubt that these real problems are worth solving, my team is well positioned to solve many of them, and the solutions will significantly accelerate research in many areas including longevity. We have institutional support, we have a credible revenue stream, and no shortage of promising directions to pursue. The limiting factor now is people-hours. So, we are recruiting.

Thanks again to everyone for their feedback.