http://arstechnica.com/science/2014/03/ibm-to-set-watson-loose-on-cancer-genome-data/

Can anyone more informed about Watson or Machine Learning in general comment on this application?  Specifically, I'm interested in an explanation of this part:

[...] the National Institutes of Health has compiled lists of biochemical pathways—signaling networks and protein interactions—and placed them in machine-readable formats. Once those were imported, Watson's text analysis abilities were set loose on the NIH's PubMed database, which contains abstracts of nearly every paper published in peer-reviewed biomedical journals.

Over time, Watson will develop its own sense of what sources it looks at are consistently reliable. Royyuru told Ars that, if the team decides to, it can start adding the full text of articles and branch out to other information sources. Between the known pathways and the scientific literature, however, IBM seems to think that Watson has a good grip on what typically goes on inside cells.

It sounds like Watson will be trained on some standard, formatted input data and will then read plaintext articles and draw conclusions from them?  It also sounds like they're anticipating that Watson will be able to tell which studies are "good" studies, which sounds incredible (in both senses of the word).


I think by "good" studies, they mean "the findings align with the rest of what we see, and the prognostics informed by the paper are accurate" rather than "we read the paper carefully and the investigators appear to have done the right experiment correctly."

"Good" means "confirmed by experiment." In terms of stats methodology, it's sort of a spaghetti against the wall situation.

That makes sense, but I'm still a bit confused about the specifics. So the idea is: Watson reads papers and develops a model of human biology; when asked what to do in a certain situation, it prescribes a therapy; then it's fed back the results when that therapy is tried on someone, so it can judge whether the specific studies it relied on for the prediction hold up in practice?

Watson reads papers and develops a model of human biology

I suspect the model is already mostly developed; I believe that's what the machine-readable biochemical pathways are. Watson then reads the papers to develop a model of how genes interact with those pathways.

My impression of how this will work in practice is that we take samples from Harold, and from Harold's tumor Bob, and read their DNA. Harold's DNA is probably mostly normal, but Bob's DNA is probably totally crazy, in a fairly distinctive way. Watson can then query its graph with Bob's DNA to say "this is how I think Bob works biochemically, and using that knowledge I predict we can kill Bob if we attack this particular pathway."
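To make that "query the graph with Bob's DNA" step concrete, here's a minimal toy sketch in Python. Everything here is an illustrative assumption — the pathway table, the gene names, and the scoring rule are invented for the example and have nothing to do with Watson's actual implementation, which presumably uses a far richer knowledge graph and inference method.

```python
# Toy sketch of querying a pathway knowledge base with a tumor's mutations.
# PATHWAYS and suggest_targets are hypothetical names, not Watson's API.

# Toy knowledge base: pathway name -> set of genes participating in it.
PATHWAYS = {
    "EGFR signaling": {"EGFR", "KRAS", "BRAF", "MEK1"},
    "PI3K/AKT": {"PIK3CA", "AKT1", "PTEN", "MTOR"},
    "DNA repair": {"BRCA1", "BRCA2", "TP53"},
}

def suggest_targets(tumor_mutations):
    """Rank pathways by how many of the tumor's mutated genes they contain."""
    scores = {
        name: len(genes & tumor_mutations)
        for name, genes in PATHWAYS.items()
    }
    # Keep only pathways touched by at least one mutation, best first.
    return sorted(
        ((name, s) for name, s in scores.items() if s > 0),
        key=lambda item: item[1],
        reverse=True,
    )

# If Bob the tumor carries mutations in KRAS and TP53:
print(suggest_targets({"KRAS", "TP53"}))
```

In the real system the "score" would presumably be something like a causal or probabilistic inference over the NIH pathway graph plus evidence mined from PubMed, not a raw overlap count — this just shows the shape of the query: mutations in, candidate pathways out.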

(Then we should be able to learn from how the treatment goes, and so on.)