MINE: Free tool for detecting novel associations in large data sets

curiousepic


17th Dec 2011


I was waiting for someone more knowledgeable to post something about this, but it's been a couple of days, so I thought I'd bring it to LW's attention.

The maximal information coefficient (MIC) is a measure of two-variable dependence developed with two goals in mind: generality (detecting a wide range of relationship types, not just linear ones) and equitability (giving similar scores to equally noisy relationships of different types). The published paper describing MIC shows that it comes very close to achieving both goals simultaneously, and that it significantly outperforms competing methods in this regard.
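To give a feel for what a score like this does, here is a rough Python sketch. It is not the MINE tool's algorithm: the paper searches for the best grid at each resolution, whereas this sketch only tries equal-frequency grids, so it is a cruder stand-in. The names `equal_frequency_bins`, `mutual_information`, and `mic_approx` are made up for illustration, and the grid-size budget of n^0.6 is my recollection of the paper's default.

```python
import numpy as np

def equal_frequency_bins(values, n_bins):
    """Assign each value to one of n_bins bins holding roughly equal counts."""
    ranks = np.argsort(np.argsort(values))  # rank of each value, 0..n-1
    return (ranks * n_bins) // len(values)

def mutual_information(x_bins, y_bins, nx, ny):
    """Mutual information (in nats) of the joint histogram of two binned variables."""
    joint = np.zeros((nx, ny))
    np.add.at(joint, (x_bins, y_bins), 1)   # count points per grid cell
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

def mic_approx(x, y, exponent=0.6):
    """Crude MIC-style score (assumption: equal-frequency grids only).

    Takes the maximum, over grids with nx * ny < n**exponent, of the
    mutual information divided by log(min(nx, ny)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    budget = len(x) ** exponent
    best = 0.0
    nx = 2
    while nx * 2 < budget:
        xb = equal_frequency_bins(x, nx)
        ny = 2
        while nx * ny < budget:
            yb = equal_frequency_bins(y, ny)
            mi = mutual_information(xb, yb, nx, ny)
            best = max(best, mi / np.log(min(nx, ny)))
            ny += 1
        nx += 1
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
print("linear     ", mic_approx(x, x))
print("parabola   ", mic_approx(x, x ** 2))
print("independent", mic_approx(x, rng.uniform(-1, 1, 1000)))
```

The normalization by log(min(nx, ny)) keeps the score between 0 and 1, which is what lets a noiseless line and a noiseless parabola both land near the top of the scale.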

Video summary

Main site with access to tool


5 comments

[-]snarles14y30

I'm skeptical about the general usefulness of the tool, but I feel like the best way to evaluate such suspicions is to find a fairly well-understood dataset and compare the utility of traditional measures of correlation vs. MIC in finding true relationships in the data.

[-]DanielVarga14y20

The most important link is the Supplementary Material. I only found it through the reddit thread. (Not much else to go there for. Maybe the pirated paper itself, but that is basically just an extended abstract of the SOM.) Here is the link to the SOM:

http://www.sciencemag.org/content/suppl/2011/12/14/334.6062.1518.DC1/Reshef.SOM.pdf

Figures S5 and S6 (page 41) make me conjecture that compared to LOESS, this new method is an improvement only when the relationship is not a function (but a many-valued function). Not that I am really familiar with LOESS.

[-]BruceyB14y00

I'm far from an expert on LOESS (in fact, I hadn't heard the term before now), but it looks like it doesn't perform a comparable function to MIC. LOESS seems to be an algorithm for producing a non-linear regression while MIC is an algorithm to measure the strength of a relationship between two variables.

In the paper (figure 2A), they compare it to the Pearson correlation coefficient, Spearman rank correlation, mutual information, CorGC, and maximal correlation on data in a variety of shapes. Basically, it is effective on a wider range of shapes than any of them.
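The "range of shapes" point is easy to see for the two classical coefficients with a small NumPy sketch. The three shapes, the sample size, and the helper names `pearson` and `spearman` are my own choices for illustration, not taken from the paper's figure, and this does not compute MIC itself.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman rank correlation: Pearson on ranks.

    Assumes continuous data, so ties are not given average ranks.
    """
    def rank(v):
        return np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
theta = rng.uniform(0, 2 * np.pi, n)

# Three noiseless relationships; in each, one variable is strongly tied to the other.
shapes = {
    "linear": (x, 2 * x + 1),
    "parabola": (x, x ** 2),
    "circle": (np.cos(theta), np.sin(theta)),
}

scores = {name: (pearson(a, b), spearman(a, b)) for name, (a, b) in shapes.items()}
for name, (r, rho) in scores.items():
    print(f"{name:9s} pearson={r:+.2f} spearman={rho:+.2f}")
```

Both coefficients pick up the line but sit near zero on the parabola and the circle, even though neither of those has any noise in it.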

[-]DanielVarga14y00

Check out figures S5.D and S6 from the SOM. If the relationship is functional (the linear, parabolic, sinusoidal cases in Figure S6), then the R² calculated from LOESS regression is quite close to this MIC score, and that's not a coincidence. Of course LOESS R² just dies when it encounters a non-functional relationship.
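The LOESS half of that claim can be checked with a minimal sketch. It assumes a stripped-down smoother (local linear fit with tricube weights, no robustness iterations) is close enough to LOESS for the purpose. The function names, the `frac=0.3` span, and the test data are all hypothetical choices, and MIC itself is not computed here.

```python
import numpy as np

def loess_fit(x, y, frac=0.3):
    """Local linear regression with tricube weights, evaluated at each sample point."""
    n = len(x)
    k = max(int(frac * n), 3)                   # number of neighbours in each window
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.partition(dist, k - 1)[k - 1]    # distance to the k-th nearest point
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3
        # Weighted least squares for y ~ a + b * (x - x[i]); the fitted value is a.
        X = np.column_stack([np.ones(n), x - x[i]])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted

def r_squared(y, fitted):
    """Fraction of the variance of y explained by the fitted values."""
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

rng = np.random.default_rng(0)
n = 300

# Functional relationship: a noisy sinusoid.
x = rng.uniform(0, 1, n)
y_sine = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, n)
r2_sine = r_squared(y_sine, loess_fit(x, y_sine))

# Non-functional relationship: points on a circle (two y values per x).
theta = rng.uniform(0, 2 * np.pi, n)
cx, cy = np.cos(theta), np.sin(theta)
r2_circle = r_squared(cy, loess_fit(cx, cy))

print(f"sinusoid R^2 = {r2_sine:.2f}, circle R^2 = {r2_circle:.2f}")
```

On the circle the smoother can only track the conditional mean of y, which is about zero everywhere, so almost none of the variance is explained even though the relationship is noiseless.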

[-]Dr_Manhattan14y10

extensive discussion on reddit: http://www.reddit.com/r/science/comments/neoz6/scientists_create_new_algorithm_to_automatically/
