I'll be brief, omit needless words.
Intelligence is prediction is compression because
Compression is finding a code that makes the data shorter
And codeword lengths are probabilities
So codes are probability distributions
But probability distributions are prediction strategies.
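
A minimal sketch of the middle step, assuming the standard Shannon correspondence between a codeword of length l and a probability of 2^-l. The toy distribution and the function names are illustrative, not from the post.

```python
import math

def code_lengths(probs):
    """Shannon code lengths: ceil(-log2 p) bits for each symbol."""
    return {s: math.ceil(-math.log2(p)) for s, p in probs.items()}

def implied_probs(lengths):
    """Read a code back as a (sub)distribution: q(s) = 2 ** -length(s)."""
    return {s: 2.0 ** -l for s, l in lengths.items()}

# Toy distribution (an assumption for illustration), chosen dyadic so
# the code lengths come out as whole numbers of bits.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

lengths = code_lengths(probs)

# Kraft sum: at most 1 for any prefix code, so the implied q is a
# valid (sub)probability distribution.
kraft_sum = sum(implied_probs(lengths).values())

# Average bits per symbol under this code, and the entropy bound.
expected_length = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())

print(lengths, kraft_sum, expected_length, entropy)
```

For a dyadic distribution like this one the expected code length meets the entropy exactly; in general, rounding lengths up to whole bits costs less than one extra bit per symbol.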

And prediction strategies are almost optimization procedures?

Did you really need to say that you'd be brief? Wasn't it enough to say that you'd omit needless words? :)

But then he'd lose the Strunk and White allusion.

I approve the haikuesque format.

Do you agree that the "bijection" Intelligence -> Prediction preserves more structure than Prediction -> Compression?