I'll be brief, omit needless words.
Intelligence is prediction is compression because
Compression is finding a code that makes the data shorter
And codeword lengths are (negative log) probabilities
So codes are probability distributions
But probability distributions are prediction strategies.
And prediction strategies are almost optimization procedures?
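A minimal sketch of the "codeword lengths are probabilities" step. It assumes Shannon code lengths, l = ceil(-log2 p), which is one standard choice and not necessarily what the poster had in mind. The function names are hypothetical. By the Kraft inequality, any prefix code's lengths satisfy sum(2**-l) <= 1, so the lengths can be read back as a (sub-)probability distribution q = 2**-l.

```python
import math

def code_lengths(probs):
    """Shannon code lengths in bits: l = ceil(-log2 p) for each symbol."""
    return {sym: math.ceil(-math.log2(p)) for sym, p in probs.items()}

def implied_probs(lengths):
    """Read code lengths back as a (sub-)distribution: q = 2**-l."""
    return {sym: 2.0 ** -l for sym, l in lengths.items()}

def kraft_sum(lengths):
    """Kraft sum; a prefix code with these lengths exists iff this is <= 1."""
    return sum(2.0 ** -l for l in lengths.values())

# Dyadic distribution: probabilities are powers of 1/2, so the round trip is exact.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = code_lengths(probs)
print(lengths)                          # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(kraft_sum(lengths))               # 1.0
print(implied_probs(lengths) == probs)  # True

# Non-dyadic distribution: the rounding costs some bits, but Kraft still holds.
skewed = {"x": 0.6, "y": 0.4}
print(code_lengths(skewed))             # {'x': 1, 'y': 2}
print(kraft_sum(code_lengths(skewed)))  # 0.75
```

For non-dyadic distributions the correspondence is only approximate: rounding up to whole bits wastes less than one bit per symbol relative to the ideal -log2 p.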
Did you really need to say that you'd be brief? Wasn't it enough to say that you'd omit needless words? :)
But then he'd lose the Strunk and White allusion.
I approve the haikuesque format.
Do you agree that the "bijection" Intelligence -> Prediction preserves more structure than Prediction -> Compression?