We want to prove:

$$\iint p_{xs}(x,s)\log\frac{p_{xs}(x,s)}{p_s(s)\,p_x(x)}\,dx\,ds+\iint p_{ys}(y,s)\log\frac{p_{ys}(y,s)}{p_s(s)\,p_y(y)}\,dy\,ds\le\iiint p_{xys}(x,y,s)\log\frac{p_{xys}(x,y,s)}{p_s(s)\,p_{xy}(x,y)}\,dx\,dy\,ds$$
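As a sanity check, the claimed inequality can be verified numerically on small finite alphabets. The sketch below is a minimal NumPy experiment (the alphabet sizes and the random seed are arbitrary choices); it constructs a joint distribution in which $x$ and $y$ are independent, as the derivation below requires, and compares the three mutual-information terms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent marginals p_x, p_y and an arbitrary conditional p(s | x, y),
# so that p_xy(x, y) = p_x(x) p_y(y) holds by construction.
p_x = rng.random(3); p_x /= p_x.sum()
p_y = rng.random(4); p_y /= p_y.sum()
p_s_given_xy = rng.random((3, 4, 5))
p_s_given_xy /= p_s_given_xy.sum(axis=2, keepdims=True)

# Joint p_xys(x, y, s), indexed as [x, y, s].
p = p_x[:, None, None] * p_y[None, :, None] * p_s_given_xy

def mutual_info(p_ab):
    """I(A; B) computed from a joint distribution p_ab indexed as [a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    return np.sum(p_ab * np.log(p_ab / (p_a * p_b)))

i_xs  = mutual_info(p.sum(axis=1))     # I(X; S), marginalize out y
i_ys  = mutual_info(p.sum(axis=0))     # I(Y; S), marginalize out x
i_xys = mutual_info(p.reshape(-1, 5))  # I((X, Y); S), treat (x, y) as one axis

assert i_xs + i_ys <= i_xys + 1e-12
```

Each integral above becomes a finite sum here, and the three `mutual_info` calls are exactly the three sides of the inequality.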

Since the pairwise marginals are recovered by integrating the joint density, $\int p_{xys}(x,y,s)\,dy=p_{xs}(x,s)$ and $\int p_{xys}(x,y,s)\,dx=p_{ys}(y,s)$, and since $x$ and $y$ are independent, so that $p_{xy}(x,y)=p_x(x)\,p_y(y)$, this can be rewritten as:

$$\iiint p_{xys}(x,y,s)\log\frac{p_{xs}(x,s)}{p_s(s)\,p_x(x)}\,dx\,dy\,ds+\iiint p_{xys}(x,y,s)\log\frac{p_{ys}(y,s)}{p_s(s)\,p_y(y)}\,dx\,dy\,ds\le\iiint p_{xys}(x,y,s)\log\frac{p_{xys}(x,y,s)}{p_s(s)\,p_x(x)\,p_y(y)}\,dx\,dy\,ds$$

After moving everything to the right-hand side and simplifying (the factors $p_x(x)$ and $p_y(y)$ cancel, leaving one factor of $p_s(s)$ in the numerator), we get:

$$\iiint p_{xys}(x,y,s)\log\frac{p_{xys}(x,y,s)\,p_s(s)}{p_{xs}(x,s)\,p_{ys}(y,s)}\,dx\,dy\,ds\ge 0$$

Now it suffices to prove that $q(x,y,s)=\dfrac{p_{xs}(x,s)\,p_{ys}(y,s)}{p_s(s)}$ is a probability density: the left-hand side is then exactly $\mathrm{KL}\!\left(p_{xys}\,\middle\|\,q\right)$, and the Kullback–Leibler divergence is always nonnegative.

Ok, $q$ is obviously nonnegative, and its integral equals $1$:

$$\iiint\frac{p_{xs}(x,s)\,p_{ys}(y,s)}{p_s(s)}\,dx\,dy\,ds=\int\frac{p_s(s)\,p_s(s)}{p_s(s)}\,ds=\int p_s(s)\,ds=1$$
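Both facts about $q$, that it normalizes to $1$ and that the resulting KL term is nonnegative, hold for an arbitrary joint distribution, so they can be checked numerically on a random discrete one (a sketch with arbitrary alphabet sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary joint p_xys(x, y, s), indexed as [x, y, s].
p = rng.random((3, 4, 5))
p /= p.sum()

p_s  = p.sum(axis=(0, 1))  # p_s(s)
p_xs = p.sum(axis=1)       # p_xs(x, s)
p_ys = p.sum(axis=0)       # p_ys(y, s)

# q(x, y, s) = p_xs(x, s) * p_ys(y, s) / p_s(s), via broadcasting.
q = p_xs[:, None, :] * p_ys[None, :, :] / p_s

total = q.sum()                 # should equal 1
kl = np.sum(p * np.log(p / q))  # KL(p_xys || q), should be >= 0

assert np.isclose(total, 1.0)
assert kl >= 0.0
```

Summing `q` over `x` and `y` collapses the two marginals back to $p_s(s)^2/p_s(s)=p_s(s)$, which is the discrete version of the integral above.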

Q.e.d.