Good point. Thank you for bringing this up. I just had a closer look in my notes at how the complexity penalty is derived and there is a additional assumption that I left out.

The derivation uses a matrix X with p+1 columns and n rows which has entry xj−1i in the ith row and jth column (where x1,x2,…,xn is the training set). In the derivation it is assumed that X has rank p+1 which true most of the time provided that n≥p+1. For simplicity I won't add a mention of this matrix to original post but I will add the assumption n≥p+1.

Good point. Thank you for bringing this up. I just had a closer look in my notes at how the complexity penalty is derived and there is a additional assumption that I left out.

The derivation uses a matrix X with p+1 columns and n rows which has entry xj−1i in the ith row and jth column (where x1,x2,…,xn is the training set). In the derivation it is assumed that X has rank p+1 which true most of the time provided that n≥p+1. For simplicity I won't add a mention of this matrix to original post but I will add the assumption n≥p+1.