Decomposing the QK circuit with Bilinear Sparse Dictionary Learning — LessWrong