Could we have predicted emergent misalignment a priori using unsupervised behaviour elicitation? — LessWrong