Testing which LLM architectures can do hidden serial reasoning — LessWrong