Interpretable Fine Tuning Research Update and Working Prototype — LessWrong