Newbie questions about information theory and transformers — LessWrong